[sisyphus] IQ: enca -- charset guesser

Michael Shigorin mike at lic145.kiev.ua
Thu Mar 28 23:52:47 MSK 2002


	Hello.
While digging through ~/Download (and adding the sixtieth line to
~/ALT/TODO :), I stumbled upon a curious little thing, enca:

---
Name        : enca                         Relocations: /usr 
Version     : 0.9.3                             Vendor: Trific soft.
Release     : 1                             Build Date: Thu Mar 28 22:10:23 2002
Install date: Thu Mar 28 22:18:09 2002      Build Host: work.fair.net
Group       : Applications/Text             Source RPM: enca-0.9.3-1.src.rpm
Size        : 173998                           License: GNU GPL v2
Packager    : David Necas (Yeti) <yeti at physics.muni.cz>
URL         : http://physics.muni.cz/~yeti/software/enca.shtml
Summary     : A program that guesses encoding of text files.
Description :
Enca (Extremely Naive Charset Analyser) is a simple utility guessing
encoding of text files and optionally converting them to some other
encoding using either a built-in convertor, a system conversion library
or an external conversion program.  Currently, it has support for Czech,
Slovak, Russian and some multibyte encodings (mostly variants of Unicode)
independent on language.

Install Enca if you need to cope with text files of dubious origin
and unknown encoding and convert them to some reasonable encoding.
---
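To give a feel for what "guessing encoding of text files" means, here is a toy Python sketch. It is NOT enca's algorithm (this post does not describe its internals). It only decodes the bytes under a few common 8-bit Russian encodings and ranks them by how many decoded letters turn out to be frequent lowercase Russian letters. The candidate list and the letter set are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed candidate set: common 8-bit Cyrillic encodings (Python codec names).
CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]

# Assumed "frequent letters" set, chosen for illustration only;
# it is not taken from enca's own tables.
COMMON = set("оеаинтсрвлкмдпу")


def score(data: bytes, encoding: str) -> float:
    """Fraction of decoded letters that are frequent lowercase Russian letters."""
    text = data.decode(encoding, errors="replace")
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in COMMON for ch in letters) / len(letters)


def guess(data: bytes) -> List[Tuple[str, float]]:
    """Return (encoding, score) pairs, most probable first."""
    ranked = [(enc, score(data, enc)) for enc in CANDIDATES]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = "Здравствуйте, это проверка определения кодировки".encode("koi8_r")
    for name, value in guess(sample):
        print(f"{name:10s} {value:.2f}")
```

The trick is that the wrong encodings map Russian text onto uppercase letters, box-drawing symbols or rarely used letters, so their scores drop sharply. A real tool also has to cope with short inputs, where this kind of statistic becomes unreliable.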

I suspect that in the post-Master era I'll be building this for
Sisyphus (I really liked it), but for now here is an informal
suggestion to anyone interested: take a look and assess how it
could be wired into existing software (off the top of my head: I
currently have a similar converter hooked into mc's viewer as the
last default; I believe I've mentioned it before).  It's fairly
smart:

---
Usage:  enca [-L LANGUAGE] [OPTION]... [FILE]...
Guess encoding of text files and convert them if required.

Output type selectors:
-d, --details           print detailed information about how the guess was made
-e, --enca-name         print enca's encoding name (passed to convertors)
-f, --human-readable    print full (descriptive) encoding name (default)
-i, --iconv-name        print how iconv calls the encoding
-r, --rfc1345-name      print RFC 1345 (or otherwise canonized) encoding name
-s, --cstocs-name       print how cstocs calls the encoding
-n, --name=WORD         print required name (enca-name, human-readable, etc.)
-x, --convert-to=ENC    convert file to some other encoding ENC

Guessing parameters:
-L, --language=LANG     set language of FILEs---obligatory, when cannot be
                        determined from locale settings
-m, --no-short-message  turn off short message mode, reset defaults
-M, --short-message     turn on short message (ambiguous) mode
-R, --max-chars=NUM     set maximum number of bytes read from input file
-S, --significant=NUM   set required number of significant characters
-T, --threshold=FLOAT   set threshold (the smallest allowed ratio between the
                        most probable encoding and the second most probable)
-u, --multibyte         try multibyte encodings too (default)
-U, --no-multibyte      don't try multibyte encodings (somewhat faster)

Conversion parameters:
-E, --external-convertor-program=PATH
                        set external convertor program name (default: )
-C, --try-convertors=LIST  convertors to be tried (associative)
                        (default: built-in,iconv)

General options:
-p, --with-filename     print the file name for each result
-P, --no-filename       suppress the prefixing filename on output
-V, --verbose           increase verbosity level

Listings:
-G, --license           print full enca license (GNU GPL v2) and terminate
-h, --help              print this help and terminate
-l, --list=WORD         print required list (built-in-encodings, convertors,
                        encodings, languages, lists, names, surfaces)
                        and terminate
-v, --version           print version and build information and terminate

With no FILE, read standard input and possibly write converted stream to
standard output.  Exit status is 0 if all files were successfully proceeded,
1 if some were not recognized or converted, 2 in troubles.

Report bugs to <yeti at physics.muni.cz> (please include `enca' in subject).
---
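The -T option above defines acceptance as a ratio between the most probable and the second most probable encoding, and the help text lists exit codes 0, 1 and 2. Below is a minimal Python sketch of that rule, assuming the candidates arrive already ranked with positive numeric scores. The score scale, the threshold value and the hypothetical `had_error` flag are assumptions, not enca's defaults or internals.

```python
from typing import List, Optional, Sequence, Tuple


def pick(ranked: Sequence[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the top encoding if its score is at least `threshold` times the
    runner-up's score; return None when the guess is ambiguous."""
    if not ranked or ranked[0][1] <= 0:
        return None
    best_name, best_score = ranked[0]
    if len(ranked) == 1 or ranked[1][1] <= 0:
        # Nothing competes with the leader, so accept it.
        return best_name
    return best_name if best_score / ranked[1][1] >= threshold else None


def exit_status(results: List[Optional[str]], had_error: bool = False) -> int:
    """Map per-file outcomes to exit codes like those in the help text:
    0 = all recognized, 1 = some not recognized, 2 = trouble."""
    if had_error:
        return 2
    return 1 if any(r is None for r in results) else 0


if __name__ == "__main__":
    ranked = [("koi8_r", 0.8), ("iso8859_5", 0.4)]
    # 0.8 / 0.4 = 2.0, which clears a threshold of 1.5.
    print(pick(ranked, threshold=1.5))
```

A ratio test rather than an absolute cutoff makes the decision depend on how clearly one encoding stands out from the rest, which matches the wording of -T.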

Ukrainian and Belarusian are not supported at the moment, but
judging by the description, adding them is just a matter of
implementation.

-- 
 ---- WBR, Michael Shigorin <mike at altlinux.ru>
  ------ http://visa.chem.univ.kiev.ua/~mike/