[devel] Re: Q: ispell-en and russian words

Thu Jun 17 21:59:09 MSD 2004

On Tue, Jun 15, 2004 at 09:42:53PM +0400, Alexey Tourbin wrote:
> I have been keeping ispell from Master 2.2 on hold because of one
> unpleasant quirk.  Now, while testing epa7, I see that the quirk is
> still there.
> 
> $ echo айспелл | ispell -a
> @(#) International Ispell Version 3.1.20 10/10/95
> 
> $ rpm -Uvh ...
> $ echo айспелл | ispell -a
> @(#) International Ispell Version 3.2.06 08/01/01
> # айспелл 0
> 
> $
> 
> That is, in newer versions, when the English dictionary is used, all
> Russian words are flagged as misspelled.  I don't know yet why this
> happens.  Has anyone else run into this?

The cause seems to be the following:

--- ispell-3.1.20-ipl18mdk/ispell-3.1/languages/english/english.aff     1995-01-23 18:28:30 +0000
+++ ispell-3.2.06-alt2/ispell-3.2.06/languages/english/english.aff      2001-07-25 21:51:47 +0000
<...>

-# First we declare the character set.  Since it's English, it's easy.
-# The only special character is the apostrophe, so that possessives can
+# First we declare the character set.  Since it's English, it would be
+# easy, except that English likes to borrow accents (notably
+# acute/grave) from other languages.  To be safe, we'll declare a majority
+# of ISO Latin-1.  However, we do not declare the German "ess-zed" in
+# capitalized form, because doing so would cause troubles with certain
+# other misspellings; see the German affix files for more information.
+#
+# In keeping with the march of progress, ISO Latin-1 is the default
+# encoding.  This helps us avoid some of the more obviously difficult
+# problems involving encoding acute and grave accents as apostrophes.
+#
+# We also declare the apostrophe, so that possessives can
 # be handled.  We declare it as a boundary character, so that quoting with

<...>

 boundarychars '
-wordchars [a-z] [A-Z]

-altstringtype "tex" "tex" ".tex" ".bib"
+wordchars      a       A
+stringchar     \xE0    \xC0    # аА Latin letter A with grave
+stringchar     \xE1    \xC1    # бБ Latin letter A with acute
+stringchar     \xE2    \xC2    # вВ Latin letter A with circumflex
+stringchar     \xE3    \xC3    # гГ Latin letter A with tilde
+stringchar     \xE4    \xC4    # дД Latin letter A with diaeresis
+stringchar     \xE5    \xC5    # еЕ Latin letter A with ring above
+stringchar     \xE6    \xC6    # жЖ Latin letter AE
+wordchars      [bc]    [BC]
<...>

That is, instead of [a-zA-Z], english.aff now lists everything one by
one, including letters with acute, grave, and so on.  As a result, it is
impossible to work with (partly) English text that contains Russian
words (at least in the cp1251 encoding): for me, all Russian words are
treated as misspelled.
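
To illustrate the overlap, here is a small Python sketch (not part of
ispell, only an illustration).  It assumes the text reaches ispell as
cp1251 bytes and that the newer english.aff treats the high Latin-1
range as letters, as the diff above suggests.  The helper
is_ascii_letter is hypothetical and only mimics the old
"wordchars [a-z] [A-Z]" declaration.

```python
# Illustration only: why cp1251 Russian text collides with the Latin-1
# letters that the newer english.aff appears to declare.

word = "айспелл"

# Bytes ispell would receive if the text is stored in cp1251.
raw = word.encode("cp1251")

# The same bytes read as ISO Latin-1, the encoding that the 3.2.06
# english.aff comments name as the default.
as_latin1 = raw.decode("latin-1")


def is_ascii_letter(byte):
    """Hypothetical helper: True for bytes matched by the old [a-z] [A-Z]."""
    return (0x41 <= byte <= 0x5A) or (0x61 <= byte <= 0x7A)


hex_bytes = " ".join(f"{b:02X}" for b in raw)
high_bytes = [b for b in raw if b >= 0xC0]
ascii_letters = [b for b in raw if is_ascii_letter(b)]

print(hex_bytes)   # E0 E9 F1 EF E5 EB EB
print(as_latin1)   # àéñïåëë
print(len(high_bytes), len(ascii_letters))  # 7 0
```

Every byte of the word falls in 0xC0..0xFF and none is an ASCII letter.
With the old declaration such a word presumably contained no word
characters at all and was skipped; with the new one it looks like a
Latin-1 word that is simply missing from the English dictionary.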

What can be done about this?

PS: moving this thread to devel at .