Diffstat (limited to 'NEWS')
-rw-r--r-- | NEWS | 646 |
1 files changed, 646 insertions, 0 deletions
@@ -0,0 +1,646 @@
+2011-02-02: Hunspell 1.3.2 release:
+ - fix library versioning
+ - improved manual
+
+2011-02-02: Hunspell 1.3.1 release:
+ - bug fixes
+
+2011-01-26: Hunspell 1.2.15/1.3 release:
+ - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
+ - bug fixes
+
+2011-01-21:
+ - new features: FORCEUCASE and WARN, see manual
+ - new options: -r to filter potential mistakes (rare words
+   marked with the WARN flag in the dictionary)
+ - limited and optimized suggestions
+
+2011-01-06: Hunspell 1.2.14 release:
+ - bug fix
+2011-01-03: Hunspell 1.2.13 release:
+ - bug fixes
+ - improved compound handling and
+   other improvements supported by OpenTaal Foundation, Netherlands
+2010-07-15: Hunspell 1.2.12 release
+2010-05-06: Hunspell 1.2.11 release:
+ - Maintenance release bug fixes
+2010-04-30: Hunspell 1.2.10 release:
+ - Maintenance release bug fixes
+2010-03-03: Hunspell 1.2.9 release:
+ - Maintenance release bug fixes and warnings
+ - MAP support for composed characters or character sequences
+2008-11-01: Hunspell 1.2.8 release:
+ - Default BREAK feature and better hyphenated word suggestion to accept
+   and fix (compound) words with hyphen characters by the spell checker
+   instead of by the word breaking code of OpenOffice.org. With this feature
+   it's possible to accept hyphenated compound words, such as "scot-free",
+   where "scot" is not a correct English word.
+
+ - ICONV & OCONV: input and output conversion tables for optional character
+   handling or using special inner format. Example:
+
+   # Accepting de facto replacements of the Romanian comma-below letters
+   SET UTF-8
+   ICONV 4
+   ICONV ş ș
+   ICONV ţ ț
+   ICONV Ş Ș
+   ICONV Ţ Ț
+
+   Typical usage of ICONV/OCONV is to manage an inner format for a segmental
+   writing system, like the Ethiopic script of the Amharic language.
+
+ - Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
+   the sandhi feature of Telugu and other writing systems.
+
+ - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
+   Norwegian compound word forms, like tillåta (till|låta) and
+   bussjåfør (buss|sjåfør)
+
+ - wordforms: word generator script for dictionary developers (Hunspell
+   version of unmunch).
+
+ - bug fixes
+
+2008-08-15: Hunspell 1.2.7 release:
+ - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
+   strip whole words, not just all but one of their characters.
+ - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
+   matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
+   for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
+   etc.).
+ - optimized suggestions:
+   - modified 1-character distance suggestion algorithms: try each TRY
+     character in every position instead of every TRY character in each
+     position (it can give a more readable suggestion order, and better
+     suggestions in the first positions, when TRY characters are sorted
+     by frequency.)
+     For example, suggestions for "moze":
+     ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
+     maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
+   - extended compound word checking for better COMPOUNDRULE related
+     suggestions, for example English ordinal numbers: 121323th -> 121323rd
+     (it also needs a th->rd REP definition).
+ - bug fixes
+
+2008-07-15: Hunspell 1.2.6 release:
+ - bug fix release (fix affix rule condition checking of sk_SK dictionary,
+   iconv support in stemming and morphological analysis of the Hunspell
+   utility, see also Changelog)
+
+2008-07-09: Hunspell 1.2.5 release:
+ - bug fix release (fix affix rule condition checking of en_GB dictionary,
+   also morphological analysis by dictionaries with two-level suffixes)
+
+2008-06-18: Hunspell 1.2.4-2 release:
+ - fix GCC compiler warnings
+
+2008-06-17: Hunspell 1.2.4 release:
+ - add free_list() for C, C++ interfaces to deallocate suggestion lists
+
+ - bug fixes
+
+2008-06-17: Hunspell 1.2.3 release:
+ - extended XML interface to use morphological functions by standard
+   spell checking interface, spell() and suggest(). See hunspell.3 manual page.
+
+ - default dash suggestions for compound words: newword -> new word and new-word
+
+ - new manual pages: hunspell.3, hzip.1, hunzip.1.
+
+ - bug fixes
+
+2008-04-12: Hunspell 1.2.2 release:
+ - extended dictionary (dic file) support to use multiple base and
+   special dictionaries.
+
+ - new and improved options of command line hunspell:
+   -m: morphological analysis or flag debug mode (without affix
+       rule data it shows the flags of the affix rules)
+   -s: stemming mode
+   -D: list available dictionaries and search path
+   -d: support extra dictionaries by comma separated list. Example:
+
+   hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
+
+ - forbidding in personal dictionary (with asterisk, / marks affixation)
+
+ - optional compressed dictionary format "hzip" for aff and dic files
+   usage:
+   hzip example.aff example.dic
+   mv example.aff example.dic /tmp
+   hunspell -d example
+   hunzip example.aff.hz >example.aff
+   hunzip example.dic.hz >example.dic
+
+ - new affix compression tool "affixcompress": compression tool for
+   large (millions of words) dictionaries.
+
+ - support encrypted dictionaries for closed OpenOffice.org extensions or
+   other commercial programs
+
+ - improved manual
+
+ - bug fixes
+
+2007-11-01: Hunspell 1.2.1 release:
+ - new memory efficient condition checking algorithm for affix rules
+
+ - new morphological functions:
+   - stem() for stemming
+   - analyze() for morphological analysis
+   - generate() for morphological generation
+
+ - new demos:
+   - analyze: stemming, morphological analysis and generation
+   - chmorph: morphological conversion of texts
+
+2007-09-05: Hunspell 1.1.12 release:
+ - dictionary based phonetic suggestion for words with
+   special or foreign pronunciation or alternative (bad) transliteration
+   (see Changelog, tests/phone.* and manual).
+
+ - improved data structure and memory optimization for dictionaries
+   with variable count fields
+
+ - bug fixes for Unicode encoding dictionaries and ngram suggestions
+
+ - improved REP suggestions with space: it works without dictionary
+   modification
+
+ - updated and new project files for Windows API
+
+2007-08-27: Hunspell 1.1.11 release:
+ - portability fixes
+
+2007-08-23: Hunspell 1.1.10 release:
+ - pronunciation based suggestion using Björn Jacke's original Aspell
+   phonetic transcription algorithm (http://aspell.net), relicensed under
+   GPL/LGPL/MPL tri-license with the permission of the author
+
+ - keyboard based suggestion by KEY (see manual)
+
+ - better time limits for suggestion search
+
+ - test environment for suggestion based on Wikipedia data
+
+ - bug fixes for non standard Mozilla platforms etc.
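The keyboard based suggestion listed under 1.1.10 can be illustrated with a small Python sketch. It is not Hunspell's implementation: the function name `key_suggestions` and the tiny word set are hypothetical, and the layout string assumes the KEY format as I understand it from the manual (keyboard rows separated by `|`, with only characters adjacent inside a row counting as neighbours).

```python
# Hypothetical sketch of keyboard-neighbour suggestions; not Hunspell's code.
# Assumed KEY format: keyboard rows separated by '|'.
KEY = "qwertyuiop|asdfghjkl|zxcvbnm"


def key_suggestions(word, dictionary, key=KEY):
    """Return dictionary words reachable by replacing one character
    of `word` with a character adjacent to it in the KEY string."""
    found = []
    for i, ch in enumerate(word):
        for pos, k in enumerate(key):
            if k != ch:
                continue
            # Left and right neighbours; '|' marks a row break, so skip it.
            for n in (pos - 1, pos + 1):
                if 0 <= n < len(key) and key[n] != "|":
                    candidate = word[:i] + key[n] + word[i + 1:]
                    if candidate in dictionary and candidate not in found:
                        found.append(candidate)
    return found


words = {"hello", "jello", "world"}  # stand-in for a real dictionary
print(key_suggestions("hwllo", words))
```

Here "w" sits next to "e" on the top row, so the mistyped "hwllo" leads back to "hello". A real implementation would also have to respect the suggestion time limits mentioned in the same release.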
+
+2007-07-25: Hunspell 1.1.9 release:
+ - better tokenization:
+   - for URLs, mail addresses and directory paths (default: skip these tokens)
+   - for colons in words (for Finnish and Swedish)
+
+ - new examples:
+   - affixation of personal dictionary words
+   - digits in words
+
+ - bug fixes (see ChangeLog)
+
+2007-07-16: Hunspell 1.1.8 release:
+ - better Mac OS X/Cygwin and Windows compatibility
+
+ - fix Hunspell's Valgrind environment and memory handling errors
+   detected by Valgrind
+
+ - other bug fixes (see ChangeLog)
+
+2007-07-06: Hunspell 1.1.7 release:
+ - fix warning messages of OpenOffice.org build
+
+2007-06-29: Hunspell 1.1.6 release:
+ - check capitalization of the following word forms
+   - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
+   - allcap words and suffixes: UNICEF's - UNICEF'S
+   - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
+
+ - suggestion for missing sentence spacing: something.The -> something. The
+
+ - Hunspell executable: improved locale support
+   - -i option: custom input encoding
+   - use locale data for default dictionary names.
+   - tools/hunspell.cxx: fix 8-bit tokenization (letters without
+     casing, like ß or Hebrew characters now are handled well)
+   - dictionary search path (automatic detection of OpenOffice.org directories)
+   - DICPATH environmental variable
+   - -D option: show directory path of loaded dictionary
+
+ - patches and bug fixes for Mozilla, OpenOffice.org.
+
+2007-03-19: Hunspell 1.1.5 release:
+ - optimizations: 10-100% speed up, smaller code size and memory footprint
+   (conditional experimental code and warning messages)
+
+ - extended Unicode support:
+   - non BMP Unicode characters in dictionary words and affixes (except
+     affix rules and conditions)
+   - support BOM sequence in aff and dic files
+
+ - IGNORE feature for Arabic diacritics and other optional characters
+
+ - New edit distance suggestion methods:
+   - capitalisation: nasa -> NASA
+   - long swap: permenant -> permanent
+   - long move: Ghandi -> Gandhi, greatful -> grateful
+   - double two characters: vacacation -> vacation
+   - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)
+
+ - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
+   German and Arabic language, etc.
+
+2006-02-01: Hunspell 1.1.4 release:
+ - Improved suggestion for typical OCR bugs (missing spaces between
+   capitalized words). For example: "aNew" -> "a New".
+   http://qa.openoffice.org/issues/show_bug.cgi?id=58202
+
+ - tokenization fixes (fix incomplete tokenization of input texts on big-endian
+   platforms, and locale-dependent tokenization of dictionary entries)
+
+2006-01-06: Hunspell 1.1.3.2 release:
+ - fix Visual C++ compiling errors
+
+2006-01-05: Hunspell 1.1.3 release:
+ - GPL/LGPL/MPL tri-license for Mozilla integration
+
+ - Alias compression of flag sets and morphological descriptions.
+   (For example, 16 MB Arabic dic file can be compressed to 1 MB.)
+
+ - Improved suggestion.
+
+ - Improved, language independent German sharp s casing with CHECKSHARPS
+   declaration.
+
+ - Unicode tokenization in Hunspell program.
+
+ - Bug fixes (at new and old compound word handling methods), etc.
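The edit distance methods listed under 1.1.5 (capitalisation, long swap, long move, doubled two characters) can be sketched in Python as candidate generators filtered against a word list. This is only an illustration under my own assumptions: the function names and the small dictionary are hypothetical, and Hunspell's actual C++ code orders and limits its candidates differently.

```python
# Illustrative sketch only; names and the word list are hypothetical.

def long_swap(word):
    """Yield variants with two non-adjacent characters exchanged."""
    chars = list(word)
    for i in range(len(chars)):
        for j in range(i + 2, len(chars)):
            if chars[i] != chars[j]:
                swapped = chars[:]
                swapped[i], swapped[j] = swapped[j], swapped[i]
                yield "".join(swapped)


def long_move(word):
    """Yield variants with one character moved at least two positions."""
    for i in range(len(word)):
        rest = word[:i] + word[i + 1:]
        for j in range(len(rest) + 1):
            if abs(j - i) >= 2:
                yield rest[:j] + word[i] + rest[j:]


def undouble_pair(word):
    """Yield variants with a doubled two-character sequence collapsed."""
    for i in range(len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            yield word[:i + 2] + word[i + 4:]


def suggest(word, dictionary):
    """Collect candidates in a fixed order and keep those in the dictionary."""
    result = []
    candidates = [word.upper(), *long_swap(word), *long_move(word),
                  *undouble_pair(word)]
    for cand in candidates:
        if cand in dictionary and cand not in result:
            result.append(cand)
    return result


words = {"NASA", "permanent", "Gandhi", "grateful", "vacation"}
for typo in ("nasa", "permenant", "Ghandi", "greatful", "vacacation"):
    print(typo, "->", suggest(typo, words))
```

Each typo from the release note is recovered by exactly one of the generators, which is why they complement the ordinary one-character edits rather than replace them.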
+
+2005-11-11: Hunspell 1.1.2 release:
+
+ - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
+   suggestions)
+
+ - Checked with 51 regression tests in Valgrind debugging environment,
+   and tested with 52 OOo dictionaries on i686-pc-linux platform.
+
+2005-11-09: Hunspell 1.1.1 release:
+
+ - Compound word patterns for complex compound word handling and
+   simple word-level lexical scanning. Ideal for checking
+   Arabic and Roman numbers, ordinal numbers in English, affixed
+   numbers in agglutinative languages, etc.
+   http://qa.openoffice.org/issues/show_bug.cgi?id=53643
+
+ - Support ISO-8859-15 encoding for French (French oe ligatures are
+   missing from the latin-1 encoding).
+   http://qa.openoffice.org/issues/show_bug.cgi?id=54980
+
+ - Implemented a flag to forbid obscene word suggestion:
+   http://qa.openoffice.org/issues/show_bug.cgi?id=55498
+
+ - Checked with 50 regression tests in Valgrind debugging environment,
+   and tested with 52 OOo dictionaries.
+
+ - other improvements and bug fixes (see ChangeLog)
+
+2005-09-19: Hunspell 1.1.0 release
+
+* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)
+
+* improved ngram suggestion with swap character detection and
+  case insensitivity
+
+------ examples for ngram improvement (input word and suggestions) -----
+
+1. pernament (instead of permanent)
+
+MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
+  ornament, ornamentals, ornamental, ornamentally
+
+Hunspell 1.0.9: ornamental, ornament, tournament
+
+Hunspell 1.1.0: permanent
+
+Note: swap character detection
+
+
+2. PERNAMENT (instead of PERMANENT)
+
+MySpell 3.2: -
+
+Hunspell 1.0.9: -
+
+Hunspell 1.1.0: PERMANENT
+
+
+3. Unesco (instead of UNESCO)
+
+MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
+  Frescoed, Fresco, Escorts, Escorting
+
+Hunspell 1.0.9: Genesco, Ionesco, Fresco
+
+Hunspell 1.1.0: UNESCO
+
+
+4. siggraph's (instead of SIGGRAPH's)
+
+MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
+  physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
+  epigraphs
+
+Hunspell 1.0.9: serigraph's, epigraph's, digraph's
+
+Hunspell 1.1.0: SIGGRAPH's
+
+--------------- end of examples --------------------
+
+* improved testing environment with suggestion checking and memory debugging
+
+  memory debugging of all tests with a simple command:
+
+  VALGRIND=memcheck make check
+
+* lots of other improvements and bug fixes (see ChangeLog)
+
+
+2005-08-26: Hunspell 1.0.9 release
+
+* improved related character map suggestion
+
+* improved ngram suggestion
+
+------ examples for ngram improvement (O=old, N = new ngram suggestions) --
+
+1. Permenant (instead of Permanent)
+
+O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
+   Ferment's, Ferments, Fermenting, Countermen, Weathermen
+
+N: Permanent, Supermen, Preferment
+
+Note: Ngram suggestions were case sensitive.
+
+2. permenant (instead of permanent)
+
+O: supermen, newspapermen, empowerment, endangerment, preferments,
+   preferment, permanent, preferment's, permanently, impermanent
+
+N: permanent, supermen, preferment
+
+Note: new suggestions are also weighted with longest common subsequence,
+first letter and common character positions
+
+3. pernemant (instead of permanent)
+
+O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
+   supernatant, impermanent, semipermanent, impermanently
+
+N: permanent, supernatant, pimpernel
+
+Note: the new method also prefers the root word over
+irrelevant affixes ('s, s and ly)
+
+
+4. pernament (instead of permanent)
+
+O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
+   ornament, ornamentals, ornamental, ornamentally
+
+N: ornamental, ornament, tournament
+
+Note: Both ngram methods miss here.
+
+
+5. obvus (instead of obvious):
+
+O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
+   obviates, obviate, Travus
+
+N: obvious, obtuse, obverse
+
+Note: the new method also prefers common first letters.
+
+
+6. unambigus (instead of unambiguous)
+
+O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
+   unambitious, ambiguities, ambiguousness
+
+N: unambiguous, unambiguity, unambitious
+
+
+
+7. consecvence (instead of consequence)
+
+O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
+   consecutiveness's, convenience's, consistences, consistence
+
+N: consequence, consecutive, consecrates
+
+
+An example in a language with rich morphology:
+
+8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):
+
+O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
+   Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben
+
+N: Mississippiben, Mississippiiben, Misiiben
+
+Note: Suggesting irrelevant affixes was the biggest fault in ngram
+      suggestion for languages with a lot of affixes.
+
+--------------- end of examples --------------------
+
+* support twofold prefix cutting
+
+* lots of other improvements and bug fixes (see ChangeLog)
+
+* test Hunspell with 54 OpenOffice.org dictionaries:
+
+source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries
+
+testing shell script:
+-------------------------------------------------------
+for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
+do
+  dic=`basename $i .zip`
+  mkdir $dic
+  echo unzip $dic
+  unzip -d $dic $i 2>/dev/null
+  cd $dic
+  echo unmunch and test $dic
+  unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
+  hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
+  cd ..
+done
+--------------------------------------------------------
+
+test result (0 size is o.k.):
+
+$ for i in *_*/*.result; do wc -c $i; done
+0 af_ZA/af_ZA.result
+0 bg_BG/bg_BG.result
+0 ca_ES/ca_ES.result
+0 cy_GB/cy_GB.result
+0 cs_CZ/cs_CZ.result
+0 da_DK/da_DK.result
+0 de_AT/de_AT.result
+0 de_CH/de_CH.result
+0 de_DE/de_DE.result
+0 el_GR/el_GR.result
+6 en_AU/en_AU.result
+0 en_CA/en_CA.result
+0 en_GB/en_GB.result
+0 en_NZ/en_NZ.result
+0 en_US/en_US.result
+0 eo_EO/eo_EO.result
+0 es_ES/es_ES.result
+0 es_MX/es_MX.result
+0 es_NEW/es_NEW.result
+0 fo_FO/fo_FO.result
+0 fr_FR/fr_FR.result
+0 ga_IE/ga_IE.result
+0 gd_GB/gd_GB.result
+0 gl_ES/gl_ES.result
+0 he_IL/he_IL.result
+0 hr_HR/hr_HR.result
+200694989 hu_HU/hu_HU.result
+0 id_ID/id_ID.result
+0 it_IT/it_IT.result
+0 ku_TR/ku_TR.result
+0 lt_LT/lt_LT.result
+0 lv_LV/lv_LV.result
+0 mg_MG/mg_MG.result
+0 mi_NZ/mi_NZ.result
+0 ms_MY/ms_MY.result
+0 nb_NO/nb_NO.result
+0 nl_NL/nl_NL.result
+0 nn_NO/nn_NO.result
+0 ny_MW/ny_MW.result
+0 pl_PL/pl_PL.result
+0 pt_BR/pt_BR.result
+0 pt_PT/pt_PT.result
+0 ro_RO/ro_RO.result
+0 ru_RU/ru_RU.result
+0 rw_RW/rw_RW.result
+0 sk_SK/sk_SK.result
+0 sl_SI/sl_SI.result
+0 sv_SE/sv_SE.result
+0 sw_KE/sw_KE.result
+0 tet_ID/tet_ID.result
+0 tl_PH/tl_PH.result
+0 tn_ZA/tn_ZA.result
+0 uk_UA/uk_UA.result
+0 zu_ZA/zu_ZA.result
+
+In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
+`eqn.' is missing. Presumably it is a dictionary bug. MySpell did not
+accept it either.
+
+The Hungarian dictionary contains pseudoroots and forbidden words.
+Unmunch does not support these features yet, and generates bad words, too.
+
+* check affix rules and OOo dictionaries. Detected bugs in the cs_CZ,
+es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.
+
+Details:
+--------------------------------------------------------
+cs_CZ
+warning - incompatible stripping characters and condition:
+SFX D us ech [^ighk]os
+SFX D us y [^i]os
+SFX Q os ech [^ghk]es
+SFX M o ech [^ghkei]a
+SFX J ém ej ám
+SFX J ém ejme ám
+SFX J ém ejte ám
+SFX A oužit up oupit
+SFX A oužit upme oupit
+SFX A oužit upte oupit
+SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
+SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
+
+es_ES
+warning - incompatible stripping characters and condition:
+SFX W umar úse [ae]husar
+SFX W emir iñáis eñir
+
+es_NEW
+warning - incompatible stripping characters and condition:
+SFX I unan únen unar
+
+es_MX
+warning - incompatible stripping characters and condition:
+SFX A a ote e
+SFX W umar úse [ae]husar
+SFX W emir iñáis eñir
+
+lt_LT
+warning - incompatible stripping characters and condition:
+SFX U ti siuosi tis
+SFX U ti siuosi tis
+SFX U ti siesi tis
+SFX U ti siesi tis
+SFX U ti sis tis
+SFX U ti sis tis
+SFX U ti simės tis
+SFX U ti simės tis
+SFX U ti sitės tis
+SFX U ti sitės tis
+
+nn_NO
+warning - incompatible stripping characters and condition:
+SFX D ar rar [^fmk]er
+SFX U Øre orde ere
+SFX U Øre ort ere
+
+pt_PT
+warning - incompatible stripping characters and condition:
+SFX g ãos oas ão
+SFX g ãos oas ão
+
+ro_RO
+warning - bad field number:
+SFX L 0 le [^cg] i
+SFX L 0 i [cg] i
+SFX U 0 i [^i] ii
+warning - incompatible stripping characters and condition:
+SFX P l i l [<- there is an unnecessary tabulator here)
+SFX I a ii [gc] a
+warning - bad field number:
+SFX I a ii [gc] a
+SFX I a ei [^cg] a
+
+sk_SK
+warning - incompatible stripping characters and condition:
+SFX T ľať olú klať
+SFX T ľať olúc klať
+SFX T sľať šlú slať
+SFX T sľať šlúc slať
+SFX R ľcť lčiem ĺcť
+SFX R iásť ätie miasť
+SFX R iezť iem [^i]ezť
+SFX R iezť ieš [^i]ezť
+SFX R iezť ie [^i]ezť
+SFX R iezť eme [^i]ezť
+SFX R iezť ete [^i]ezť
+SFX R iezť ú [^i]ezť
+SFX R iezť úc [^i]ezť
+SFX R iezť z [^i]ezť
+SFX R iezť me [^i]ezť
+SFX R iezť te [^i]ezť
+
+sv_SE
+warning - bad field number:
+SFX C 0 net nets [^e]n
+--------------------------------------------------------
+
+2005-08-01: Hunspell 1.0.8 release
+
+- improved compound word support
+- fix German S handling
+- port MySpell files and MAP feature
+
+2005-07-22: Hunspell 1.0.7 release
+
+2005-07-21: new home page: http://hunspell.sourceforge.net
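The "incompatible stripping characters and condition" warnings in the 1.0.9 details above can be read as follows: in a suffix rule, the characters to strip must be able to match the end of the rule's condition. The Python sketch below implements that reading only. It is my interpretation of the warning rather than Hunspell's actual check, and the helper names are hypothetical.

```python
# Sketch of one possible strip/condition consistency check for SFX rules.
# Assumption: the stripped characters must satisfy the tail of the condition.

def parse_condition(cond):
    """Split a condition such as '[^ghk]es' into one matcher per position.
    Raises ValueError if a character class is not closed."""
    positions, i = [], 0
    while i < len(cond):
        if cond[i] == "[":
            end = cond.index("]", i)
            body = cond[i + 1:end]
            if body.startswith("^"):
                positions.append(("not", set(body[1:])))
            else:
                positions.append(("in", set(body)))
            i = end + 1
        elif cond[i] == ".":
            positions.append(("any", set()))
            i += 1
        else:
            positions.append(("in", {cond[i]}))
            i += 1
    return positions


def matches(position, ch):
    """Test one character against one parsed condition position."""
    kind, chars = position
    if kind == "any":
        return True
    return ch in chars if kind == "in" else ch not in chars


def strip_compatible(strip, cond):
    """True if the stripped characters can match the end of the condition."""
    if strip == "0":  # '0' means nothing is stripped
        return True
    positions = parse_condition(cond)
    # Compare right to left over the overlapping part only.
    return all(matches(p, c)
               for p, c in zip(reversed(positions), reversed(strip)))


for strip, cond in [("us", "[^ighk]os"), ("os", "[^ghk]os"), ("ti", "tis")]:
    print(strip, cond, strip_compatible(strip, cond))
```

Under this reading, `SFX D us ech [^ighk]os` is flagged because the rule strips "us" from words that its own condition says end in "os", so the rule can never apply as written.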