1 files changed, 646 insertions, 0 deletions
diff --git a/NEWS b/NEWS
new file mode 100644
index 0000000..957a70e
--- /dev/null
+++ b/NEWS
@@ -0,0 +1,646 @@
+2011-02-02: Hunspell 1.3.2 release:
+  - fix library versioning
+  - improved manual 
+
+2011-02-02: Hunspell 1.3.1 release:
+  - bug fixes
+
+2011-01-26: Hunspell 1.2.15/1.3 release:
+  - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
+  - bug fixes
+
+2011-01-21:
+  - new features: FORCEUCASE and WARN, see manual
+  - new options: -r to filter potential mistakes (rare words
+    signed by flag WARN in the dictionary)
+  - limited and optimized suggestions
+
+2011-01-06: Hunspell 1.2.14 release:
+  - bug fix
+2011-01-03: Hunspell 1.2.13 release:
+  - bug fixes
+  - improved compound handling and
+    other improvements supported by OpenTaal Foundation, Netherlands
+2010-07-15: Hunspell 1.2.12 release
+2010-05-06: Hunspell 1.2.11 release:
+  - Maintenance release bug fixes
+2010-04-30: Hunspell 1.2.10 release:
+  - Maintenance release bug fixes
+2010-03-03: Hunspell 1.2.9 release:
+  - Maintenance release bug fixes and warnings
+  - MAP support for composed characters or character sequences
+2008-11-01: Hunspell 1.2.8 release:
+  - Default BREAK feature and better hyphenated word suggestion to accept
+    and fix (compound) words with hyphen characters by spell checker
+    instead of by work breaking code of OpenOffice.org. With this feature
+    it's possible to accept hyphenated compound words, such as "scot-free",
+    where "scot" is not a correct English word.
+
+  - ICONV & OCONV: input and output conversion tables for optional character
+    handling or using special inner format. Example:
+
+  # Accepting de facto replacements of the Romanian comma acuted letters
+  SET UTF-8
+  ICONV 4
+  ICONV ş ș
+  ICONV ţ ț
+  ICONV Ş Ș
+  ICONV Ţ Ț
+
+    Typical usage of ICONV/OCONV is to manage an inner format for a segmental
+    writing system, like the Ethiopic script of the Amharic language.
+
+  - Extended CHECKCOMPOUNDPATTERN to handle conpound word alternations, like
+    sandhi feature of Telugu and other writing systems.
+
+  - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
+    Norwegian compound word forms, like tillåta (till|låta) and
+    bussjåfør (buss|sjåfør)
+
+  - wordforms: word generator script for dictionary developers (Hunspell
+    version of unmunch).
+
+  - bug fixes
+
+2008-08-15: Hunspell 1.2.7 release:
+  - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
+    strip full words, not only one less characters.
+  - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
+    matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
+    for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
+    etc.).
+  - optimized suggestions:
+    - modified 1-character distance suggestion algorithms: search a TRY character
+      in all position instead of all TRY characters in a character position
+      (it can give more readable suggestion order, also better suggestions
+      in the first positions, when TRY characters are sorted by frequency.)
+      For example, suggestions for "moze":
+      ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
+      maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
+    - extended compound word checking for better COMPOUNDRULE related
+      suggestions, for example English ordinal numbers: 121323th -> 121323rd
+      (it needs also a th->rd REP definition).
+  - bug fixes
+
+2008-07-15: Hunspell 1.2.6 release:
+  - bug fix release (fix affix rule condition checking of sk_SK dictionary,
+    iconv support in stemming and morphological analysis of the Hunspell
+    utility, see also Changelog)
+
+2008-07-09: Hunspell 1.2.5 release:
+  - bug fix release (fix affix rule condition checking of en_GB dictionary,
+    also morphological analysis by dictionaries with two-level suffixes)
+
+2008-06-18: Hunspell 1.2.4-2 release:
+  - fix GCC compiler warnings
+
+2008-06-17: Hunspell 1.2.4 release:
+  - add free_list() for C, C++ interfaces to deallocate suggestion lists
+  
+  - bug fixes
+
+2008-06-17: Hunspell 1.2.3 release:
+  - extended XML interface to use morphological functions by standard
+    spell checking interface, spell() and suggest(). See hunspell.3 manual page.
+
+  - default dash suggestions for compound words: newword-> new word and new-word
+
+  - new manual pages: hunspell.3, hzip.1, hunzip.1.
+  
+  - bug fixes
+
+2008-04-12: Hunspell 1.2.2 release:
+  - extended dictionary (dic file) support to use multiple base and
+    special dictionaries.
+    
+  - new and improved options of command line hunspell:
+    -m: morphological analysis or flag debug mode (without affix
+        rule data it signs the flag of the affix rules)
+    -s: stemming mode
+    -D: list available dictionaries and search path
+    -d: support extra dictionaries by comma separated list. Example:
+    
+    hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
+
+    - forbidding in personal dictionary (with asterisk, / signs affixation)
+
+  - optional compressed dictionary format "hzip" for aff and dic files
+    usage:
+    hzip example.aff example.dic
+    mv example.aff example.dic /tmp
+    hunspell -d example
+    hunzip example.aff.hz >example.aff
+    hunzip example.dic.hz >example.dic
+
+  - new affix compression tool "affixcompress": compression tool for
+    large (millions of words) dictionaries.
+
+  - support encrypted dictionaries for closed OpenOffice.org extensions or
+    other commercial programs
+
+  - improved manual
+
+  - bug fixes
+
+2007-11-01: Hunspell 1.2.1 release:
+  - new memory efficient condition checking algorithm for affix rules
+  
+  - new morphological functions:
+    - stem() for stemming
+    - analyze() for morphological analysis
+    - generate() for morphological generation
+
+  - new demos:
+    - analyze: stemming, morphological analysis and generation
+    - chmorph: morphological conversion of texts
+
+2007-09-05: Hunspell 1.1.12 release:
+  - dictionary based phonetic suggestion for words with
+    special or foreign pronounciation or alternative (bad) transliteration
+    (see Changelog, tests/phone.* and manual).
+
+  - improved data structure and memory optimization for dictionaries
+    with variable count fields
+
+  - bug fixes for Unicode encoding dictionaries and ngram suggestions
+  
+  - improved REP suggestions with space: it works without dictionary
+    modification
+
+  - updated and new project files for Windows API
+
+2007-08-27: Hunspell 1.1.11 release:
+  - portability fixes
+
+2007-08-23: Hunspell 1.1.10 release:
+  - pronounciation based suggestion using Bj�rn Jacke's original Aspell
+    phonetic transcription algorithm (http://aspell.net), relicensed under
+    GPL/LGPL/MPL tri-license with the permission of the author
+
+  - keyboard base suggestion by KEY (see manual)
+
+  - better time limits for suggestion search
+
+  - test environment for suggestion based on Wikipedia data
+
+  - bug fixes for non standard Mozilla platforms etc.
+
+2007-07-25: Hunspell 1.1.9 release:
+  - better tokenization:
+    - for URLs, mail addresses and directory paths (default: skip these tokens)
+    - for colons in words (for Finnish and Swedish)
+  
+  - new examples:
+    - affixation of personal dictionary words
+    - digits in words
+
+  - bug fixes (see ChangeLog)
+
+2007-07-16: Hunspell 1.1.8 release:
+  - better Mac OS X/Cygwin and Windows compatibility
+
+  - fix Hunspell's Valgrind environment and memory handling errors
+    detected by Valgrind
+
+  - other bug fixes (see ChangeLog)
+
+2007-07-06: Hunspell 1.1.7 release:
+  - fix warning messages of OpenOffice.org build
+
+2007-06-29: Hunspell 1.1.6 release:
+  - check capitalization of the following word forms
+    - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
+    - allcap words and suffixes: UNICEF's - UNICEF'S
+    - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
+
+  - suggestion for missing sentence spacing: something.The -> something. The
+
+  - Hunspell executable: improved locale support
+    - -i option: custom input encoding
+    - use locale data for default dictionary names. 
+    - tools/hunspell.cxx: fix 8-bit tokenization (letters without
+      casing, like ß or Hebrew characters now are handled well) 
+    - dictionary search path (automatic detection of OpenOffice.org directories)
+    - DICPATH environmental variable
+    - -D option: show directory path of loaded dictionary
+
+  - patches and bug fixes for Mozilla, OpenOffice.org.
+
+2007-03-19: Hunspell 1.1.5 release:
+  - optimizations: 10-100% speed up, smaller code size and memory footprint
+    (conditional experimental code and warning messages)
+
+  - extended Unicode support:
+    - non BMP Unicode characters in dictionary words and affixes (except
+      affix rules and conditions)
+    - support BOM sequence in aff and dic files
+
+  - IGNORE feature for Arabic diacritics and other optional characters
+
+  - New edit distance suggestion methods:
+    - capitalisation: nasa -> NASA
+    - long swap: permenant -> permanent
+    - long move: Ghandi -> Gandhi, greatful -> grateful
+    - double two characters: vacacation -> vacation
+    - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)
+
+  - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
+    German and Arabic language, etc.
+
+2006-02-01: Hunspell 1.1.4 release:
+  - Improved suggestion for typical OCR bugs (missing spaces between
+    capitalized words). For example: "aNew" -> "a New".
+    http://qa.openoffice.org/issues/show_bug.cgi?id=58202
+
+  - tokenization fixes (fix incomplete tokenization of input texts on big-endian
+    platforms, and locale-dependent tokenization of dictionary entries)
+
+2006-01-06: Hunspell 1.1.3.2 release:
+  - fix Visual C++ compiling errors
+
+2006-01-05: Hunspell 1.1.3 release:
+  - GPL/LGPL/MPL tri-license for Mozilla integration
+  
+  - Alias compression of flag sets and morphological descriptions.
+    (For example, 16 MB Arabic dic file can be compressed to 1 MB.)
+  
+  - Improved suggestion.
+  
+  - Improved, language independent German sharp s casing with CHECKSHARPS
+    declaration.
+
+  - Unicode tokenization in Hunspell program.
+  
+  - Bug fixes (at new and old compound word handling methods), etc.
+
+2005-11-11: Hunspell 1.1.2 release:
+
+  - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
+    suggestions)
+
+  - Checked with 51 regression tests in Valgrind debugging environment,
+    and tested with 52 OOo dictionaries on i686-pc-linux platform.
+
+2005-11-09: Hunspell 1.1.1 release:
+
+  - Compound word patterns for complex compound word handling and
+    simple word-level lexical scanning. Ideal for checking
+    Arabic and Roman numbers, ordinal numbers in English, affixed
+    numbers in agglutinative languages, etc.
+    http://qa.openoffice.org/issues/show_bug.cgi?id=53643
+
+  - Support ISO-8859-15 encoding for French (French oe ligatures are
+    missing from the latin-1 encoding).
+    http://qa.openoffice.org/issues/show_bug.cgi?id=54980
+    
+  - Implemented a flag to forbid obscene word suggestion:
+    http://qa.openoffice.org/issues/show_bug.cgi?id=55498
+
+  - Checked with 50 regression tests in Valgrind debugging environment,
+    and tested with 52 OOo dictionaries.
+
+  - other improvements and bug fixes (see ChangeLog)
+
+2005-09-19: Hunspell 1.1.0 release
+
+* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)
+
+* improved ngram suggestion with swap character detection and
+  case insensitivity
+
+------ examples for ngram improvement (input word and suggestions) -----
+
+1. pernament (instead of permanent)
+
+MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
+        ornament, ornamentals, ornamental, ornamentally
+
+Hunspell 1.0.9: ornamental, ornament, tournament
+
+Hunspell 1.1.0: permanent
+
+Note: swap character detection
+
+
+2. PERNAMENT (instead of PERMANENT)
+
+MySpell 3.2: -
+
+Hunspell 1.0.9: -
+
+Hunspell 1.1.0: PERMANENT
+
+
+3. Unesco (instead of UNESCO)
+
+MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
+             Frescoed, Fresco, Escorts, Escorting
+
+Hunspell 1.0.9: Genesco, Ionesco, Fresco
+
+Hunspell 1.1.0: UNESCO
+
+
+4. siggraph's (instead of SIGGRAPH's)
+
+MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
+             physiography, digraphs, serigraph, stratigraphy's, stratigraphy
+             epigraphs
+
+Hunspell 1.0.9: serigraph's, epigraph's, digraph's
+
+Hunspell 1.1.0: SIGGRAPH's
+
+--------------- end of examples --------------------
+
+* improved testing environment with suggestion checking and memory debugging
+
+  memory debugging of all tests with a simple command:
+  
+  VALGRIND=memcheck make check
+
+* lots of other improvements and bug fixes (see ChangeLog)
+
+
+2005-08-26: Hunspell 1.0.9 release
+
+* improved related character map suggestion
+
+* improved ngram suggestion
+
+------ examples for ngram improvement (O=old, N = new ngram suggestions) --
+
+1. Permenant (instead of Permanent)
+
+O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
+        Ferment's, Ferments, Fermenting, Countermen, Weathermen
+
+N: Permanent, Supermen, Preferment
+
+Note: Ngram suggestions was case sensitive.
+
+2. permenant (instead of permanent) 
+
+O: supermen, newspapermen, empowerment, endangerment, preferments,
+        preferment, permanent, preferment's, permanently, impermanent
+
+N: permanent, supermen, preferment
+
+Note: new suggestions are also weighted with longest common subsequence,
+first letter and common character positions
+
+3. pernemant (instead of permanent) 
+
+O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
+        supernatant, impermanent, semipermanent, impermanently
+
+N: permanent, supernatant, pimpernel
+
+Note: new method also prefers root word instead of not
+relevant affixes ('s, s and ly)
+
+
+4. pernament (instead of permanent)
+
+O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
+        ornament, ornamentals, ornamental, ornamentally
+
+N: ornamental, ornament, tournament
+
+Note: Both ngram methods misses here.
+
+
+5. obvus (instad of obvious):
+
+O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
+        obviates, obviate, Travus
+
+N: obvious, obtuse, obverse
+
+Note: new method also prefers common first letters.
+
+
+6. unambigus (instead of unambiguous) 
+
+O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
+        unambitious, ambiguities, ambiguousness
+
+N: unambiguous, unambiguity, unambitious
+
+
+
+7. consecvence (instead of consequence)
+
+O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
+        consecutiveness's, convenience's, consistences, consistence
+
+N: consequence, consecutive, consecrates
+
+
+An example in a language with rich morphology:
+
+8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):
+
+O: Misik�d�iben, Pisised�iben, Misik�i�iben, Pisisek�iben, Misik�iben,
+        Misik�id�iben, Misik�k�iben, Misik�ik�iben, Misik�im�iben, Mississippiiben
+
+N: Mississippiben, Mississippiiben, Misiiben
+
+Note: Suggesting not relevant affixes was the biggest fault in ngram
+   suggestion for languages with a lot of affixes.
+
+--------------- end of examples --------------------
+
+* support twofold prefix cutting
+
+* lots of other improvements and bug fixes (see ChangeLog)
+
+* test Hunspell with 54 OpenOffice.org dictionaries:
+
+source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries
+
+testing shell script:
+-------------------------------------------------------
+for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
+do
+	dic=`basename $i .zip`
+	mkdir $dic
+	echo unzip $dic
+	unzip -d $dic $i 2>/dev/null
+	cd $dic
+	echo unmunch and test $dic
+	unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
+	hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
+	cd ..
+done
+--------------------------------------------------------
+
+test result (0 size is o.k.):
+
+$ for i in *_*/*.result; do wc -c $i; done 
+0 af_ZA/af_ZA.result
+0 bg_BG/bg_BG.result
+0 ca_ES/ca_ES.result
+0 cy_GB/cy_GB.result
+0 cs_CZ/cs_CZ.result
+0 da_DK/da_DK.result
+0 de_AT/de_AT.result
+0 de_CH/de_CH.result
+0 de_DE/de_DE.result
+0 el_GR/el_GR.result
+6 en_AU/en_AU.result
+0 en_CA/en_CA.result
+0 en_GB/en_GB.result
+0 en_NZ/en_NZ.result
+0 en_US/en_US.result
+0 eo_EO/eo_EO.result
+0 es_ES/es_ES.result
+0 es_MX/es_MX.result
+0 es_NEW/es_NEW.result
+0 fo_FO/fo_FO.result
+0 fr_FR/fr_FR.result
+0 ga_IE/ga_IE.result
+0 gd_GB/gd_GB.result
+0 gl_ES/gl_ES.result
+0 he_IL/he_IL.result
+0 hr_HR/hr_HR.result
+200694989 hu_HU/hu_HU.result
+0 id_ID/id_ID.result
+0 it_IT/it_IT.result
+0 ku_TR/ku_TR.result
+0 lt_LT/lt_LT.result
+0 lv_LV/lv_LV.result
+0 mg_MG/mg_MG.result
+0 mi_NZ/mi_NZ.result
+0 ms_MY/ms_MY.result
+0 nb_NO/nb_NO.result
+0 nl_NL/nl_NL.result
+0 nn_NO/nn_NO.result
+0 ny_MW/ny_MW.result
+0 pl_PL/pl_PL.result
+0 pt_BR/pt_BR.result
+0 pt_PT/pt_PT.result
+0 ro_RO/ro_RO.result
+0 ru_RU/ru_RU.result
+0 rw_RW/rw_RW.result
+0 sk_SK/sk_SK.result
+0 sl_SI/sl_SI.result
+0 sv_SE/sv_SE.result
+0 sw_KE/sw_KE.result
+0 tet_ID/tet_ID.result
+0 tl_PH/tl_PH.result
+0 tn_ZA/tn_ZA.result
+0 uk_UA/uk_UA.result
+0 zu_ZA/zu_ZA.result
+
+In en_AU dictionary, there is an abbrevation with two dots (`eqn..'), but
+`eqn.' is missing. Presumably it is a dictionary bug. Myspell also
+haven't accepted it.
+
+Hungarian dictionary contains pseudoroots and forbidden words.
+Unmunch haven't supported these features yet, and generates bad words, too.
+
+* check affix rules and OOo dictionaries. Detected bugs in cs_CZ,
+es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries).
+
+Details:
+--------------------------------------------------------
+cs_CZ
+warning - incompatible stripping characters and condition:
+SFX D   us          ech        [^ighk]os
+SFX D   us          y          [^i]os
+SFX Q   os          ech        [^ghk]es
+SFX M   o           ech        [^ghkei]a
+SFX J   �m          ej         �m
+SFX J   �m          ejme       �m
+SFX J   �m          ejte       �m
+SFX A   ou�it       up         oupit
+SFX A   ou�it       upme       oupit
+SFX A   ou�it       upte       oupit
+SFX A   nout        l          [aeiouy��������r][^aeiouy��������rl][^aeiouy
+SFX A   nout        l          [aeiouy��������r][^aeiouy��������rl][^aeiouy
+
+es_ES
+warning - incompatible stripping characters and condition:
+SFX W umar �se [ae]husar
+SFX W emir i��is e�ir
+
+es_NEW
+warning - incompatible stripping characters and condition:
+SFX I unan �nen unar
+
+es_MX
+warning - incompatible stripping characters and condition:
+SFX A a ote e
+SFX W umar �se [ae]husar
+SFX W emir i��is e�ir
+
+lt_LT
+warning - incompatible stripping characters and condition:
+SFX U ti      siuosi          tis       
+SFX U ti      siuosi          tis       
+SFX U ti      siesi           tis       
+SFX U ti      siesi           tis       
+SFX U ti      sis             tis       
+SFX U ti      sis             tis       
+SFX U ti      sim�s           tis       
+SFX U ti      sim�s           tis       
+SFX U ti      sit�s           tis       
+SFX U ti      sit�s           tis       
+
+nn_NO
+warning - incompatible stripping characters and condition:
+SFX D   ar  rar  [^fmk]er
+SFX U   �re  orde  ere
+SFX U   �re  ort  ere
+
+pt_PT
+warning - incompatible stripping characters and condition:
+SFX g   �os        oas        �o
+SFX g   �os        oas        �o
+
+ro_RO
+warning - bad field number:
+SFX L   0          le         [^cg] i
+SFX L   0          i          [cg] i
+SFX U   0          i          [^i] ii
+warning - incompatible stripping characters and condition:
+SFX P   l          i          l	[<- there is an unnecessary tabulator here)
+SFX I   a          ii         [gc] a
+warning - bad field number:
+SFX I   a          ii         [gc] a
+SFX I   a          ei         [^cg] a
+
+sk_SK
+warning - incompatible stripping characters and condition:
+SFX T   �a�         ol�        kla�
+SFX T   �a�         ol�c       kla�
+SFX T   s�a�        �l�        sla�
+SFX T   s�a�        �l�c       sla�
+SFX R   �c�         l�iem      �c�
+SFX R   i�s�        �tie       mias�
+SFX R   iez�        iem        [^i]ez�
+SFX R   iez�        ie�        [^i]ez�
+SFX R   iez�        ie         [^i]ez�
+SFX R   iez�        eme        [^i]ez�
+SFX R   iez�        ete        [^i]ez�
+SFX R   iez�        �          [^i]ez�
+SFX R   iez�        �c         [^i]ez�
+SFX R   iez�        z          [^i]ez�
+SFX R   iez�        me         [^i]ez�
+SFX R   iez�        te         [^i]ez�
+
+sv_SE
+warning - bad field number:
+SFX  C  0  net  nets [^e]n
+--------------------------------------------------------
+
+2005-08-01: Hunspell 1.0.8 release
+
+- improved compound word support
+- fix German S handling
+- port MySpell files and MAP feature
+
+2005-07-22: Hunspell 1.0.7 release
+
+2005-07-21: new home page: http://hunspell.sourceforge.net