2014-06-02: Hunspell 1.3.3 release:
  - OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
  - various bug fixes

2011-02-02: Hunspell 1.3.2 release:
  - fix library versioning
  - improved manual

2011-02-02: Hunspell 1.3.1 release:
  - bug fixes

2011-01-26: Hunspell 1.2.15/1.3 release:
  - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
  - bug fixes

2011-01-21:
  - new features: FORCEUCASE and WARN, see manual
  - new options: -r to filter potential mistakes (rare words
    marked with the WARN flag in the dictionary)
  - limited and optimized suggestions

2011-01-06: Hunspell 1.2.14 release:
  - bug fix

2011-01-03: Hunspell 1.2.13 release:
  - bug fixes
  - improved compound handling and
    other improvements supported by OpenTaal Foundation, Netherlands

2010-07-15: Hunspell 1.2.12 release

2010-05-06: Hunspell 1.2.11 release:
  - Maintenance release bug fixes

2010-04-30: Hunspell 1.2.10 release:
  - Maintenance release bug fixes

2010-03-03: Hunspell 1.2.9 release:
  - Maintenance release bug fixes and warnings
  - MAP support for composed characters or character sequences

2008-11-01: Hunspell 1.2.8 release:
  - Default BREAK feature and better hyphenated word suggestion to accept
    and fix (compound) words with hyphen characters by the spell checker
    instead of by the word breaking code of OpenOffice.org. With this feature
    it's possible to accept hyphenated compound words, such as "scot-free",
    where "scot" is not a correct English word.

  - ICONV & OCONV: input and output conversion tables for optional character
    handling or using special inner format. Example:

    # Accepting de facto replacements of the Romanian letters with comma below
    SET UTF-8
    ICONV 4
    ICONV ş ș
    ICONV ţ ț
    ICONV Ş Ș
    ICONV Ţ Ț

    Typical usage of ICONV/OCONV is to manage an inner format for a segmental
    writing system, like the Ethiopic script of the Amharic language.

  - Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
    the sandhi feature of Telugu and other writing systems.

  - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
    Norwegian compound word forms, like tillåta (till|låta) and
    bussjåfør (buss|sjåfør)

  - wordforms: word generator script for dictionary developers (Hunspell
    version of unmunch).

  - bug fixes

2008-08-15: Hunspell 1.2.7 release:
  - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
    strip a whole word, not only up to one character fewer.
  - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
    matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
    for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
    etc.).
  - optimized suggestions:
    - modified 1-character distance suggestion algorithms: try one TRY
      character in all positions before moving to the next one, instead of
      trying all TRY characters in one character position
      (it can give a more readable suggestion order, also better suggestions
      in the first positions, when TRY characters are sorted by frequency.)
      For example, suggestions for "moze":
        ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
        maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
    - extended compound word checking for better COMPOUNDRULE related
      suggestions, for example English ordinal numbers: 121323th -> 121323rd
      (it also needs a th->rd REP definition).
  - bug fixes

2008-07-15: Hunspell 1.2.6 release:
  - bug fix release (fix affix rule condition checking of sk_SK dictionary,
    iconv support in stemming and morphological analysis of the Hunspell
    utility, see also Changelog)

2008-07-09: Hunspell 1.2.5 release:
  - bug fix release (fix affix rule condition checking of en_GB dictionary,
    also morphological analysis by dictionaries with two-level suffixes)

2008-06-18: Hunspell 1.2.4-2 release:
  - fix GCC compiler warnings

2008-06-17: Hunspell 1.2.4 release:
  - add free_list() for C, C++ interfaces to deallocate suggestion lists

  - bug fixes

2008-06-17: Hunspell 1.2.3 release:
  - extended XML interface to use morphological functions by standard
    spell checking interface, spell() and suggest(). See hunspell.3 manual page.

  - default dash suggestions for compound words: newword -> new word and new-word

  - new manual pages: hunspell.3, hzip.1, hunzip.1.

  - bug fixes

2008-04-12: Hunspell 1.2.2 release:
  - extended dictionary (dic file) support to use multiple base and
    special dictionaries.

  - new and improved options of command line hunspell:
      -m: morphological analysis or flag debug mode (without affix
          rule data it shows the flags of the affix rules)
      -s: stemming mode
      -D: list available dictionaries and search path
      -d: support extra dictionaries by comma separated list. Example:

          hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt

  - forbidding in personal dictionary (with asterisk, / signs affixation)

  - optional compressed dictionary format "hzip" for aff and dic files
    usage:
      hzip example.aff example.dic
      mv example.aff example.dic /tmp
      hunspell -d example
      hunzip example.aff.hz >example.aff
      hunzip example.dic.hz >example.dic

  - new affix compression tool "affixcompress": compression tool for
    large (millions of words) dictionaries.

  - support encrypted dictionaries for closed OpenOffice.org extensions or
    other commercial programs

  - improved manual

  - bug fixes

2007-11-01: Hunspell 1.2.1 release:
  - new memory efficient condition checking algorithm for affix rules

  - new morphological functions:
    - stem() for stemming
    - analyze() for morphological analysis
    - generate() for morphological generation

  - new demos:
    - analyze: stemming, morphological analysis and generation
    - chmorph: morphological conversion of texts

2007-09-05: Hunspell 1.1.12 release:
  - dictionary based phonetic suggestion for words with
    special or foreign pronunciation or alternative (bad) transliteration
    (see Changelog, tests/phone.* and manual).

  - improved data structure and memory optimization for dictionaries
    with variable count fields

  - bug fixes for Unicode encoding dictionaries and ngram suggestions

  - improved REP suggestions with space: it works without dictionary
    modification

  - updated and new project files for Windows API

2007-08-27: Hunspell 1.1.11 release:
  - portability fixes

2007-08-23: Hunspell 1.1.10 release:
  - pronunciation based suggestion using Björn Jacke's original Aspell
    phonetic transcription algorithm (http://aspell.net), relicensed under
    GPL/LGPL/MPL tri-license with the permission of the author

  - keyboard based suggestion by KEY (see manual)

  - better time limits for suggestion search

  - test environment for suggestion based on Wikipedia data

  - bug fixes for non standard Mozilla platforms etc.
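
The keyboard based suggestion by KEY in the 1.1.10 entry can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Hunspell's code: the layout string (rows separated by `|`), the toy word set and the helper name `keyboard_suggestions` are all hypothetical. Each letter is replaced by its left and right neighbour on the same keyboard row, and candidates found in the word set are kept.

```python
def keyboard_suggestions(word, key_rows, dictionary):
    """Return dictionary words reachable by swapping one letter for a keyboard neighbour."""
    suggestions = []
    for i, ch in enumerate(word):
        for row in key_rows.split("|"):
            for pos, key in enumerate(row):
                if key != ch:
                    continue
                # Neighbours are the keys directly left and right on the same row.
                for npos in (pos - 1, pos + 1):
                    if 0 <= npos < len(row):
                        candidate = word[:i] + row[npos] + word[i + 1:]
                        if candidate in dictionary and candidate not in suggestions:
                            suggestions.append(candidate)
    return suggestions


KEY = "qwertyuiop|asdfghjkl|zxcvbnm"  # assumed layout string, for illustration only
WORDS = {"word", "ward", "lord"}      # toy dictionary, not a real dic file

print(keyboard_suggestions("wprd", KEY, WORDS))
```

With these toy inputs, the mistyped "wprd" yields "word", because "o" is next to "p" on the first row.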

2007-07-25: Hunspell 1.1.9 release:
  - better tokenization:
    - for URLs, mail addresses and directory paths (default: skip these tokens)
    - for colons in words (for Finnish and Swedish)

  - new examples:
    - affixation of personal dictionary words
    - digits in words

  - bug fixes (see ChangeLog)

2007-07-16: Hunspell 1.1.8 release:
  - better Mac OS X/Cygwin and Windows compatibility

  - fix Hunspell's Valgrind environment and memory handling errors
    detected by Valgrind

  - other bug fixes (see ChangeLog)

2007-07-06: Hunspell 1.1.7 release:
  - fix warning messages of OpenOffice.org build

2007-06-29: Hunspell 1.1.6 release:
  - check capitalization of the following word forms
    - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
    - allcap words and suffixes: UNICEF's - UNICEF'S
    - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA

  - suggestion for missing sentence spacing: something.The -> something. The

  - Hunspell executable: improved locale support
    - -i option: custom input encoding
    - use locale data for default dictionary names.
    - tools/hunspell.cxx: fix 8-bit tokenization (letters without
      casing, like ß or Hebrew characters, are now handled well)
    - dictionary search path (automatic detection of OpenOffice.org directories)
    - DICPATH environment variable
    - -D option: show directory path of loaded dictionary

  - patches and bug fixes for Mozilla, OpenOffice.org.

2007-03-19: Hunspell 1.1.5 release:
  - optimizations: 10-100% speed up, smaller code size and memory footprint
    (conditional experimental code and warning messages)

  - extended Unicode support:
    - non BMP Unicode characters in dictionary words and affixes (except
      affix rules and conditions)
    - support BOM sequence in aff and dic files

  - IGNORE feature for Arabic diacritics and other optional characters

  - New edit distance suggestion methods:
    - capitalisation: nasa -> NASA
    - long swap: permenant -> permanent
    - long move: Ghandi -> Gandhi, greatful -> grateful
    - double two characters: vacacation -> vacation
    - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)

  - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
    German and Arabic language, etc.

2006-02-01: Hunspell 1.1.4 release:
  - Improved suggestion for typical OCR bugs (missing spaces between
    capitalized words). For example: "aNew" -> "a New".
    http://qa.openoffice.org/issues/show_bug.cgi?id=58202

  - tokenization fixes (fix incomplete tokenization of input texts on big-endian
    platforms, and locale-dependent tokenization of dictionary entries)

2006-01-06: Hunspell 1.1.3.2 release:
  - fix Visual C++ compiling errors

2006-01-05: Hunspell 1.1.3 release:
  - GPL/LGPL/MPL tri-license for Mozilla integration

  - Alias compression of flag sets and morphological descriptions.
    (For example, 16 MB Arabic dic file can be compressed to 1 MB.)

  - Improved suggestion.

  - Improved, language independent German sharp s casing with CHECKSHARPS
    declaration.

  - Unicode tokenization in Hunspell program.

  - Bug fixes (at new and old compound word handling methods), etc.
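
The idea behind the alias compression of flag sets in the 1.1.3 entry can be sketched in a few lines of Python. This is a rough illustration, not the algorithm of Hunspell or its tools. It assumes toy dic lines of the form `word/FLAGS` and an `AF` table with a count line followed by one flag set per line (check the manual for the exact aff syntax); the function name `alias_compress` is hypothetical. Every distinct flag set is stored once and referenced by a 1-based number.

```python
def alias_compress(entries):
    """Replace repeated flag sets with 1-based alias numbers.

    entries: list of "word/FLAGS" or plain "word" strings (toy dic lines).
    Returns (af_lines, compressed_entries).
    """
    aliases = {}      # flag set -> alias number, in order of first appearance
    compressed = []
    for entry in entries:
        word, sep, flags = entry.partition("/")
        if not sep:
            # No flags on this entry: keep the word unchanged.
            compressed.append(word)
            continue
        number = aliases.setdefault(flags, len(aliases) + 1)
        compressed.append(f"{word}/{number}")
    # Assumed AF table layout: a count line, then one flag set per line.
    af_lines = [f"AF {len(aliases)}"] + [f"AF {flags}" for flags in aliases]
    return af_lines, compressed


af, dic = alias_compress(["work/ABC", "walk/ABC", "table/S", "and"])
print("\n".join(af))
print("\n".join(dic))
```

The saving grows with the number of entries that share a long flag set, which is the situation in dictionaries with rich morphology.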

2005-11-11: Hunspell 1.1.2 release:

  - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
    suggestions)

  - Checked with 51 regression tests in Valgrind debugging environment,
    and tested with 52 OOo dictionaries on i686-pc-linux platform.

2005-11-09: Hunspell 1.1.1 release:

  - Compound word patterns for complex compound word handling and
    simple word-level lexical scanning. Ideal for checking
    Arabic and Roman numbers, ordinal numbers in English, affixed
    numbers in agglutinative languages, etc.
    http://qa.openoffice.org/issues/show_bug.cgi?id=53643

  - Support ISO-8859-15 encoding for French (French oe ligatures are
    missing from the latin-1 encoding).
    http://qa.openoffice.org/issues/show_bug.cgi?id=54980

  - Implemented a flag to forbid obscene word suggestion:
    http://qa.openoffice.org/issues/show_bug.cgi?id=55498

  - Checked with 50 regression tests in Valgrind debugging environment,
    and tested with 52 OOo dictionaries.

  - other improvements and bug fixes (see ChangeLog)

2005-09-19: Hunspell 1.1.0 release

* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

* improved ngram suggestion with swap character detection and
  case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
 ornament, ornamentals, ornamental, ornamentally

Hunspell 1.0.9: ornamental, ornament, tournament

Hunspell 1.1.0: permanent

Note: swap character detection


2. PERNAMENT (instead of PERMANENT)

MySpell 3.2: -

Hunspell 1.0.9: -

Hunspell 1.1.0: PERMANENT


3. Unesco (instead of UNESCO)

MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
 Frescoed, Fresco, Escorts, Escorting

Hunspell 1.0.9: Genesco, Ionesco, Fresco

Hunspell 1.1.0: UNESCO


4. siggraph's (instead of SIGGRAPH's)

MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
 physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
 epigraphs

Hunspell 1.0.9: serigraph's, epigraph's, digraph's

Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------

* improved testing environment with suggestion checking and memory debugging

  memory debugging of all tests with a simple command:

  VALGRIND=memcheck make check

* lots of other improvements and bug fixes (see ChangeLog)


2005-08-26: Hunspell 1.0.9 release

* improved related character map suggestion

* improved ngram suggestion

------ examples for ngram improvement (O = old, N = new ngram suggestions) --

1. Permenant (instead of Permanent)

O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
 Ferment's, Ferments, Fermenting, Countermen, Weathermen

N: Permanent, Supermen, Preferment

Note: Ngram suggestions were case sensitive.

2. permenant (instead of permanent)

O: supermen, newspapermen, empowerment, endangerment, preferments,
 preferment, permanent, preferment's, permanently, impermanent

N: permanent, supermen, preferment

Note: new suggestions are also weighted with longest common subsequence,
first letter and common character positions

3. pernemant (instead of permanent)

O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
 supernatant, impermanent, semipermanent, impermanently

N: permanent, supernatant, pimpernel

Note: the new method also prefers root words over forms with
irrelevant affixes ('s, s and ly)


4. pernament (instead of permanent)

O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
 ornament, ornamentals, ornamental, ornamentally

N: ornamental, ornament, tournament

Note: Both ngram methods miss here.


5. obvus (instead of obvious):

O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
 obviates, obviate, Travus

N: obvious, obtuse, obverse

Note: new method also prefers common first letters.


6. unambigus (instead of unambiguous)

O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
 unambitious, ambiguities, ambiguousness

N: unambiguous, unambiguity, unambitious



7. consecvence (instead of consequence)

O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
 consecutiveness's, convenience's, consistences, consistence

N: consequence, consecutive, consecrates


An example in a language with rich morphology:

8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):

O: Misik�d�iben, Pisised�iben, Misik�i�iben, Pisisek�iben, Misik�iben,
 Misik�id�iben, Misik�k�iben, Misik�ik�iben, Misik�im�iben, Mississippiiben

N: Mississippiben, Mississippiiben, Misiiben

Note: Suggesting irrelevant affixes was the biggest fault in ngram
 suggestion for languages with a lot of affixes.

--------------- end of examples --------------------

* support twofold prefix cutting

* lots of other improvements and bug fixes (see ChangeLog)

* test Hunspell with 54 OpenOffice.org dictionaries:

source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries

testing shell script:
-------------------------------------------------------
for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
do
    dic=`basename $i .zip`
    mkdir $dic
    echo unzip $dic
    unzip -d $dic $i 2>/dev/null
    cd $dic
    echo unmunch and test $dic
    unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
    hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
    cd ..
done
--------------------------------------------------------

test result (0 size is o.k.):

$ for i in *_*/*.result; do wc -c $i; done
0 af_ZA/af_ZA.result
0 bg_BG/bg_BG.result
0 ca_ES/ca_ES.result
0 cy_GB/cy_GB.result
0 cs_CZ/cs_CZ.result
0 da_DK/da_DK.result
0 de_AT/de_AT.result
0 de_CH/de_CH.result
0 de_DE/de_DE.result
0 el_GR/el_GR.result
6 en_AU/en_AU.result
0 en_CA/en_CA.result
0 en_GB/en_GB.result
0 en_NZ/en_NZ.result
0 en_US/en_US.result
0 eo_EO/eo_EO.result
0 es_ES/es_ES.result
0 es_MX/es_MX.result
0 es_NEW/es_NEW.result
0 fo_FO/fo_FO.result
0 fr_FR/fr_FR.result
0 ga_IE/ga_IE.result
0 gd_GB/gd_GB.result
0 gl_ES/gl_ES.result
0 he_IL/he_IL.result
0 hr_HR/hr_HR.result
200694989 hu_HU/hu_HU.result
0 id_ID/id_ID.result
0 it_IT/it_IT.result
0 ku_TR/ku_TR.result
0 lt_LT/lt_LT.result
0 lv_LV/lv_LV.result
0 mg_MG/mg_MG.result
0 mi_NZ/mi_NZ.result
0 ms_MY/ms_MY.result
0 nb_NO/nb_NO.result
0 nl_NL/nl_NL.result
0 nn_NO/nn_NO.result
0 ny_MW/ny_MW.result
0 pl_PL/pl_PL.result
0 pt_BR/pt_BR.result
0 pt_PT/pt_PT.result
0 ro_RO/ro_RO.result
0 ru_RU/ru_RU.result
0 rw_RW/rw_RW.result
0 sk_SK/sk_SK.result
0 sl_SI/sl_SI.result
0 sv_SE/sv_SE.result
0 sw_KE/sw_KE.result
0 tet_ID/tet_ID.result
0 tl_PH/tl_PH.result
0 tn_ZA/tn_ZA.result
0 uk_UA/uk_UA.result
0 zu_ZA/zu_ZA.result

In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
`eqn.' is missing. Presumably it is a dictionary bug. MySpell did not
accept it either.

The Hungarian dictionary contains pseudoroots and forbidden words.
Unmunch does not support these features yet, so it generates bad words, too.

* check affix rules and OOo dictionaries. Detected bugs in the cs_CZ,
es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.

Details:
--------------------------------------------------------
cs_CZ
warning - incompatible stripping characters and condition:
SFX D us ech [^ighk]os
SFX D us y [^i]os
SFX Q os ech [^ghk]es
SFX M o ech [^ghkei]a
SFX J �m ej �m
SFX J �m ejme �m
SFX J �m ejte �m
SFX A ou�it up oupit
SFX A ou�it upme oupit
SFX A ou�it upte oupit
SFX A nout l [aeiouy��������r][^aeiouy��������rl][^aeiouy
SFX A nout l [aeiouy��������r][^aeiouy��������rl][^aeiouy

es_ES
warning - incompatible stripping characters and condition:
SFX W umar �se [ae]husar
SFX W emir i��is e�ir

es_NEW
warning - incompatible stripping characters and condition:
SFX I unan �nen unar

es_MX
warning - incompatible stripping characters and condition:
SFX A a ote e
SFX W umar �se [ae]husar
SFX W emir i��is e�ir

lt_LT
warning - incompatible stripping characters and condition:
SFX U ti siuosi tis
SFX U ti siuosi tis
SFX U ti siesi tis
SFX U ti siesi tis
SFX U ti sis tis
SFX U ti sis tis
SFX U ti sim�s tis
SFX U ti sim�s tis
SFX U ti sit�s tis
SFX U ti sit�s tis

nn_NO
warning - incompatible stripping characters and condition:
SFX D ar rar [^fmk]er
SFX U �re orde ere
SFX U �re ort ere

pt_PT
warning - incompatible stripping characters and condition:
SFX g �os oas �o
SFX g �os oas �o

ro_RO
warning - bad field number:
SFX L 0 le [^cg] i
SFX L 0 i [cg] i
SFX U 0 i [^i] ii
warning - incompatible stripping characters and condition:
SFX P l i l [<- there is an unnecessary tabulator here)
SFX I a ii [gc] a
warning - bad field number:
SFX I a ii [gc] a
SFX I a ei [^cg] a

sk_SK
warning - incompatible stripping characters and condition:
SFX T �a� ol� kla�
SFX T �a� ol�c kla�
SFX T s�a� �l� sla�
SFX T s�a� �l�c sla�
SFX R �c� l�iem �c�
SFX R i�s� �tie mias�
SFX R iez� iem [^i]ez�
SFX R iez� ie� [^i]ez�
SFX R iez� ie [^i]ez�
SFX R iez� eme [^i]ez�
SFX R iez� ete [^i]ez�
SFX R iez� � [^i]ez�
SFX R iez� �c [^i]ez�
SFX R iez� z [^i]ez�
SFX R iez� me [^i]ez�
SFX R iez� te [^i]ez�

sv_SE
warning - bad field number:
SFX C 0 net nets [^e]n
--------------------------------------------------------

2005-08-01: Hunspell 1.0.8 release

- improved compound word support
- fix German S handling
- port MySpell files and MAP feature

2005-07-22: Hunspell 1.0.7 release

2005-07-21: new home page: http://hunspell.sourceforge.net
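
The "incompatible stripping characters and condition" warnings listed under the 1.0.9 release can be illustrated with a short Python sketch. It shows one plausible way to detect such rules and is not taken from Hunspell's source: the helper names are hypothetical, and the sketch assumes that an SFX line has the fields flag, stripping characters, affix and condition. A suffix rule can only ever apply if the stripped characters agree with the end of the condition, so the two are compared from the end backwards.

```python
import re


def condition_positions(condition):
    """Split a condition into per-character items: '.', 'x', '[abc]' or '[^abc]'."""
    return re.findall(r"\[[^\]]*\]|.", condition)


def position_accepts(item, ch):
    """Check whether one condition position accepts the character ch."""
    if item == ".":
        return True
    if item.startswith("["):
        body = item[1:-1]
        if body.startswith("^"):
            return ch not in body[1:]
        return ch in body
    return item == ch


def suffix_rule_is_consistent(strip, condition):
    """True if the stripped suffix can match the end of the condition."""
    if strip == "0":  # "0" means that nothing is stripped
        return True
    items = condition_positions(condition)
    # Walk backwards from the end of the word; zip stops at the shorter side.
    for item, ch in zip(reversed(items), reversed(strip)):
        if not position_accepts(item, ch):
            return False
    return True


# "SFX D us ech [^ighk]os" from the cs_CZ list: strip "us", condition "[^ighk]os"
print(suffix_rule_is_consistent("us", "[^ighk]os"))
print(suffix_rule_is_consistent("y", "[^aeiou]y"))
```

The first call reports an inconsistent rule, because the condition requires the word to end in "os" while the rule strips "us". The second rule, made up for the comparison, is consistent.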