# About Hunspell

Hunspell is a free spell checker and morphological analyzer library
and command-line tool, licensed under LGPL/GPL/MPL tri-license.

Hunspell is used by LibreOffice office suite, free browsers, like
Mozilla Firefox and Google Chrome, and other tools and OSes, like
Linux distributions and macOS. It is also a command-line tool for
Linux, Unix-like and other OSes.

It is designed for quick and high quality spell checking and
correcting for languages with word-level writing system,
including languages with rich morphology, complex word compounding
and character encoding.

Hunspell interfaces: Ispell-like terminal interface using Curses
library, Ispell pipe interface, C++/C APIs and shared library, also
with existing language bindings for other programming languages.

Hunspell's code base comes from OpenOffice.org's MySpell library,
developed by Kevin Hendricks (originally a C++ reimplementation of
spell checking and affixation of Geoff Kuenning's International
Ispell from scratch, later extended with eg. n-gram suggestions),
see http://lingucomponent.openoffice.org/MySpell-3.zip, and
its README, CONTRIBUTORS and license.readme (here: license.myspell) files.

Main features of Hunspell library, developed by László Németh:

  - Unicode support
  - Highly customizable suggestions: word-part replacement tables and
    stem-level phonetic and other alternative transcriptions to recognize
    and fix all typical misspellings, don't suggest offensive words etc.
  - Complex morphology: dictionary and affix homonyms; twofold affix
    stripping to handle inflectional and derivational morpheme groups for
    agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian,
    Turkish; 64 thousand affix classes with arbitrary number of affixes;
    conditional affixes, circumfixes, fogemorphemes, zero morphemes,
    virtual dictionary stems, forbidden words to avoid overgeneration etc.
  - Handling complex compounds (for example, for Finno-Ugric, German and
    Indo-Aryan languages): recognizing compounds made of arbitrary
    number of words, handle affixation within compounds etc.
  - Custom dictionaries with affixation
  - Stemming
  - Morphological analysis (in custom item and arrangement style)
  - Morphological generation
  - SPELLML XML API over plain spell() API function for easier integration
    of stemming, morphological generation and custom dictionaries with affixation
  - Language specific algorithms, like special casing of Azeri or Turkish
    dotted i and German sharp s, and special compound rules of Hungarian.

Main features of Hunspell command line tool, developed by László Németh:

  - Reimplementation of quick interactive interface of Geoff Kuenning's Ispell
  - Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
  - Custom dictionaries with optional affixation, specified by a model word
  - Multiple dictionary usage (for example hunspell -d en_US,de_DE,de_medical)
  - Various filtering options (bad or good words/lines)
  - Morphological analysis (option -m)
  - Stemming (option -s)

See man hunspell, man 3 hunspell, man 5 hunspell for complete manual.

# Dependencies

Build-only dependencies:

    g++ make autoconf automake autopoint libtool

Runtime dependencies:

|               | Mandatory        | Optional         |
|---------------|------------------|------------------|
|libhunspell    |                  |                  |
|hunspell tool  | libiconv gettext | ncurses readline |

# Compiling on GNU/Linux and Unixes

We first need to download the dependencies. On Linux, `gettext` and
`libiconv` are part of the standard library. On other Unixes we
need to manually install them.


For Ubuntu:

    sudo apt install autoconf automake autopoint libtool

Then run the following commands:

    autoreconf -vfi
    ./configure
    make
    sudo make install
    sudo ldconfig

For dictionary development, use the `--with-warnings` option of
configure.

For interactive user interface of Hunspell executable, use the
`--with-ui` option.

Optional developer packages:

  - ncurses (needed for --with-ui), e.g. libncursesw5 for UTF-8
  - readline (for fancy input line editing, configure parameter:
    --with-readline)

In Ubuntu, the packages are:

    libncurses5-dev libreadline-dev

# Compiling on OSX and macOS

On macOS, always use `clang` as the compiler and not `g++`, because
Homebrew dependencies are built with it.

    brew install autoconf automake libtool gettext
    brew link gettext --force

Then run autoreconf, configure, make. See above.

# Compiling on Windows

## Compiling with Mingw64 and MSYS2

Download Msys2, update everything and install the following
packages:

    pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool

Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see
above.

## Compiling in Cygwin environment

Download and install Cygwin environment for Windows with the following
extra packages:

  - make
  - automake
  - autoconf
  - libtool
  - gcc-g++ development package
  - ncurses, readline (for user interface)
  - iconv (character conversion)

Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.

# Debugging

It is recommended to install a debug build of the standard library:

    libstdc++6-6-dbg

For debugging we need to create a debug build and then we need to start
`gdb`.


    ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
    make
    ./libtool --mode=execute gdb src/tools/hunspell

You can also pass the `CXXFLAGS` directly to `make` without calling
`./configure`, but we don't recommend this way during long development
sessions.

If you like to develop and debug with an IDE, see documentation at
https://github.com/hunspell/hunspell/wiki/IDE-Setup

# Testing

Testing Hunspell (see tests in tests/ subdirectory):

    make check

or with Valgrind debugger:

    make check
    VALGRIND=[Valgrind_tool] make check

For example:

    make check
    VALGRIND=memcheck make check

# Documentation

features and dictionary format:

    man 5 hunspell
    man hunspell
    hunspell -h

http://hunspell.github.io/

# Usage

After compiling and installing (see INSTALL) you can run the Hunspell
spell checker (compiled with user interface) with a Hunspell or Myspell
dictionary:

    hunspell -d en_US text.txt

or without interface:

    hunspell
    hunspell -d en_GB -l <text.txt

Dictionaries consist of an affix (.aff) and dictionary (.dic) file, for
example, download American English dictionary files of LibreOffice
(older version, but with stemming and morphological generation) with

    wget -O en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff?id=a4473e06b56bfe35187e302754f6baaa8d75e54f
    wget -O en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic?id=a4473e06b56bfe35187e302754f6baaa8d75e54f

and with command line input and output, it's possible to check its work quickly,
for example with the input words "example", "examples", "teached" and
"verybaaaaaaaaaaaaaaaaaaaaaad":

    $ hunspell -d en_US
    Hunspell 1.7.0
    example
    *

    examples
    + example

    teached
    & teached 9 0: taught, teased, reached, teaches, teacher, leached, beached

    verybaaaaaaaaaaaaaaaaaaaaaad
    # verybaaaaaaaaaaaaaaaaaaaaaad 0

Where in the output, `*` and `+` mean correct (accepted) words (`*` = dictionary stem,
`+` = affixed forms of the following dictionary stem), and
`&` and `#` mean bad (rejected) words (`&` = with suggestions, `#` = without suggestions)
(see man hunspell).

Example for stemming:

    $ hunspell -d en_US -s
    mice
    mice mouse

Example for morphological analysis (very limited with this English dictionary):

    $ hunspell -d en_US -m
    mice
    mice st:mouse ts:Ns

    cats
    cats st:cat ts:0 is:Ns
    cats st:cat ts:0 is:Vs

# Other executables

The src/tools directory contains the following executables after compiling.

  - The main executable:
      - hunspell: main program for spell checking and others (see
        manual)
  - Example tools:
      - analyze: example of spell checking, stemming and morphological
        analysis
      - chmorph: example of automatic morphological generation and
        conversion
      - example: example of spell checking and suggestion
  - Tools for dictionary development:
      - affixcompress: dictionary generation from large (millions of
        words) vocabularies
      - makealias: alias compression (Hunspell only, not back compatible
        with MySpell)
      - wordforms: word generation (Hunspell version of unmunch)
      - hunzip: decompressor of hzip format
      - hzip: compressor of hzip format
      - munch (DEPRECATED, use affixcompress): dictionary generation
        from vocabularies (it needs an affix file, too).
      - unmunch (DEPRECATED, use wordforms): list all recognized words
        of a MySpell dictionary

Example for morphological generation:

    $ ~/hunspell/src/tools/analyze en_US.aff en_US.dic /dev/stdin
    cat mice
    generate(cat, mice) = cats
    mouse cats
    generate(mouse, cats) = mice
    generate(mouse, cats) = mouses

# Using Hunspell library with GCC

Including in your program:

    #include <hunspell.hxx>

Linking with Hunspell static library (the library must come after the
source file that uses it):

    g++ example.cxx -lhunspell-1.7
    # or better, use pkg-config
    g++ example.cxx $(pkg-config --cflags --libs hunspell)

## Dictionaries

Hunspell (MySpell) dictionaries:

  - https://wiki.documentfoundation.org/Language_support_of_LibreOffice
  - http://cgit.freedesktop.org/libreoffice/dictionaries
  - http://extensions.libreoffice.org
  - http://extensions.openoffice.org
  - http://wiki.services.openoffice.org/wiki/Dictionaries

Aspell dictionaries (conversion: man 5 hunspell):

  - ftp://ftp.gnu.org/gnu/aspell/dict

László Németh, nemeth at numbertext org

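
The result lines described in the Usage section (`*`, `+`, `&`, `#`) are easy to consume from another program. The following is a minimal sketch, not part of Hunspell itself: the names `ResultKind`, `PipeResult` and `parse_pipe_line` are hypothetical, and the parser assumes lines shaped exactly like the sample session above. Any other line, including the version banner, is reported as unknown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of the result markers shown in the Usage section.
enum class ResultKind { Correct, AffixedForm, Suggestions, NoSuggestions, Unknown };

struct PipeResult {
    ResultKind kind = ResultKind::Unknown;
    std::string word;                      // stem for '+', rejected word for '&' and '#'
    std::vector<std::string> suggestions;  // filled only for '&'
};

// Parse one result line such as "*", "+ example",
// "& teached 9 0: taught, teased" or "# verybad 0".
PipeResult parse_pipe_line(const std::string& line) {
    PipeResult r;
    if (line.empty()) return r;

    // Text after the marker character and the space that follows it.
    const std::string rest = line.size() > 2 ? line.substr(2) : std::string();

    switch (line[0]) {
    case '*':  // dictionary stem, nothing else on the line
        r.kind = ResultKind::Correct;
        break;
    case '+':  // affixed form, followed by its dictionary stem
        r.kind = ResultKind::AffixedForm;
        r.word = rest;
        break;
    case '#':  // rejected word without suggestions: "# word offset"
        r.kind = ResultKind::NoSuggestions;
        r.word = rest.substr(0, rest.find(' '));
        break;
    case '&': {  // rejected word with suggestions: "& word count offset: a, b, c"
        r.kind = ResultKind::Suggestions;
        r.word = rest.substr(0, rest.find(' '));
        const std::size_t colon = rest.find(": ");
        if (colon == std::string::npos) break;
        std::size_t pos = colon + 2;
        while (pos < rest.size()) {
            const std::size_t comma = rest.find(", ", pos);
            if (comma == std::string::npos) {
                r.suggestions.push_back(rest.substr(pos));
                break;
            }
            r.suggestions.push_back(rest.substr(pos, comma - pos));
            pos = comma + 2;
        }
        break;
    }
    default:  // banner lines and markers not covered by this sketch
        break;
    }
    return r;
}
```

For example, `parse_pipe_line("+ example")` yields kind `AffixedForm` with `word` set to `example`, and the `& teached ...` line from the sample session yields kind `Suggestions` with the suggested words split into `suggestions`.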