# About Hunspell

Hunspell is a free spell checker and morphological analyzer library
and command-line tool, licensed under the LGPL/GPL/MPL tri-license.

Hunspell is used by the LibreOffice office suite, free browsers like
Mozilla Firefox and Google Chrome, and other tools and operating
systems, such as Linux distributions and macOS. The command-line tool
runs on Linux, Unix-like and other operating systems.

It is designed for fast, high-quality spell checking and correction
in languages with a word-level writing system, including languages
with rich morphology, complex word compounding and complex character
encoding.

Hunspell provides an Ispell-like terminal interface using the Curses
library, an Ispell pipe interface, C++ and C APIs with a shared
library, and existing bindings for other programming languages.

Hunspell's code base comes from OpenOffice.org's MySpell library,
developed by Kevin Hendricks (originally a from-scratch C++
reimplementation of the spell checking and affixation of Geoff
Kuenning's International Ispell, later extended with, e.g., n-gram
suggestions). See http://lingucomponent.openoffice.org/MySpell-3.zip
and its README, CONTRIBUTORS and license.readme (here: license.myspell)
files.

Main features of the Hunspell library, developed by László Németh:

  - Unicode support
  - Highly customizable suggestions: word-part replacement tables and
    stem-level phonetic and other alternative transcriptions to recognize
    and fix all typical misspellings, avoiding offensive suggestions, etc.
  - Complex morphology: dictionary and affix homonyms; twofold affix
    stripping to handle inflectional and derivational morpheme groups for
    agglutinative languages, like Azeri, Basque, Estonian, Finnish,
    Hungarian and Turkish; 64 thousand affix classes with an arbitrary
    number of affixes; conditional affixes, circumfixes, fogemorphemes,
    zero morphemes, virtual dictionary stems, forbidden words to avoid
    overgeneration, etc.
  - Handling of complex compounds (for example, for Finno-Ugric, German
    and Indo-Aryan languages): recognizing compounds made of an arbitrary
    number of words, handling affixation within compounds, etc.
  - Custom dictionaries with affixation
  - Stemming
  - Morphological analysis (in custom item and arrangement style)
  - Morphological generation
  - SPELLML XML API over the plain spell() API function for easier
    integration of stemming, morphological generation and custom
    dictionaries with affixation
  - Language-specific algorithms, like special casing of Azeri or
    Turkish dotted i and German sharp s, and special compound rules of
    Hungarian
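
As a rough illustration of conditional affixes, the C++17 sketch below applies a single suffix rule written in the `.aff` rule notation of `man 5 hunspell` (for example `SFX D y ied [^aeiou]y`: strip `y` and add `ied` if the stem ends in a non-vowel followed by `y`). It is a toy under simplifying assumptions, not Hunspell's own algorithm: it reads only `SFX` rule lines, matches the condition as an ECMAScript regular expression anchored at the end of the stem (adequate for plain ASCII conditions), and ignores continuation flags and morphological fields. `SuffixRule`, `parse_rule` and `apply_rule` are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Toy model of one suffix rule line from an .aff file, e.g.
//   SFX D   y   ied   [^aeiou]y
// (hypothetical type, not part of the Hunspell API).
struct SuffixRule {
    std::string flag;       // affix class, e.g. "D"
    std::string strip;      // characters removed from the stem ("0" = none)
    std::string affix;      // characters appended ("0" = none)
    std::string condition;  // what the stem must end with ("." = anything)
};

// Parse "SFX <flag> <strip> <affix> <condition>". Header lines such as
// "SFX D Y 4" have no condition field and are rejected. Anything after
// the condition (e.g. morphological fields) is not read.
std::optional<SuffixRule> parse_rule(const std::string& line) {
    std::istringstream in(line);
    std::string tag;
    SuffixRule r;
    if (!(in >> tag >> r.flag >> r.strip >> r.affix >> r.condition) || tag != "SFX")
        return std::nullopt;
    r.affix = r.affix.substr(0, r.affix.find('/'));  // drop continuation flags
    return r;
}

// Generate the affixed form of `stem`, or nothing if the rule does not apply.
std::optional<std::string> apply_rule(const SuffixRule& r, const std::string& stem) {
    // For plain ASCII, the condition syntax (".", "[abc]", "[^abc]",
    // literal letters) is also valid ECMAScript regex; anchor it at the end.
    const std::regex cond("(?:" + r.condition + ")$");
    if (!std::regex_search(stem, cond)) return std::nullopt;

    std::string form = stem;
    if (r.strip != "0") {
        const std::size_t n = r.strip.size();
        if (form.size() < n || form.compare(form.size() - n, n, r.strip) != 0)
            return std::nullopt;
        form.erase(form.size() - n);  // remove the stripped characters
    }
    if (r.affix != "0") form += r.affix;
    return form;
}
```

With the rules `SFX D 0 d e`, `SFX D y ied [^aeiou]y`, `SFX D 0 ed [^ey]` and `SFX D 0 ed [aeiou]y`, the stems "love", "try", "walk" and "play" give "loved", "tried", "walked" and "played", and each stem is rejected by the rules whose condition it does not meet. In a real dictionary, such flags are attached to stems in the .dic file (as in a hypothetical entry `try/D`).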

Main features of the Hunspell command-line tool, developed by László Németh:

  - Reimplementation of the quick interactive interface of Geoff Kuenning's Ispell
  - Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
  - Custom dictionaries with optional affixation, specified by a model word
  - Multiple dictionary usage (for example, `hunspell -d en_US,de_DE,de_medical`)
  - Various filtering options (bad or good words/lines)
  - Morphological analysis (option `-m`)
  - Stemming (option `-s`)

See `man hunspell`, `man 3 hunspell` and `man 5 hunspell` for the
complete manual.

# Dependencies

Build-only dependencies:

    g++ make autoconf automake autopoint libtool

Runtime dependencies:

|               | Mandatory        | Optional         |
|---------------|------------------|------------------|
| libhunspell   |                  |                  |
| hunspell tool | libiconv gettext | ncurses readline |

# Compiling on GNU/Linux and Unixes

First, install the dependencies. On GNU/Linux, `gettext` and
`libiconv` are part of the GNU C Library (glibc); on other Unixes, you
need to install them manually.

For Ubuntu:

    sudo apt install autoconf automake autopoint libtool

Then run the following commands:

    autoreconf -vfi
    ./configure
    make
    sudo make install
    sudo ldconfig

For dictionary development, use the `--with-warnings` option of
`configure`.

For the interactive user interface of the Hunspell executable, use the
`--with-ui` option.

Optional developer packages:

  - ncurses (needed for `--with-ui`), e.g. libncursesw5 for UTF-8
  - readline (for fancy input line editing; configure parameter:
    `--with-readline`)

In Ubuntu, the packages are:

    libncurses5-dev libreadline-dev

# Compiling on OSX and macOS

On macOS, always use `clang` as the compiler, not `g++`, because the
Homebrew dependencies are built with it.

    brew install autoconf automake libtool gettext
    brew link gettext --force

Then run `autoreconf`, `configure` and `make` as described above.

# Compiling on Windows

## Compiling with Mingw64 and MSYS2

Download MSYS2, update all packages and install the following
packages:

    pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool

Open the Mingw-w64 Win64 prompt and compile the same way as on Linux
(see above).

## Compiling in the Cygwin environment

Download and install the Cygwin environment for Windows with the
following extra packages:

  - make
  - automake
  - autoconf
  - libtool
  - gcc-g++ development package
  - ncurses, readline (for the user interface)
  - iconv (character conversion)

Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.

# Debugging

It is recommended to install a debug build of the C++ standard
library, for example:

    libstdc++6-6-dbg

For debugging, create a debug build and then start `gdb`:

    ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
    make
    ./libtool --mode=execute gdb src/tools/hunspell

You can also pass `CXXFLAGS` directly to `make` without calling
`./configure`, but we don't recommend this during long development
sessions.

If you would like to develop and debug with an IDE, see the
documentation at https://github.com/hunspell/hunspell/wiki/IDE-Setup

# Testing

To test Hunspell (see the tests in the `tests/` subdirectory), run:

    make check

or, with the Valgrind debugger:

    make check
    VALGRIND=[Valgrind_tool] make check

For example:

    make check
    VALGRIND=memcheck make check

# Documentation

Features and dictionary format:

    man 5 hunspell
    man hunspell
    hunspell -h

http://hunspell.github.io/

# Usage

After compiling and installing (see INSTALL), you can run the Hunspell
spell checker (compiled with the user interface) with a Hunspell or
MySpell dictionary:

    hunspell -d en_US text.txt

or without the interface (the `-l` option lists the misspelled words
of the input):

    hunspell
    hunspell -d en_GB -l <text.txt

Dictionaries consist of an affix (.aff) file and a dictionary (.dic)
file. For example, you can download the American English dictionary
files of LibreOffice (an older version, but with stemming and
morphological generation) with:

    wget -O en_US.aff 'https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff?id=a4473e06b56bfe35187e302754f6baaa8d75e54f'
    wget -O en_US.dic 'https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic?id=a4473e06b56bfe35187e302754f6baaa8d75e54f'

With command-line input and output, you can quickly check how it
works, for example with the input words "example", "examples",
"teached" and "verybaaaaaaaaaaaaaaaaaaaaaad":

    $ hunspell -d en_US
    Hunspell 1.7.0
    example
    *

    examples
    + example

    teached
    & teached 9 0: taught, teased, reached, teaches, teacher, leached, beached

    verybaaaaaaaaaaaaaaaaaaaaaad
    # verybaaaaaaaaaaaaaaaaaaaaaad 0

In the output, `*` and `+` mark correct (accepted) words (`*` =
dictionary stem, `+` = affixed form of the dictionary stem that
follows), while `&` and `#` mark bad (rejected) words (`&` = with
suggestions, `#` = without suggestions). See `man hunspell` for
details.
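
If you drive the command-line tool from another program, these result lines are easy to classify. The following C++ sketch parses one result line in the format shown above; `Verdict`, `PipeResult` and `parse_pipe_line` are hypothetical names, and any marker other than the four described here is simply reported as `Unknown`. On `&` lines it skips the two numeric fields after the word and splits the remaining text at commas.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Categories of an Ispell-style result line (hypothetical names).
enum class Verdict { Stem, Affixed, Suggestions, NoSuggestions, Unknown };

struct PipeResult {
    Verdict verdict = Verdict::Unknown;
    std::string word;                      // rejected word (& and # lines)
    std::string root;                      // dictionary stem (+ lines)
    std::vector<std::string> suggestions;  // & lines only
};

PipeResult parse_pipe_line(const std::string& line) {
    PipeResult r;
    if (line.empty()) return r;  // blank separator line between words
    std::istringstream in(line.substr(1));
    switch (line[0]) {
    case '*':  // word found as a dictionary stem
        r.verdict = Verdict::Stem;
        break;
    case '+':  // affixed form: "+ <stem>"
        r.verdict = Verdict::Affixed;
        in >> r.root;
        break;
    case '#':  // no suggestions: "# <word> <offset>"
        r.verdict = Verdict::NoSuggestions;
        in >> r.word;
        break;
    case '&': {  // suggestions: "& <word> <count> <offset>: s1, s2, ..."
        r.verdict = Verdict::Suggestions;
        std::string count, offset;
        in >> r.word >> count >> offset;  // offset keeps its trailing ':'
        std::string item;
        // Skip leading blanks, then read up to the next comma (or the end).
        while (std::getline(in >> std::ws, item, ','))
            r.suggestions.push_back(item);
        break;
    }
    default:  // any other marker is left as Unknown
        break;
    }
    return r;
}
```

For the `teached` line above, this collects the seven listed suggestions, from "taught" to "beached".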

Example of stemming:

    $ hunspell -d en_US -s
    mice
    mice mouse

Example of morphological analysis (very limited with this English
dictionary):

    $ hunspell -d en_US -m
    mice
    mice st:mouse ts:Ns

    cats
    cats st:cat ts:0 is:Ns
    cats st:cat ts:0 is:Vs
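
Each analysis line starts with the analyzed word, followed by space-separated fields of the form `code:value`; `st:` gives the stem, and `man 5 hunspell` documents the other field codes. A word with several analyses (like `cats` above) gets one line per analysis. A minimal C++ sketch for splitting one such line into fields might look like this; `Analysis`, `parse_analysis` and `field` are hypothetical helpers, and a multimap is used in case a code repeats within one line.

```cpp
#include <map>
#include <sstream>
#include <string>

// One analysis line as printed by `hunspell -m`: the word, then
// space-separated "code:value" fields (hypothetical helper type).
struct Analysis {
    std::string word;
    std::multimap<std::string, std::string> fields;  // a code may repeat
};

Analysis parse_analysis(const std::string& line) {
    Analysis a;
    std::istringstream in(line);
    in >> a.word;
    std::string token;
    while (in >> token) {
        const std::string::size_type colon = token.find(':');
        if (colon == std::string::npos) continue;  // not a code:value field
        a.fields.emplace(token.substr(0, colon), token.substr(colon + 1));
    }
    return a;
}

// First value stored for `code`, or an empty string if it is absent.
std::string field(const Analysis& a, const std::string& code) {
    const auto it = a.fields.lower_bound(code);
    return (it != a.fields.end() && it->first == code) ? it->second : std::string();
}
```

Applied to the two `cats` lines above, `field(a, "is")` returns `Ns` for the first and `Vs` for the second, which is what distinguishes the two analyses.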

# Other executables

After compiling, the `src/tools` directory contains the following
executables:

  - The main executable:
    - hunspell: main program for spell checking and more (see the
      manual)
  - Example tools:
    - analyze: example of spell checking, stemming and morphological
      analysis
    - chmorph: example of automatic morphological generation and
      conversion
    - example: example of spell checking and suggestion
  - Tools for dictionary development:
    - affixcompress: dictionary generation from large (millions of
      words) vocabularies
    - makealias: alias compression (Hunspell only, not backward
      compatible with MySpell)
    - wordforms: word generation (Hunspell version of unmunch)
    - hunzip: decompressor of hzip format
    - hzip: compressor of hzip format
    - munch (DEPRECATED, use affixcompress): dictionary generation
      from vocabularies (it needs an affix file, too)
    - unmunch (DEPRECATED, use wordforms): list all recognized words
      of a MySpell dictionary

Example of morphological generation:

    $ ~/hunspell/src/tools/analyze en_US.aff en_US.dic /dev/stdin
    cat mice
    generate(cat, mice) = cats
    mouse cats
    generate(mouse, cats) = mice
    generate(mouse, cats) = mouses

# Using the Hunspell library with GCC

To include it in your program:

    #include <hunspell.hxx>

Linking with the Hunspell library (list the library after the source
file, so the linker can resolve the symbols it uses):

    g++ example.cxx -lhunspell-1.7
    # or better, use pkg-config
    g++ example.cxx $(pkg-config --cflags --libs hunspell)

## Dictionaries

Hunspell (MySpell) dictionaries:

  - https://wiki.documentfoundation.org/Language_support_of_LibreOffice
  - http://cgit.freedesktop.org/libreoffice/dictionaries
  - http://extensions.libreoffice.org
  - http://extensions.openoffice.org
  - http://wiki.services.openoffice.org/wiki/Dictionaries

Aspell dictionaries (for conversion, see `man 5 hunspell`):

  - ftp://ftp.gnu.org/gnu/aspell/dict

László Németh, nemeth at numbertext org