Note:
A simple "spellchecking" program is included in this distribution.
It is a Perl program named "spellcheck". It only prints the
analysis of the input text; it provides no way to modify the text.
It is given purely as a demonstration of the module. Type
    spellcheck -h
for a usage summary. If no input files are specified, it reads
from stdin. After each line of input, it prints the analysis of
the terms. By default, it only gives output for terms which are
"incorrect". Give it the -v option to have it report on the
"correct" terms as well.

Tests:
'make test' currently does nothing. To test the installation,
try out the "spellcheck" program provided.

__POD__

NAME
    Lingua::Ispell.pm - a module encapsulating access to the
    Ispell program.

    Note: this module was previously known as Text::Ispell; if
    you have Text::Ispell installed on your system, it is now
    obsolete and should be replaced by Lingua::Ispell.

NOTA BENE
    ispell, when reporting on misspelled words, indicates the
    string it was unable to verify, as well as its starting
    offset in the input line. No such information is returned
    for words which are deemed to be correctly spelled. For
    example, in a line like "Can't buy a thrill", ispell simply
    reports that the line contained four correctly spelled
    words.

    Lingua::Ispell would like to identify which substrings of
    the input line are words -- correctly spelled or otherwise.
    It used to attempt to split the input line into words
    according to the same rules ispell uses; but that has proven
    to be very difficult, resulting in both slow and error-prone
    code.

  Consequences

    Lingua::Ispell now operates only in "terse" mode. In this
    mode, only misspelled words are reported. Words which
    ispell verifies as correctly spelled are silently accepted.

    In the report structures returned by spellcheck(), the
    'term' member is now always identical to the 'original'
    member; of the two, you should probably use the 'term'
    member. (Also consider the 'offset' member.) ispell does
    not report the original string or its offset for correctly
    spelled words; if at some point in the future this
    capability is added to ispell, Lingua::Ispell will be
    updated to take advantage of it.
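
    As a rough sketch of how the 'term' and 'offset' members
    might be used together, the helper below builds a marker
    line that underlines each reported term. The name
    underline_terms is hypothetical, the report hash is a
    hand-written stand-in for a spellcheck() result, and the
    code assumes that 'offset' counts from 1; check what your
    ispell actually reports before relying on that.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::Ispell.
# Returns a string with '^' under each reported term.
# ASSUMPTION: 'offset' is 1-based; pass a different $base if not.
sub underline_terms {
    my ( $line, $reports, $base ) = @_;
    $base = 1 unless defined $base;
    my $marks = ' ' x length($line);
    for my $r (@$reports) {
        my $start = $r->{'offset'} - $base;
        my $len   = length( $r->{'term'} );
        # Skip reports that do not fit inside the line.
        next if $start < 0 or $start + $len > length($line);
        substr( $marks, $start, $len ) = '^' x $len;
    }
    return $marks;
}

# Stand-in data shaped like the documented report structures.
my $line    = "hello shrdlu world";
my @reports = ( { type => 'none', term => 'shrdlu', offset => 7 } );
print "$line\n", underline_terms( $line, \@reports ), "\n";
```

    In real use the second argument would be a reference to the
    list returned by spellcheck() for the same line.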

    Use of the $word_chars variable has been removed; setting it
    no longer has any effect.

    terse_mode() now does nothing.

SYNOPSIS
        # Brief:
        use Lingua::Ispell;
        Lingua::Ispell::spellcheck( $string );
        # or
        use Lingua::Ispell qw( spellcheck ); # import the function
        spellcheck( $string );

        # Useful:
        use Lingua::Ispell qw( :all ); # import all symbols
        for my $r ( spellcheck( "hello hacking perl shrdlu 42" ) ) {
            print "$r->{'type'}: $r->{'term'}\n";
        }


DESCRIPTION
    Lingua::Ispell::spellcheck() takes one argument. It must be
    a string, and it should contain only printable characters.
    One allowable exception is a terminal newline, which will be
    chomped off anyway. The line is fed to a coprocess running
    ispell for analysis. ispell parses the line into "terms"
    according to the language-specific rules in effect.

    The result of ispell's analysis of each term is a
    categorization of the term into one of six types: ok,
    compound, root, miss, none, and guess. Some of these carry
    additional information. The first three types are
    "correctly" spelled terms, and the last three are for
    "incorrectly" spelled terms.

    Lingua::Ispell::spellcheck() returns a list of objects, each
    corresponding to a term in the spellchecked string. Each
    object is a reference to a hash with at least two entries:
    'term' and 'type'. The former contains the term ispell is
    reporting on, and the latter is ispell's determination of
    that term's type (see above). For types 'ok' and 'none',
    that is all the information there is. For the type 'root',
    an additional hash entry is present: 'root'. Its value is
    the word which ispell identified in the dictionary as being
    the likely root of the current term. For the type 'miss',
    an additional hash entry is present: 'misses'. Its value is
    a ref to an array of words which ispell identified as being
    "near-misses" of the current term, when scanning the
    dictionary. For the type 'guess', the EXAMPLE below reads a
    'guesses' entry, a ref to an array of root/affix guesses.

  NOTE

    As mentioned above, Lingua::Ispell::spellcheck() currently
    only reports on misspelled terms, so results of the types
    'ok', 'root', and 'compound' are not returned.

  EXAMPLE

        use Lingua::Ispell qw( spellcheck );
        Lingua::Ispell::allow_compounds(1);
        for my $r ( spellcheck( "hello hacking perl salmoning fruithammer shrdlu 42" ) ) {
            if ( $r->{'type'} eq 'ok' ) {
                # as in the case of 'hello'
                print "'$r->{'term'}' was found in the dictionary.\n";
            }
            elsif ( $r->{'type'} eq 'root' ) {
                # as in the case of 'hacking'
                print "'$r->{'term'}' can be formed from root '$r->{'root'}'\n";
            }
            elsif ( $r->{'type'} eq 'miss' ) {
                # as in the case of 'perl'
                print "'$r->{'term'}' was not found in the dictionary;\n";
                print "Near misses: @{$r->{'misses'}}\n";
            }
            elsif ( $r->{'type'} eq 'guess' ) {
                # as in the case of 'salmoning'
                print "'$r->{'term'}' was not found in the dictionary;\n";
                print "Root/affix Guesses: @{$r->{'guesses'}}\n";
            }
            elsif ( $r->{'type'} eq 'compound' ) {
                # as in the case of 'fruithammer'
                print "'$r->{'term'}' is a valid compound word.\n";
            }
            elsif ( $r->{'type'} eq 'none' ) {
                # as in the case of 'shrdlu'
                print "No match for term '$r->{'term'}'\n";
            }
            # and numbers are skipped entirely, as in the case of 42.
        }


  ERRORS

    Lingua::Ispell::spellcheck() starts the ispell coprocess if
    the coprocess seems not to exist. Ordinarily this is simply
    the first time it's called.

    ispell is spawned via the IPC::Open2::open2() function,
    which throws an exception (i.e. dies) if the spawn fails.
    The caller should be prepared to catch this exception --
    unless, of course, the default behavior of die is acceptable.
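
    One way a caller might trap that exception is to wrap the
    call in eval. The sketch below is only an illustration:
    try_spellcheck is a hypothetical name, and a stand-in
    callback that dies takes the place of the real function so
    the snippet runs without ispell installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a checker callback and trap any exception.
# Returns ( \@reports, undef ) on success, ( undef, $error ) on failure.
sub try_spellcheck {
    my ( $checker, $text ) = @_;
    my @reports;
    my $ok = eval { @reports = $checker->($text); 1 };
    return ( undef, $@ || 'unknown error' ) unless $ok;
    return ( \@reports, undef );
}

# In real use the callback would be \&Lingua::Ispell::spellcheck.
# Here a stand-in that dies simulates a failed spawn.
my ( $reports, $err ) =
    try_spellcheck( sub { die "cannot spawn ispell\n" }, "hello" );
if ( defined $err ) {
    print "spellcheck failed: $err";
}
else {
    print "got ", scalar(@$reports), " report(s)\n";
}
```

    Returning the error as a value rather than re-throwing it
    is just one design choice; a caller that prefers exceptions
    can simply let die propagate.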

  Nota Bene

    The full path of the ispell executable is stored in the
    variable $Lingua::Ispell::path. The default value is
    /usr/local/bin/ispell. If your ispell executable has a
    different name or location, you must set
    $Lingua::Ispell::path accordingly before you call
    Lingua::Ispell::spellcheck() (or any other function in the
    module) for the first time!
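
    A caller that does not know where ispell lives could search
    for it before setting the variable. The following is a
    minimal sketch: find_ispell is a hypothetical helper, and
    the directory list is an assumption to adjust for your
    system.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable file named
# 'ispell' found in the given directories, or undef if none.
sub find_ispell {
    my @dirs = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, 'ispell' );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# ASSUMPTION: these directories plus $PATH are worth searching.
my $found = find_ispell( '/usr/local/bin', '/usr/bin', File::Spec->path );
if ( defined $found ) {
    # With Lingua::Ispell loaded, do this before the first call
    # into the module.
    no warnings 'once';    # the module is not loaded in this sketch
    $Lingua::Ispell::path = $found;
    print "using $found\n";
}
else {
    print "ispell not found\n";
}
```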

AUX FUNCTIONS
  add_word(word)

    Adds a word to the personal dictionary. Be careful of
    capitalization. If you want the word to be added
    "case-insensitively", you should call add_word_lc().

  add_word_lc(word)

    Adds a word to the personal dictionary, in lower-case form.
    This allows ispell to match it in a case-insensitive manner.

  accept_word(word)

    Similar to adding a word to the dictionary, in that it
    causes ispell to accept the word as valid, but it does not
    actually add it to the dictionary. Presumably the effects
    of this only last for the current ispell session, which
    ends whenever one of the coprocess-restarting functions
    (see below) is called.

  parse_according_to(formatter)

    Causes ispell to parse subsequent input lines according to
    the specified formatter. As of ispell v. 3.1.20, only 'tex'
    and 'nroff' are supported.

  set_params_by_language(language)

    Causes ispell to set its internal operational parameters
    according to the given language. Legal arguments to this
    function, and its effects, are currently unknown to the
    author of Lingua::Ispell.

  save_dictionary()

    Causes ispell to save the current state of the dictionary to
    its disk file. Presumably ispell would ordinarily only do
    this upon exit.

  terse_mode(bool:terse)

    NOTE: This function has been disabled! Lingua::Ispell now
    always operates in terse mode.

    In terse mode, ispell will not produce reports for "correct"
    words. This means that the calling program will not receive
    results of the types 'ok', 'root', and 'compound'.


FUNCTIONS THAT RESTART ISPELL
    The following functions cause the current ispell coprocess,
    if any, to terminate. This means that all the changes to the
    state of ispell made by the above functions will be lost,
    and their respective values reset to their defaults. The
    only function above whose effect is persistent is
    save_dictionary().

    Perhaps in the future we will figure out a good way to make
    this state information carry over from one instantiation of
    the coprocess to the next.
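
    Until then, a caller can do its own bookkeeping. The sketch
    below is one possible caller-side workaround, not a feature
    of the module: remember and restart_then_replay are
    hypothetical names, and stand-in callbacks replace the real
    functions so the snippet runs without ispell.

```perl
use strict;
use warnings;

# Hypothetical caller-side bookkeeping, not part of Lingua::Ispell:
# record session-only calls so they can be repeated after a restart.
my @replay;    # each entry is [ \&function, @args ]

sub remember {
    my ( $fn, @args ) = @_;
    push @replay, [ $fn, @args ];
    return $fn->(@args);
}

# Call a restarting function, then repeat every remembered call.
sub restart_then_replay {
    my ( $restarter, @args ) = @_;
    $restarter->(@args);
    for my $entry (@replay) {
        my ( $fn, @fn_args ) = @$entry;
        $fn->(@fn_args);
    }
    return scalar @replay;
}

# Stand-ins; in real use pass \&Lingua::Ispell::accept_word and
# \&Lingua::Ispell::allow_compounds instead.
my @log;
my $accept  = sub { push @log, "accept $_[0]" };
my $restart = sub { push @log, "restart" };

remember( $accept, 'shrdlu' );
restart_then_replay( $restart, 1 );
print join( ', ', @log ), "\n";
```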

  allow_compounds(bool)

    When this value is set to True, compound words are accepted
    as legal -- as long as both words are found in the
    dictionary; more than two words are always illegal. When
    this value is set to False, run-together words are
    considered spelling errors.

    The default value of this setting is dictionary-dependent,
    so the caller should set it explicitly if it really matters.

  make_wild_guesses(bool)

    This setting controls when ispell makes "wild" guesses.

    If False, ispell only makes "sane" guesses, i.e. possible
    root/affix combinations that match the current dictionary;
    only if it can find none will it make "wild" guesses, which
    don't match the dictionary, and might in fact be illegal
    words.

    If True, wild guesses are always made, along with any "sane"
    guesses. This feature can be useful if the dictionary has a
    limited word list, or a word list with few suffixes.

    The default value of this setting is dictionary-dependent,
    so the caller should set it explicitly if it really matters.

  use_dictionary([dictionary])

    Specifies what dictionary to use instead of the default.
    Dictionary names are actually file names, and are searched
    for according to the following rule: if the name does not
    contain a slash, it is looked for in the directory
    containing the default dictionary, typically /usr/local/lib.
    Otherwise, it is used as is: if it does not begin with a
    slash, it is interpreted relative to the current directory.

    If no argument is given, the default dictionary will be
    used.

  use_personal_dictionary([dictionary])

    Specifies what personal dictionary to use instead of the
    default.

    Dictionary names are actually file names, and are searched
    for according to the following rule: if the name begins
    with a slash, it is used as is (i.e. it is an absolute path
    name). Otherwise, it is interpreted relative to the user's
    home directory ($HOME).

    If no argument is given, the default personal dictionary
    will be used.
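
    The two lookup rules described for use_dictionary() and
    use_personal_dictionary() can be restated as code. This is
    only an illustration of the rules as documented here, not
    ispell's own logic; the function names, file names, and
    directories are all hypothetical.

```perl
use strict;
use warnings;

# Illustration of the documented lookup rules; not ispell's code.
# $default_dir and $home are supplied by the caller.
sub resolve_main_dictionary {
    my ( $name, $default_dir ) = @_;
    # No slash: look in the directory of the default dictionary.
    return "$default_dir/$name" if index( $name, '/' ) < 0;
    # Otherwise used as is (relative names resolve against the cwd).
    return $name;
}

sub resolve_personal_dictionary {
    my ( $name, $home ) = @_;
    # Leading slash: absolute path name, used as is.
    return $name if $name =~ m{^/};
    # Otherwise relative to the user's home directory.
    return "$home/$name";
}

print resolve_main_dictionary( 'english.hash', '/usr/local/lib' ), "\n";
print resolve_main_dictionary( './my.hash',    '/usr/local/lib' ), "\n";
print resolve_personal_dictionary( 'mywords',    '/home/alice' ), "\n";
print resolve_personal_dictionary( '/tmp/words', '/home/alice' ), "\n";
```

    Note the asymmetry: a relative name containing a slash is
    resolved against the current directory for the main
    dictionary, but against $HOME for the personal one.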

FUTURE ENHANCEMENTS
    ispell options:

    -w chars
        Specify additional characters that can be part of a word.


DEPENDENCIES
    Lingua::Ispell uses the external program ispell, which is
    the "International Ispell", available at

        http://fmg-www.cs.ucla.edu/geoff/ispell.html

    as well as various archives and mirrors, such as

        ftp://ftp.math.orst.edu/pub/ispell-3.1/

    This is a very popular program, and may already be installed
    on your system.

    Lingua::Ispell also uses the standard perl modules
    FileHandle, IPC::Open2, and Carp.

AUTHOR
    jdporter@min.net (John Porter)

COPYRIGHT
    This module is free software; you may redistribute it and/or
    modify it under the same terms as Perl itself.