README

Note:
A simple "spellchecking" program is included in this distribution.
It is a Perl program named "spellcheck".  It prints the analysis
of the input text; it provides no way to modify the text.  It is
provided only as a demonstration of the module.  Type
    spellcheck -h
for a usage summary.  If no input files are specified, it reads
from stdin.  After each line of input, it prints the analysis of
the terms.  By default, it only gives output for terms which are
"incorrect".  Give it the -v option to have it report on the
"correct" terms as well.

Tests:
'make test' currently does nothing.  To test the installation,
try out the "spellcheck" program provided.

__POD__

NAME
     Lingua::Ispell.pm - a module encapsulating access to the
     Ispell program.

     Note: this module was previously known as Text::Ispell; if
     you have Text::Ispell installed on your system, it is now
     obsolete and should be replaced by Lingua::Ispell.

NOTA BENE
     When reporting on misspelled words, ispell indicates the
     string it was unable to verify, as well as its starting
     offset in the input line.  No such information is returned
     for words which are deemed to be correctly spelled.  For
     example, for a line like "Can't buy a thrill", ispell simply
     reports that the line contained four correctly spelled
     words.

     Lingua::Ispell would like to identify which substrings of
     the input line are words -- correctly spelled or otherwise.
     It used to attempt to split the input line into words
     according to the same rules ispell uses, but that proved
     very difficult and resulted in code that was both slow and
     error-prone.

     Consequences

     Lingua::Ispell now operates only in "terse" mode.  In this
     mode, only misspelled words are reported.  Words which
     ispell verifies as correctly spelled are silently accepted.

     In the report structures returned by spellcheck(), the
     'term' member is now always identical to the 'original'
     member; of the two, you should probably use the 'term'
     member.  (Also consider the 'offset' member.)  ispell does
     not report a term or an offset for correctly spelled words;
     if this capability is ever added to ispell, Lingua::Ispell
     will be updated to take advantage of it.

     Use of the $word_chars variable has been removed; setting it
     no longer has any effect.

     terse_mode() now does nothing.

SYNOPSIS
      # Brief:
      use Lingua::Ispell;
      Lingua::Ispell::spellcheck( $string );
      # or
      use Lingua::Ispell qw( spellcheck ); # import the function
      spellcheck( $string );

      # Useful:
      use Lingua::Ispell qw( :all );  # import all symbols
      for my $r ( spellcheck( "hello hacking perl shrdlu 42" ) ) {
        print "$r->{'type'}: $r->{'term'}\n";
      }


DESCRIPTION
     Lingua::Ispell::spellcheck() takes one argument.  It must be
     a string, and it should contain only printable characters.
     One allowable exception is a terminal newline, which will be
     chomped off anyway.  The line is fed to a coprocess running
     ispell for analysis.  ispell parses the line into "terms"
     according to the language-specific rules in effect.

     The result of ispell's analysis of each term is a
     categorization of the term into one of six types: ok,
     compound, root, miss, none, and guess.  Some of these carry
     additional information.  The first three types are for
     "correctly" spelled terms, and the last three are for
     "incorrectly" spelled terms.

     Lingua::Ispell::spellcheck() returns a list of objects, each
     corresponding to a term in the spellchecked string.  Each
     object is a hash (hash-ref) with at least two entries:
     'term' and 'type'.  The former contains the term ispell is
     reporting on, and the latter is ispell's determination of
     that term's type (see above).  For types 'ok' and 'none',
     that is all the information there is.  For the type 'root',
     an additional hash entry is present: 'root'.  Its value is
     the word which ispell identified in the dictionary as being
     the likely root of the current term.  For the type 'miss',
     an additional hash entry is present: 'misses'.  Its value is
     a ref to an array of words which ispell identified as being
     "near-misses" of the current term, when scanning the
     dictionary.  For the type 'guess', the additional entry is
     'guesses', a ref to an array of the root/affix combinations
     ispell suggests for the term.
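
     For illustration, the sketch below shows how the six types
     plausibly correspond to the result lines ispell writes in its
     pipe ("-a") mode, following the formats described in the
     ispell(1) manual page.  It is not Lingua::Ispell's own code:
     parse_ispell_line() is a hypothetical helper, and the sample
     output lines are made up.  Note that the '*', '+' and '-'
     lines carry no term or offset, which is why only misspelled
     terms can be located in the input.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::Ispell: turns one result line
# of "ispell -a" output into a hash shaped like the reports described
# above.  The line formats assumed here are those given in ispell(1).
sub parse_ispell_line {
    my ($line) = @_;
    chomp $line;
    return { type => 'ok' }       if $line eq '*';    # found as is
    return { type => 'compound' } if $line eq '-';    # run-together words
    return { type => 'root', root => $1 } if $line =~ /^\+ (\S+)$/;
    if ( $line =~ /^([&?]) (\S+) (\d+) (\d+): (.*)$/ ) {
        my ( $flag, $term, $count, $offset, $list ) = ( $1, $2, $3, $4, $5 );
        my @words = split /, /, $list;
        # The first $count entries are near misses; any remaining
        # entries are root/affix guesses.  ('?' lines have $count == 0.)
        return {
            type    => $flag eq '&' ? 'miss' : 'guess',
            term    => $term,
            offset  => $offset,
            misses  => [ @words[ 0 .. $count - 1 ] ],
            guesses => [ @words[ $count .. $#words ] ],
        };
    }
    return { type => 'none', term => $1, offset => $2 }
        if $line =~ /^# (\S+) (\d+)$/;
    return;    # blank separator line, or anything unrecognized
}

# The first three types count as correct spellings, the last three not.
my %is_correct = map { $_ => 1 } qw( ok compound root );

# Made-up sample output lines, for demonstration only.
for my $out ( '& perl 2 15: peel, pearl', '? salmoning 0 20: salmon', '# shrdlu 42' ) {
    my $r = parse_ispell_line($out);
    printf "%-5s %-10s %s\n", $r->{type}, $r->{term},
        $is_correct{ $r->{type} } ? 'correct' : 'incorrect';
}
```

     The split between 'misses' and 'guesses' relies on the count
     field, since ispell lists guesses after the near misses on the
     same line.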

     NOTE

     As mentioned above, Lingua::Ispell::spellcheck() currently
     only reports on misspelled terms.

     EXAMPLE

      use Lingua::Ispell qw( spellcheck );
      Lingua::Ispell::allow_compounds(1);
      # In terse mode (now always on), 'ok', 'root' and 'compound'
      # reports are not produced; those branches are kept for completeness.
      for my $r ( spellcheck( "hello hacking perl salmoning fruithammer shrdlu 42" ) ) {
        if ( $r->{'type'} eq 'ok' ) {
          # as in the case of 'hello'
          print "'$r->{'term'}' was found in the dictionary.\n";
        }
        elsif ( $r->{'type'} eq 'root' ) {
          # as in the case of 'hacking'
          print "'$r->{'term'}' can be formed from root '$r->{'root'}'\n";
        }
        elsif ( $r->{'type'} eq 'miss' ) {
          # as in the case of 'perl'
          print "'$r->{'term'}' was not found in the dictionary;\n";
          print "Near misses: @{$r->{'misses'}}\n";
        }
        elsif ( $r->{'type'} eq 'guess' ) {
          # as in the case of 'salmoning'
          print "'$r->{'term'}' was not found in the dictionary;\n";
          print "Root/affix Guesses: @{$r->{'guesses'}}\n";
        }
        elsif ( $r->{'type'} eq 'compound' ) {
          # as in the case of 'fruithammer'
          print "'$r->{'term'}' is a valid compound word.\n";
        }
        elsif ( $r->{'type'} eq 'none' ) {
          # as in the case of 'shrdlu'
          print "No match for term '$r->{'term'}'\n";
        }
        # and numbers are skipped entirely, as in the case of 42.
      }


     ERRORS

     Lingua::Ispell::spellcheck() starts the ispell coprocess if
     the coprocess does not appear to exist.  Ordinarily this is
     simply the first time it is called.

     ispell is spawned via the IPC::Open2::open2() function, which
     throws an exception (i.e. dies) if the spawn fails.  The
     caller should be prepared to catch this exception -- unless,
     of course, the default behavior of die is acceptable.
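
     A minimal sketch of the catch pattern, assuming a reasonably
     recent Perl whose IPC::Open2 reports a failed exec by dying in
     the parent process.  It calls IPC::Open2 directly with a
     deliberately bogus path instead of going through
     Lingua::Ispell, so it runs without ispell installed; with the
     module, the spellcheck() call would be wrapped in the same
     kind of eval.

```perl
use strict;
use warnings;
use IPC::Open2 qw( open2 );

# Deliberately bogus path (an assumption for this demo), so the spawn fails.
my $ispell = '/nonexistent/bin/ispell';

# open2() dies if the command cannot be started; eval turns that into $@.
my ( $from_ispell, $to_ispell );
my $pid = eval { open2( $from_ispell, $to_ispell, $ispell, '-a' ) };
my $error = $@;

if ($error) {
    warn "could not start ispell: $error";
}
else {
    print "ispell is running as process $pid\n";
    close $to_ispell;     # EOF on its input lets ispell exit
    waitpid $pid, 0;
}
```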

     Nota Bene

     The full path of the ispell executable is stored in the
     variable $Lingua::Ispell::path.  The default value is
     /usr/local/bin/ispell.  If your ispell executable is
     installed anywhere else, you must set $Lingua::Ispell::path
     accordingly before you call Lingua::Ispell::spellcheck() (or
     any other function in the module) for the first time!
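
     If ispell lives somewhere other than /usr/local/bin, one option
     is to look it up on the PATH before the first call.
     find_ispell() below is a hypothetical helper, not part of
     Lingua::Ispell; it relies only on the standard File::Spec
     module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of Lingua::Ispell): returns the first
# executable file named "ispell" in the directories listed in $ENV{PATH},
# or undef if there is none.
sub find_ispell {
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, 'ispell' );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $found = find_ispell();
print( defined $found ? "ispell found at $found\n" : "ispell not found on PATH\n" );
```

     With the module loaded, the result would then be assigned
     before the first spellcheck() call, for example
     $Lingua::Ispell::path = find_ispell() // die "no ispell\n";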

AUX FUNCTIONS
     add_word(word)

     Adds a word to the personal dictionary.  Be careful of
     capitalization.  If you want the word to be added
     "case-insensitively", you should call add_word_lc().

     add_word_lc(word)

     Adds a word to the personal dictionary, in lower-case form.
     This allows ispell to match it in a case-insensitive manner.

     accept_word(word)

     Similar to adding a word to the dictionary, in that it
     causes ispell to accept the word as valid, but it does not
     actually add it to the dictionary.  Presumably the effects
     of this last only for the current ispell session, which
     ends if any of the functions listed under FUNCTIONS THAT
     RESTART ISPELL is called.

     parse_according_to(formatter)

     Causes ispell to parse subsequent input lines according to
     the specified formatter.  As of ispell v. 3.1.20, only 'tex'
     and 'nroff' are supported.

     set_params_by_language(language)

     Causes ispell to set its internal operational parameters
     according to the given language.  The legal arguments to
     this function, and its effects, are currently unknown to the
     author of Lingua::Ispell.

     save_dictionary()

     Causes ispell to save the current state of the dictionary to
     its disk file.  Presumably ispell would ordinarily do this
     only upon exit.

     terse_mode(bool:terse)

     NOTE: This function has been disabled! Lingua::Ispell now
     always operates in terse mode.

     In terse mode, ispell will not produce reports for "correct"
     words.  This means that the calling program will not receive
     results of the types 'ok', 'root', and 'compound'.
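
     The functions in this section presumably translate into
     single-line commands sent over ispell's pipe-mode input.  The
     sketch below shows one such mapping, assuming the command
     characters documented in ispell(1) ('*', '&', '@', '+', '~'
     and '#'); it is an illustration, not Lingua::Ispell's actual
     implementation, and %command_for and send_command() are
     hypothetical names.

```perl
use strict;
use warnings;

# Assumed mapping (per ispell(1), not Lingua::Ispell's source) from each
# auxiliary function to the pipe-mode command line it would send.
my %command_for = (
    add_word               => sub { "*$_[0]" },     # insert into personal dictionary
    add_word_lc            => sub { "&$_[0]" },     # insert its lower-case form
    accept_word            => sub { "\@$_[0]" },    # accept for this session only
    parse_according_to     => sub { "+$_[0]" },     # e.g. "+tex" or "+nroff"
    set_params_by_language => sub { "~$_[0]" },     # argument passed through as is
    save_dictionary        => sub { '#' },          # write the personal dictionary
);

# Writes one command line to the ispell coprocess's input handle.
sub send_command {
    my ( $to_ispell, $name, @args ) = @_;
    my $make = $command_for{$name} or die "unknown command '$name'\n";
    print {$to_ispell} $make->(@args), "\n";
}

# Demonstration: write to an in-memory handle instead of a real coprocess.
open my $fake_ispell, '>', \my $sent or die "cannot open in-memory handle: $!";
send_command( $fake_ispell, add_word_lc => 'Perl' );
send_command( $fake_ispell, 'save_dictionary' );
close $fake_ispell;
print $sent;
```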


FUNCTIONS THAT RESTART ISPELL
     Calling any of the following functions terminates the
     current ispell coprocess, if any; a new one is started on the
     next call to spellcheck().  This means that all the changes
     to the state of ispell made by the above functions will be
     lost, and their respective values reset to their defaults.
     The only function above whose effect is persistent is
     save_dictionary().

     Perhaps in the future we will figure out a good way to make
     this state information carry over from one instantiation of
     the coprocess to the next.

     allow_compounds(bool)

     When this value is set to True, compound words are accepted
     as legal -- as long as both words are found in the
     dictionary; compounds of more than two words are always
     illegal.  When this value is set to False, run-together
     words are considered spelling errors.

     The default value of this setting is dictionary-dependent,
     so the caller should set it explicitly if it really matters.

     make_wild_guesses(bool)

     This setting controls when ispell makes "wild" guesses.

     If False, ispell only makes "sane" guesses, i.e. possible
     root/affix combinations that match the current dictionary;
     only if it can find none will it make "wild" guesses, which
     don't match the dictionary, and might in fact be illegal
     words.

     If True, wild guesses are always made, along with any "sane"
     guesses.  This feature can be useful if the dictionary has a
     limited word list, or a word list with few suffixes.

     The default value of this setting is dictionary-dependent,
     so the caller should set it explicitly if it really matters.
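
     Since allow_compounds() and make_wild_guesses() correspond to
     settings ispell reads at start-up, changing them means
     starting a fresh coprocess.  The sketch below shows one
     plausible mapping onto ispell's command-line options, based on
     our reading of ispell(1): -C/-B for compounds and -m/-P for
     wild guesses.  ispell_argv() is hypothetical and is not
     Lingua::Ispell's code.

```perl
use strict;
use warnings;

# Hypothetical sketch: builds the ispell command line for the two boolean
# settings above (option letters assumed from ispell(1)).  An undefined
# setting is omitted, leaving the dictionary-dependent default in force.
sub ispell_argv {
    my ( $path, %opt ) = @_;
    my @argv = ( $path, '-a' );    # -a: pipe mode, for use by programs
    push @argv, $opt{allow_compounds} ? '-C' : '-B'
        if defined $opt{allow_compounds};
    push @argv, $opt{make_wild_guesses} ? '-m' : '-P'
        if defined $opt{make_wild_guesses};
    return @argv;
}

my @argv = ispell_argv( '/usr/local/bin/ispell', allow_compounds => 1 );
print "@argv\n";
```

     Rebuilding the whole argument list from the current settings
     on each restart keeps the coprocess's options consistent with
     whatever the caller last asked for.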

     use_dictionary([dictionary])

     Specifies what dictionary to use instead of the default.
     Dictionary names are actually file names, and are searched
     for according to the following rule: if the name does not
     contain a slash, it is looked for in the directory
     containing the default dictionary, typically /usr/local/lib.
     Otherwise, it is used as is: if it does not begin with a
     slash, it is taken as relative to the current directory.

     If no argument is given, the default dictionary will be
     used.

     use_personal_dictionary([dictionary])

     Specifies what personal dictionary to use instead of the
     default.

     Dictionary names are actually file names, and are searched
     for according to the following rule: if the name begins
     with a slash, it is used as is (i.e. it is an absolute path
     name).  Otherwise, it is taken as relative to the user's
     home directory ($HOME).

     If no argument is given, the default personal dictionary
     will be used.
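
     The two lookup rules above can be restated as code.  Both
     helpers below are hypothetical and only illustrate the rules;
     ispell performs the real lookup itself.  $libdir stands in for
     the directory holding the default dictionary, and the file
     and home-directory names are made-up examples.

```perl
use strict;
use warnings;
use File::Spec;

# use_dictionary() rule: a bare name is looked up in $libdir; a name
# containing a slash is used as is, relative names being taken from
# the current directory (absolute names come back canonicalized).
sub resolve_dictionary {
    my ( $name, $libdir ) = @_;
    return File::Spec->catfile( $libdir, $name ) if index( $name, '/' ) < 0;
    return File::Spec->rel2abs($name);
}

# use_personal_dictionary() rule: an absolute name is used as is;
# anything else is taken relative to $HOME.
sub resolve_personal_dictionary {
    my ($name) = @_;
    return $name if File::Spec->file_name_is_absolute($name);
    my $home = $ENV{HOME} // die "HOME is not set\n";
    return File::Spec->catfile( $home, $name );
}

my $main = resolve_dictionary( 'mydict.hash', '/usr/local/lib' );
my $personal = do {
    local $ENV{HOME} = '/home/someone';    # example value only
    resolve_personal_dictionary('mywords');
};
print "$main\n$personal\n";
```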

FUTURE ENHANCEMENTS
     ispell options:

       -w chars
       Specify additional characters that can be part of a word.


DEPENDENCIES
     Lingua::Ispell uses the external program ispell, which is
     the "International Ispell", available at

       http://fmg-www.cs.ucla.edu/geoff/ispell.html

     as well as various archives and mirrors, such as

       ftp://ftp.math.orst.edu/pub/ispell-3.1/

     This is a widely used program, and may already be installed
     on your system.

     Lingua::Ispell also uses the standard Perl modules
     FileHandle, IPC::Open2, and Carp.

AUTHOR
     jdporter@min.net (John Porter)

COPYRIGHT
     This module is free software; you may redistribute it and/or
     modify it under the same terms as Perl itself.