=====================================================
"Did you mean... ?" Correcting errors in user queries
=====================================================

Overview
========

Whoosh can quickly suggest replacements for mis-typed words by returning
a list of words from the index (or a dictionary) that are close to the
mis-typed word::

    # ix is an open index; mistyped_words is any iterable of strings
    with ix.searcher() as s:
        corrector = s.corrector("text")
        for mistyped_word in mistyped_words:
            print(corrector.suggest(mistyped_word, limit=3))

See the :meth:`whoosh.spelling.Corrector.suggest` method documentation
for information on the arguments.

Currently the suggestion engine is more like a "typo corrector" than a
real "spell checker" since it doesn't do the kind of sophisticated
phonetic matching or semantic/contextual analysis a good spell checker
might. However, it is still very useful.
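
As a rough illustration of what "close to the mis-typed word" can mean, the
following sketch ranks candidate words by Levenshtein edit distance using only
the standard library. It is not Whoosh's implementation: the ``edit_distance``
and ``closest_words`` helpers and the sample vocabulary are hypothetical, and
the ``limit`` and ``maxdist`` defaults are assumptions chosen for the example::

    def edit_distance(a, b):
        """Return the Levenshtein distance between strings a and b."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                current.append(min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or match)
                ))
            previous = current
        return previous[-1]


    def closest_words(word, vocabulary, limit=3, maxdist=2):
        """Return up to `limit` words within `maxdist` edits of `word`."""
        scored = ((edit_distance(word, w), w) for w in vocabulary)
        # Sort by distance first, then alphabetically to break ties
        close = sorted((d, w) for d, w in scored if d <= maxdist)
        return [w for _, w in close[:limit]]


    vocabulary = ["shading", "shaking", "sharing", "texture"]
    print(closest_words("shaing", vocabulary))
    # prints ['shading', 'shaking', 'sharing']

A purely distance-based ranking like this cannot tell which of the three
candidates the user intended, which is the kind of contextual judgment a full
spell checker would add.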

There are two main strategies for correcting words:

*   Use the terms from an index field.

*   Use words from a word list.


Pulling suggestions from an indexed field
=========================================

In Whoosh 2.7 and later, spelling suggestions are available on all fields.
However, if a field's analyzer modifies the indexed words (for example, by
stemming), you can add ``spelling=True`` to the field to have it store separate
unmodified versions of the terms for spelling suggestions::

    from whoosh import analysis, fields

    ana = analysis.StemmingAnalyzer()
    schema = fields.Schema(content=fields.TEXT(analyzer=ana, spelling=True))

You can then use the :meth:`whoosh.searching.Searcher.corrector` method
to get a corrector for a field::

    corrector = searcher.corrector("content")

The advantage of using the contents of an index field is that when you
are spell checking queries on that index, the suggestions are tailored
to the contents of the index. The disadvantage is that if the indexed
documents contain spelling errors, then the spelling suggestions will
also be erroneous.
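
The following is a minimal sketch of index-based suggestions, assuming Whoosh
2.7 or later is installed. The in-memory ``RamStorage`` index, the ``content``
field name, and the sample documents are assumptions made for the example::

    from whoosh import fields
    from whoosh.filedb.filestore import RamStorage

    # Build a small throwaway index in memory
    schema = fields.Schema(content=fields.TEXT())
    ix = RamStorage().create_index(schema)

    writer = ix.writer()
    writer.add_document(content=u"rendering shaded textures")
    writer.add_document(content=u"texture shading techniques")
    writer.commit()

    with ix.searcher() as s:
        corrector = s.corrector("content")
        # Suggestions are drawn from the terms stored in the indexed field
        suggestions = corrector.suggest(u"rendring", limit=3)
        print(suggestions)

Because the candidates come from the indexed documents, a misspelling that was
itself indexed would be offered as a suggestion just like any other term.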


Pulling suggestions from a word list
====================================

There are plenty of word lists available on the internet that you can use to
populate the spelling dictionary.

(In the following examples, ``word_list`` can be a list of unicode
strings, or a file object with one word on each line.)

To create a :class:`whoosh.spelling.Corrector` object from a sorted word list::

    from whoosh.spelling import ListCorrector

    # word_list must be a sorted list of unicode strings
    corrector = ListCorrector(word_list)
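
Since the list is expected to be sorted, a raw word file may need a little
preparation first. A minimal sketch, assuming a text source with one word per
line: the ``load_word_list`` helper is hypothetical, ``io.StringIO`` stands in
for a real file, and lowercasing is an assumption that suits indexes whose
analyzer lowercases terms::

    import io


    def load_word_list(fileobj):
        """Return a sorted, de-duplicated list of words from a file object."""
        words = set()
        for line in fileobj:
            word = line.strip().lower()
            if word:  # skip blank lines
                words.add(word)
        return sorted(words)


    sample = io.StringIO(u"texture\nShading\n\nrendering\nshading\n")
    word_list = load_word_list(sample)
    print(word_list)
    # prints ['rendering', 'shading', 'texture']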


Merging two or more correctors
==============================

You can combine suggestions from two or more sources (for example, the
contents of an index field and a word list) using a
:class:`whoosh.spelling.MultiCorrector`::

    from whoosh import spelling

    c1 = searcher.corrector("content")
    c2 = spelling.ListCorrector(word_list)
    corrector = spelling.MultiCorrector([c1, c2])


Correcting user queries
=======================

You can spell-check a user query using the
:meth:`whoosh.searching.Searcher.correct_query` method::

    from whoosh import qparser

    # Parse the user query string
    qp = qparser.QueryParser("content", myindex.schema)
    q = qp.parse(qstring)

    # Try correcting the query
    with myindex.searcher() as s:
        corrected = s.correct_query(q, qstring)
        if corrected.query != q:
            print("Did you mean:", corrected.string)
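
For a complete run, the sketch below builds a small in-memory index and
corrects a misspelled query against it. It assumes Whoosh 2.7 or later is
installed; the ``RamStorage`` index, the ``content`` field name, and the sample
text are assumptions made for the example::

    from whoosh import fields, qparser
    from whoosh.filedb.filestore import RamStorage

    # Build a small throwaway index in memory
    schema = fields.Schema(content=fields.TEXT())
    myindex = RamStorage().create_index(schema)

    writer = myindex.writer()
    writer.add_document(content=u"rendering shaded textures")
    writer.commit()

    # Parse a query containing a typo
    qstring = u"rendring"
    qp = qparser.QueryParser("content", myindex.schema)
    q = qp.parse(qstring)

    with myindex.searcher() as s:
        corrected = s.correct_query(q, qstring)
        # A changed query tree means at least one term was replaced
        if corrected.query != q:
            print("Did you mean:", corrected.string)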

The ``correct_query`` method returns an object with the following
attributes:

``query``
    A corrected :class:`whoosh.query.Query` tree. You can test
    whether this is equal (``==``) to the original parsed query to
    check if the corrector actually changed anything.

``string``
    A corrected version of the user's query string.

``tokens``
    A list of corrected token objects representing the corrected
    terms. You can use this to reformat the user query (see below).


You can use a :class:`whoosh.highlight.Formatter` object to format the
corrected query string. For example, use the
:class:`~whoosh.highlight.HtmlFormatter` to format the corrected string
as HTML::

    from whoosh import highlight

    hf = highlight.HtmlFormatter()
    corrected = s.correct_query(q, qstring, formatter=hf)

See the documentation for
:meth:`whoosh.searching.Searcher.correct_query` for information on the
defaults and arguments.