Metadata-Version: 2.1
Name: snowballstemmer
Version: 2.2.0
Summary: This package provides 29 stemmers for 28 languages generated from Snowball algorithms.
Home-page: https://github.com/snowballstem/snowball
Author: Snowball Developers
Author-email: snowball-discuss@lists.tartarus.org
License: BSD-3-Clause
Keywords: stemmer
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: Arabic
Classifier: Natural Language :: Basque
Classifier: Natural Language :: Catalan
Classifier: Natural Language :: Danish
Classifier: Natural Language :: Dutch
Classifier: Natural Language :: English
Classifier: Natural Language :: Finnish
Classifier: Natural Language :: French
Classifier: Natural Language :: German
Classifier: Natural Language :: Greek
Classifier: Natural Language :: Hindi
Classifier: Natural Language :: Hungarian
Classifier: Natural Language :: Indonesian
Classifier: Natural Language :: Irish
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Lithuanian
Classifier: Natural Language :: Nepali
Classifier: Natural Language :: Norwegian
Classifier: Natural Language :: Portuguese
Classifier: Natural Language :: Romanian
Classifier: Natural Language :: Russian
Classifier: Natural Language :: Serbian
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Swedish
Classifier: Natural Language :: Tamil
Classifier: Natural Language :: Turkish
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/x-rst
License-File: COPYING

Snowball stemming library collection for Python
===============================================

Python 3 (>= 3.3) is supported.  We no longer actively support Python 2 as
the Python developers stopped supporting it at the start of 2020.  Snowball
2.1.0 was the last release to officially support Python 2.

What is Stemming?
-----------------

Stemming maps different forms of the same word to a common "stem" - for
example, the English stemmer maps *connection*, *connections*, *connective*,
*connected*, and *connecting* to *connect*.  So a search for *connected*
would also find documents which only have the other forms.

This stem form is often a word itself, but this is not always the case, as it
is not a requirement for text search systems, which are the intended field of
use.  We also aim to conflate words with the same meaning, rather than all
words with a common linguistic root (so *awe* and *awful* don't have the same
stem).  Over-stemming is more problematic than under-stemming, so we tend not
to stem in cases that are hard to resolve.  If you want to always reduce words
to a root form and/or get a root form which is itself a word, then Snowball's
stemming algorithms likely aren't the right answer.

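As a minimal sketch of the conflation described above, the following groups
word forms by their stem.  ``group_by_stem`` is a hypothetical helper and not
part of the package, and the import is guarded only so that the sketch still
loads when ``snowballstemmer`` is not installed.

.. code-block:: python

   from collections import defaultdict

   try:
       import snowballstemmer
   except ImportError:  # assumption: the package may be missing where this runs
       snowballstemmer = None


   def group_by_stem(words, stem_word):
       """Group words under the stem returned by stem_word(word).

       Hypothetical helper for illustration; stem_word is any callable
       mapping one word to its stem, such as Stemmer.stemWord.
       """
       groups = defaultdict(list)
       for word in words:
           groups[stem_word(word)].append(word)
       return dict(groups)


   if snowballstemmer is not None:
       stemmer = snowballstemmer.stemmer("english")
       forms = ["connection", "connections", "connective",
                "connected", "connecting"]
       # Per the text above, all five forms should share the stem "connect".
       print(group_by_stem(forms, stemmer.stemWord))

A search index built on the stems rather than the surface forms would then
match any of the grouped words against a query for any other.
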
How to use the library
----------------------

The ``snowballstemmer`` module has two functions.

The ``snowballstemmer.algorithms`` function returns a list of available
algorithm names.

The ``snowballstemmer.stemmer`` function takes an algorithm name and returns a
``Stemmer`` object.

``Stemmer`` objects have a ``Stemmer.stemWord(word)`` method, which stems a
single word, and a ``Stemmer.stemWords(words)`` method, which stems each word
in a list.

.. code-block:: python

   import snowballstemmer

   # Look up the stemmer for an algorithm by name.
   stemmer = snowballstemmer.stemmer('english')
   # Stem every word of a whitespace-split sentence.
   print(stemmer.stemWords("We are the world".split()))

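The two functions and the ``stemWords`` method can be combined as in the
sketch below.  ``stem_tokens`` is a hypothetical helper, and lowercasing the
text before stemming is an assumption about typical use rather than something
the package does for you.  The import is guarded so that the sketch still
loads without the package.

.. code-block:: python

   try:
       import snowballstemmer
   except ImportError:  # assumption: the package may be missing where this runs
       snowballstemmer = None


   def stem_tokens(text, stemmer):
       """Lowercase and whitespace-split text, then stem every token.

       Hypothetical helper; stemmer is any object with a stemWords(list)
       method, such as the object returned by snowballstemmer.stemmer().
       """
       return stemmer.stemWords(text.lower().split())


   if snowballstemmer is not None:
       names = snowballstemmer.algorithms()
       print("Available algorithms:", ", ".join(sorted(names)))
       if "english" in names:
           english = snowballstemmer.stemmer("english")
           print(stem_tokens("Connected connections", english))
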
Automatic Acceleration
----------------------

`PyStemmer <https://pypi.org/project/PyStemmer/>`_ is a wrapper module for
Snowball's ``libstemmer_c`` and should provide results 100% compatible with
**snowballstemmer**.

**PyStemmer** is faster because it wraps generated C versions of the stemmers;
**snowballstemmer** uses generated Python code and is slower but offers a pure
Python solution.

If PyStemmer is installed, ``snowballstemmer.stemmer`` returns a ``PyStemmer``
``Stemmer`` object which provides the same ``Stemmer.stemWord()`` and
``Stemmer.stemWords()`` methods.

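To see which backend is likely to be used, you can check whether PyStemmer is
importable.  This is a sketch which assumes PyStemmer installs its extension
under the top-level module name ``Stemmer``; ``pystemmer_available`` is a
hypothetical helper, not a function provided by either package.

.. code-block:: python

   import importlib.util


   def pystemmer_available():
       """Return True if a top-level module named "Stemmer" can be found.

       Assumption: PyStemmer is imported as "Stemmer".
       """
       return importlib.util.find_spec("Stemmer") is not None


   backend = "PyStemmer (C)" if pystemmer_available() else "pure Python"
   print("Expected stemming backend:", backend)
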
Benchmark
~~~~~~~~~

This is a crude benchmark which measures the time for running each stemmer on
every word in its sample vocabulary (10,787,583 words over 26 languages).  It's
not a realistic test of normal use as a real application would do much more
than just stemming.  It's also skewed towards the stemmers which do more work
per word and towards those with larger sample vocabularies.

* Python 2.7 + **snowballstemmer** : 13m00s (15.0 * PyStemmer)
* Python 3.7 + **snowballstemmer** : 12m19s (14.2 * PyStemmer)
* PyPy 7.1.1 (Python 2.7.13) + **snowballstemmer** : 2m14s (2.6 * PyStemmer)
* PyPy 7.1.1 (Python 3.6.1) + **snowballstemmer** : 1m46s (2.0 * PyStemmer)
* Python 2.7 + **PyStemmer** : 52s

For reference the equivalent test for C runs in 9 seconds.

These results are for Snowball 2.0.0.  They're likely to evolve over time as
the code Snowball generates for both Python and C continues to improve (for
a much older test over a different set of stemmers using Python 2.7,
**snowballstemmer** was 30 times slower than **PyStemmer**, or 9 times slower
with **PyPy**).

The message to take away is that if you're stemming a lot of words you should
either install **PyStemmer** (which **snowballstemmer** will then automatically
use for you as described above) or use PyPy.

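A comparable rough timing on your own word list might look like the sketch
below.  It is not the script used for the figures above; ``time_stemming`` is
a hypothetical helper, and the three-word sample repeated 10,000 times is an
arbitrary stand-in for a real vocabulary.

.. code-block:: python

   import time

   try:
       import snowballstemmer
   except ImportError:  # assumption: the package may be missing where this runs
       snowballstemmer = None


   def time_stemming(stem_words, words, repeats=3):
       """Return the best wall-clock time in seconds for stem_words(words).

       Hypothetical helper; stem_words is any callable taking a list of
       words, such as Stemmer.stemWords.
       """
       best = float("inf")
       for _ in range(repeats):
           start = time.perf_counter()
           stem_words(words)
           best = min(best, time.perf_counter() - start)
       return best


   if snowballstemmer is not None:
       sample = ["connection", "connected", "connecting"] * 10000
       stemmer = snowballstemmer.stemmer("english")
       seconds = time_stemming(stemmer.stemWords, sample)
       print("Stemmed %d words in %.3f s" % (len(sample), seconds))

Taking the best of several runs reduces noise from other processes, but the
result is still only indicative, for the same reasons given for the benchmark
above.
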
The TestApp example
-------------------

The ``testapp.py`` example program allows you to run any of the stemmers
on a sample vocabulary.

Usage::

   testapp.py <algorithm> "sentences ... "

.. code-block:: bash

   $ python testapp.py English "sentences... "