Metadata-Version: 2.1
Name: snowballstemmer
Version: 2.2.0
Summary: This package provides 29 stemmers for 28 languages generated from Snowball algorithms.
Home-page: https://github.com/snowballstem/snowball
Author: Snowball Developers
Author-email: snowball-discuss@lists.tartarus.org
License: BSD-3-Clause
Keywords: stemmer
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: Arabic
Classifier: Natural Language :: Basque
Classifier: Natural Language :: Catalan
Classifier: Natural Language :: Danish
Classifier: Natural Language :: Dutch
Classifier: Natural Language :: English
Classifier: Natural Language :: Finnish
Classifier: Natural Language :: French
Classifier: Natural Language :: German
Classifier: Natural Language :: Greek
Classifier: Natural Language :: Hindi
Classifier: Natural Language :: Hungarian
Classifier: Natural Language :: Indonesian
Classifier: Natural Language :: Irish
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Lithuanian
Classifier: Natural Language :: Nepali
Classifier: Natural Language :: Norwegian
Classifier: Natural Language :: Portuguese
Classifier: Natural Language :: Romanian
Classifier: Natural Language :: Russian
Classifier: Natural Language :: Serbian
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Swedish
Classifier: Natural Language :: Tamil
Classifier: Natural Language :: Turkish
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/x-rst
License-File: COPYING

Snowball stemming library collection for Python
===============================================

Python 3 (>= 3.3) is supported. We no longer actively support Python 2 as
the Python developers stopped supporting it at the start of 2020. Snowball
2.1.0 was the last release to officially support Python 2.

What is Stemming?
-----------------

Stemming maps different forms of the same word to a common "stem" - for
example, the English stemmer maps *connection*, *connections*, *connective*,
*connected*, and *connecting* to *connect*. So searching for *connected*
would also find documents which only have the other forms.

This stem form is often a word itself, but this is not always the case as this
is not a requirement for text search systems, which are the intended field of
use. We also aim to conflate words with the same meaning, rather than all
words with a common linguistic root (so *awe* and *awful* don't have the same
stem), and over-stemming is more problematic than under-stemming so we tend not
to stem in cases that are hard to resolve. If you want to always reduce words
to a root form and/or get a root form which is itself a word then Snowball's
stemming algorithms likely aren't the right answer.
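
As a rough illustration of the conflation described above, the sketch below
groups words by their stem, which is roughly what a search index does when it
stores stems instead of surface forms. ``group_by_stem`` and ``demo`` are
hypothetical helper names, not part of the library; the
``snowballstemmer.stemmer`` function and the ``stemWord`` method they rely on
are described in the next section.

.. code-block:: python

    from collections import defaultdict


    def group_by_stem(words, stem_word):
        """Group words under the stem that ``stem_word`` returns for each."""
        groups = defaultdict(list)
        for word in words:
            groups[stem_word(word)].append(word)
        return dict(groups)


    def demo():
        # Imported lazily so that group_by_stem itself has no hard dependency
        # on the package (a choice made for this sketch only).
        import snowballstemmer

        stemmer = snowballstemmer.stemmer('english')
        words = ["connection", "connections", "connective",
                 "connected", "connecting", "awe", "awful"]
        return group_by_stem(words, stemmer.stemWord)

Calling ``demo()`` should place the *connect* family under a single key, while
*awe* and *awful* end up under different keys.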

How to use the library
----------------------

The ``snowballstemmer`` module has two functions.

The ``snowballstemmer.algorithms`` function returns a list of available
algorithm names.

The ``snowballstemmer.stemmer`` function takes an algorithm name and returns a
``Stemmer`` object.

``Stemmer`` objects have a ``Stemmer.stemWord(word)`` method, which stems a
single word, and a ``Stemmer.stemWords(word[])`` method, which stems a list of
words.

.. code-block:: python

    import snowballstemmer

    # Create a stemmer for the chosen algorithm.
    stemmer = snowballstemmer.stemmer('english')
    # Stem every word of the sentence and print the resulting list.
    print(stemmer.stemWords("We are the world".split()))

Automatic Acceleration
----------------------

`PyStemmer <https://pypi.org/project/PyStemmer/>`_ is a wrapper module for
Snowball's ``libstemmer_c`` and should provide results 100% compatible with
**snowballstemmer**.

**PyStemmer** is faster because it wraps generated C versions of the stemmers;
**snowballstemmer** uses generated Python code and is slower but offers a pure
Python solution.

If PyStemmer is installed, ``snowballstemmer.stemmer`` returns a ``PyStemmer``
``Stemmer`` object which provides the same ``Stemmer.stemWord()`` and
``Stemmer.stemWords()`` methods.

Benchmark
~~~~~~~~~

This is a crude benchmark which measures the time for running each stemmer on
every word in its sample vocabulary (10,787,583 words over 26 languages). It's
not a realistic test of normal use as a real application would do much more
than just stemming. It's also skewed towards the stemmers which do more work
per word and towards those with larger sample vocabularies.

* Python 2.7 + **snowballstemmer** : 13m00s (15.0 * PyStemmer)
* Python 3.7 + **snowballstemmer** : 12m19s (14.2 * PyStemmer)
* PyPy 7.1.1 (Python 2.7.13) + **snowballstemmer** : 2m14s (2.6 * PyStemmer)
* PyPy 7.1.1 (Python 3.6.1) + **snowballstemmer** : 1m46s (2.0 * PyStemmer)
* Python 2.7 + **PyStemmer** : 52s

For reference the equivalent test for C runs in 9 seconds.

These results are for Snowball 2.0.0. They're likely to evolve over time as
the code Snowball generates for both Python and C continues to improve (for
a much older test over a different set of stemmers using Python 2.7,
**snowballstemmer** was 30 times slower than **PyStemmer**, or 9 times slower
with **PyPy**).

The message to take away is that if you're stemming a lot of words you should
either install **PyStemmer** (which **snowballstemmer** will then automatically
use for you as described above) or use PyPy.

The TestApp example
-------------------

The ``testapp.py`` example program allows you to run any of the stemmers
on a sample vocabulary.

Usage::

    testapp.py <algorithm> "sentences ... "

.. code-block:: bash

    $ python testapp.py English "sentences... "
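
For readers who want to see the shape of such a program, here is a minimal
sketch of a command-line script in the same spirit. It is not the actual
``testapp.py``; ``stem_sentence`` and ``main`` are hypothetical names, and
lowercasing the algorithm name before looking it up is an assumption made for
this sketch (the names returned by ``snowballstemmer.algorithms`` are assumed
to be lowercase).

.. code-block:: python

    def stem_sentence(sentence, stem_words):
        """Split a sentence on whitespace and stem the resulting words."""
        return stem_words(sentence.split())


    def main(argv):
        # Imported lazily so that stem_sentence can be tried without the package.
        import snowballstemmer

        if len(argv) < 3:
            print('usage: %s <algorithm> "sentences ... "' % argv[0])
            return 1

        # Assumption: algorithm names are listed in lowercase.
        algorithm = argv[1].lower()
        if algorithm not in snowballstemmer.algorithms():
            print("unknown algorithm: %s" % argv[1])
            return 1

        stemmer = snowballstemmer.stemmer(algorithm)
        for sentence in argv[2:]:
            print(" ".join(stem_sentence(sentence, stemmer.stemWords)))
        return 0

To run this as a script, call ``main(sys.argv)`` under the usual
``if __name__ == "__main__":`` guard and pass the result to ``sys.exit``.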