Metadata-Version: 2.1
Name: rapidfuzz
Version: 1.8.0
Summary: rapid fuzzy string matching
Home-page: https://github.com/maxbachmann/rapidfuzz
Author: Max Bachmann
Author-email: contact@maxbachmann.de
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=2.7
Description-Content-Type: text/markdown
Provides-Extra: full
License-File: LICENSE

<h1 align="center">
<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/RapidFuzz.svg?sanitize=true" alt="RapidFuzz" width="400">
</h1>
<h4 align="center">Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance</h4>

<p align="center">
  <a href="https://github.com/maxbachmann/RapidFuzz/actions">
    <img src="https://github.com/maxbachmann/RapidFuzz/workflows/Build/badge.svg"
      alt="Continuous Integration">
  </a>
  <a href="https://pypi.org/project/rapidfuzz/">
    <img src="https://img.shields.io/pypi/v/rapidfuzz"
      alt="PyPI package version">
  </a>
  <a href="https://anaconda.org/conda-forge/rapidfuzz">
    <img src="https://img.shields.io/conda/vn/conda-forge/rapidfuzz.svg"
      alt="Conda Version">
  </a>
  <a href="https://www.python.org">
    <img src="https://img.shields.io/pypi/pyversions/rapidfuzz"
      alt="Python versions">
  </a><br/>
  <a href="https://maxbachmann.github.io/RapidFuzz">
    <img src="https://img.shields.io/badge/-documentation-blue"
      alt="Documentation">
  </a>
  <a
    href="https://github.com/maxbachmann/RapidFuzz/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/maxbachmann/rapidfuzz"
      alt="GitHub license">
  </a>
</p>

<p align="center">
  <a href="#description">Description</a> •
  <a href="#installation">Installation</a> •
  <a href="#usage">Usage</a> •
  <a href="#license">License</a>
</p>

---

## Description
RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from [FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy). However, a few aspects set RapidFuzz apart from FuzzyWuzzy:
1) It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license.
2) It provides many string metrics, like `hamming` or `jaro_winkler`, which are not included in FuzzyWuzzy.
3) It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. For detailed benchmarks, check the [documentation](https://maxbachmann.github.io/RapidFuzz/fuzz.html).
4) It fixes multiple bugs in the `partial_ratio` implementation.

## Requirements

- Python 2.7 or later
- On Windows the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is required

## Installation

There are several ways to install RapidFuzz. The recommended methods
are to use either `pip` (the Python package manager) or
`conda` (an open-source, cross-platform package manager).

### with pip

RapidFuzz can be installed with `pip` as follows:

```bash
pip install rapidfuzz
```

There are pre-built binaries (wheels) of RapidFuzz for macOS (10.9 and later), Linux x86_64 and Windows.
Wheels for armv6l (Raspberry Pi Zero) and armv7l (Raspberry Pi) are available on [piwheels](https://www.piwheels.org/project/rapidfuzz/).

> :heavy_multiplication_x: **failure "ImportError: DLL load failed"**
>
> If you run into this error on Windows, the most likely reason is that the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is not installed. It is required to find the C++ libraries (the C++ 2019 version includes the 2015, 2017 and 2019 versions).

### with conda

RapidFuzz can be installed with `conda`:

```bash
conda install -c conda-forge rapidfuzz
```

### from git
RapidFuzz can be installed directly from the source distribution by cloning the repository. This requires a C++14 capable compiler.

```bash
git clone --recursive https://github.com/maxbachmann/rapidfuzz.git
cd rapidfuzz
pip install .
```

## Usage
Some simple functions are shown below. Complete documentation of all functions can be found [here](https://maxbachmann.github.io/RapidFuzz/index.html).

### Scorers
Scorers in RapidFuzz can be found in the modules `fuzz` and `string_metric`.

#### Simple Ratio
```console
> from rapidfuzz import fuzz
> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
```

#### Partial Ratio
```console
> fuzz.partial_ratio("this is a test", "this is a test!")
100.0
```

#### Token Sort Ratio
```console
> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90.90908813476562
> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
```

#### Token Set Ratio
```console
> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
```

### Process
The process module makes it possible to compare strings to lists of strings.
This is generally more
performant than using the scorers directly from Python.
Here are some examples of the usage of processors in RapidFuzz:

```console
> from rapidfuzz import process, fuzz
> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
> process.extract("new york jets", choices, scorer=fuzz.WRatio, limit=2)
[('New York Jets', 100, 1), ('New York Giants', 78.57142639160156, 2)]
> process.extractOne("cowboys", choices, scorer=fuzz.WRatio)
("Dallas Cowboys", 90, 3)
```

The full documentation of processors can be found [here](https://maxbachmann.github.io/RapidFuzz/process.html).

## Benchmark

The following benchmark gives a quick performance comparison between RapidFuzz and FuzzyWuzzy.
More detailed benchmarks for the string metrics can be found in the [documentation](https://maxbachmann.github.io/RapidFuzz/fuzz.html). For this simple comparison I generated a list of 10,000 strings of length 10, which is compared to a sample of 100 elements from this list:
```python
import random
import string

# 10,000 random alphanumeric strings of length 10
words = [
    ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))
    for _ in range(10_000)
]
# every 100th word, giving a sample of 100 elements
samples = words[::len(words) // 100]
```

The first benchmark compares the performance of the scorers in FuzzyWuzzy and RapidFuzz when they are used directly
from Python in the following way:
```python3
for sample in samples:
    for word in words:
        scorer(sample, word)
```
The following graph shows how many elements are processed per second with each of the scorers. There are big performance differences between the different scorers.
However, each of the scorers is faster in RapidFuzz.

<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/scorer.svg?sanitize=true" alt="Benchmark Scorer">

The second benchmark compares the performance when the scorers are used in combination with extractOne in the following
way:
```python3
for sample in samples:
    extractOne(sample, words, scorer=scorer)
```
The following graph shows how many elements are processed per second with each of the scorers. In RapidFuzz, using scorers through processors like `extractOne` is a lot faster than calling them directly. That's why processors should be used whenever possible.

<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/extractOne.svg?sanitize=true" alt="Benchmark extractOne">


## License
RapidFuzz is licensed under the MIT license since I believe that everyone should be able to use it without being forced to adopt the GPL license. That's why the library is based on an older version of fuzzywuzzy that was MIT licensed as well.
This old version of fuzzywuzzy can be found [here](https://github.com/seatgeek/fuzzywuzzy/tree/4bf28161f7005f3aa9d4d931455ac55126918df7).
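
## Appendix: benchmark loop sketch

The direct-scorer loop from the Benchmark section can be turned into a small timing script. The following is only a minimal sketch under stated assumptions, not the script that produced the graphs: `make_words`, `stand_in_scorer` and `elements_per_second` are hypothetical helper names, the word count is reduced so the script finishes quickly, and the scorer is a stand-in built on `difflib` from the standard library so the sketch runs without installing anything.

```python
import random
import string
import time
from difflib import SequenceMatcher


def make_words(count=10_000, length=10, seed=0):
    """Generate `count` random alphanumeric strings of the given length."""
    rng = random.Random(seed)  # seeded so repeated runs use the same data
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def stand_in_scorer(a, b):
    """Stand-in scorer (assumption): a 0-100 similarity based on difflib."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def elements_per_second(scorer, words, samples):
    """Time the nested loop and return the number of comparisons per second."""
    start = time.perf_counter()
    for sample in samples:
        for word in words:
            scorer(sample, word)
    elapsed = time.perf_counter() - start
    return len(samples) * len(words) / elapsed


if __name__ == "__main__":
    words = make_words(count=500)  # smaller than the 10,000 used in the README
    samples = words[:: len(words) // 100]  # 100 evenly spaced samples
    rate = elements_per_second(stand_in_scorer, words, samples)
    print(f"{rate:.0f} elements per second")
```

To measure an actual library, pass its scorer (for example `fuzz.ratio` from `rapidfuzz`) as the first argument of `elements_per_second` in place of the stand-in.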