Metadata-Version: 2.1
Name: rapidfuzz
Version: 1.8.0
Summary: rapid fuzzy string matching
Home-page: https://github.com/maxbachmann/rapidfuzz
Author: Max Bachmann
Author-email: contact@maxbachmann.de
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=2.7
Description-Content-Type: text/markdown
Provides-Extra: full
License-File: LICENSE

<h1 align="center">
<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/RapidFuzz.svg?sanitize=true" alt="RapidFuzz" width="400">
</h1>
<h4 align="center">Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance</h4>

<p align="center">
  <a href="https://github.com/maxbachmann/RapidFuzz/actions">
    <img src="https://github.com/maxbachmann/RapidFuzz/workflows/Build/badge.svg"
         alt="Continuous Integration">
  </a>
  <a href="https://pypi.org/project/rapidfuzz/">
    <img src="https://img.shields.io/pypi/v/rapidfuzz"
         alt="PyPI package version">
  </a>
  <a href="https://anaconda.org/conda-forge/rapidfuzz">
    <img src="https://img.shields.io/conda/vn/conda-forge/rapidfuzz.svg"
         alt="Conda Version">
  </a>
  <a href="https://www.python.org">
    <img src="https://img.shields.io/pypi/pyversions/rapidfuzz"
         alt="Python versions">
  </a><br/>
  <a href="https://maxbachmann.github.io/RapidFuzz">
    <img src="https://img.shields.io/badge/-documentation-blue"
         alt="Documentation">
  </a>
  <a href="https://github.com/maxbachmann/RapidFuzz/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/maxbachmann/rapidfuzz"
         alt="GitHub license">
  </a>
</p>

<p align="center">
  <a href="#description">Description</a> •
  <a href="#installation">Installation</a> •
  <a href="#usage">Usage</a> •
  <a href="#license">License</a>
</p>

---

## Description
RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from [FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy). However, a few aspects set RapidFuzz apart from FuzzyWuzzy:
1) It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license.
2) It provides many string metrics, such as hamming or jaro_winkler, that are not included in FuzzyWuzzy.
3) It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. For detailed benchmarks, check the [documentation](https://maxbachmann.github.io/RapidFuzz/fuzz.html).
4) It fixes multiple bugs in the `partial_ratio` implementation.

## Requirements

- Python 2.7 or later
- On Windows the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is required

## Installation

There are several ways to install RapidFuzz. The recommended methods
are to use either `pip` (the Python package manager) or
`conda` (an open-source, cross-platform package manager).

### with pip

RapidFuzz can be installed with `pip` the following way:

```bash
pip install rapidfuzz
```

Pre-built binaries (wheels) of RapidFuzz are available for macOS (10.9 and later), Linux x86_64 and Windows. Wheels for armv6l (Raspberry Pi Zero) and armv7l (Raspberry Pi) are available on [piwheels](https://www.piwheels.org/project/rapidfuzz/).

> :heavy_multiplication_x: &nbsp;&nbsp;**failure "ImportError: DLL load failed"**
>
> If you run into this error on Windows, the most likely reason is that the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is not installed. It is required to find the C++ libraries (the C++ 2019 version includes the 2015, 2017 and 2019 versions).

### with conda

RapidFuzz can be installed with `conda`:

```bash
conda install -c conda-forge rapidfuzz
```

### from git
RapidFuzz can be installed directly from source by cloning the repository. This requires a C++14 capable compiler.

```bash
git clone --recursive https://github.com/maxbachmann/rapidfuzz.git
cd rapidfuzz
pip install .
```

## Usage
Some simple functions are shown below. Complete documentation of all functions can be found [here](https://maxbachmann.github.io/RapidFuzz/index.html).

### Scorers
Scorers in RapidFuzz can be found in the modules `fuzz` and `string_metric`.

#### Simple Ratio
```console
> from rapidfuzz import fuzz
> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
```
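
The score above is consistent with a normalized insertion/deletion similarity: the longest common subsequence has 14 characters and the combined length is 29, so 100 * 2 * 14 / 29 ≈ 96.55. The following is a minimal pure-Python sketch of that calculation, assuming this normalization. It is not RapidFuzz's C++ implementation, and the names `lcs_length` and `indel_ratio` are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Similarity in [0, 100] based on insertions and deletions only (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        # Assumption of this sketch: two empty strings count as identical.
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


print(indel_ratio("this is a test", "this is a test!"))
```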

#### Partial Ratio
```console
> fuzz.partial_ratio("this is a test", "this is a test!")
100.0
```
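
The idea behind a partial ratio can be sketched as comparing the shorter string against every equally long window of the longer one and keeping the best score. This is a simplified illustration under that assumption; the library's real alignment search is more elaborate and optimized, and `indel_ratio` and `partial_ratio_sketch` are hypothetical names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Normalized insertion/deletion similarity in [0, 100] (hypothetical helper)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def partial_ratio_sketch(a, b):
    """Best score of the shorter string against same-length windows of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        # Assumption of this sketch: an empty string only matches another empty string.
        return 100.0 if not longer else 0.0
    window = len(shorter)
    return max(
        indel_ratio(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )


print(partial_ratio_sketch("this is a test", "this is a test!"))
```

In this example the window `"this is a test"` inside the longer string matches the shorter string exactly, which is why the trailing `!` does not lower the score.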

#### Token Sort Ratio
```console
> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90.90908813476562
> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
```
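
The difference between the two calls above comes from word order. A token sort comparison can be sketched as splitting both strings on whitespace, sorting the words, and comparing the rejoined strings. The sketch below assumes a normalized insertion/deletion similarity as the underlying ratio and skips any preprocessing the library may apply; `indel_ratio` and `token_sort_ratio_sketch` are hypothetical names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Normalized insertion/deletion similarity in [0, 100] (hypothetical helper)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def token_sort_ratio_sketch(a, b):
    """Compare the strings after sorting their whitespace-separated words."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return indel_ratio(sorted_a, sorted_b)


first, second = "fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"
print(indel_ratio(first, second))              # order-sensitive comparison
print(token_sort_ratio_sketch(first, second))  # order-insensitive comparison
```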

#### Token Set Ratio
```console
> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
```
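
A token set comparison ignores repeated words. One way to sketch it, following the set-based approach known from FuzzyWuzzy, is to split both strings into word sets, then compare the shared words against the shared words plus each side's remaining words, and keep the best score. This is a simplified illustration, not the library's implementation; `indel_ratio` and `token_set_ratio_sketch` are hypothetical names, and preprocessing is omitted.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Normalized insertion/deletion similarity in [0, 100] (hypothetical helper)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def token_set_ratio_sketch(a, b):
    """Compare word sets: shared words versus shared words plus each side's extras."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    return max(
        indel_ratio(common, combined_a),
        indel_ratio(common, combined_b),
        indel_ratio(combined_a, combined_b),
    )


print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))
```

Both inputs reduce to the same word set, so the duplicate "fuzzy" has no effect on the result.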

### Process
The process module makes it possible to compare strings to lists of strings. This is generally more
performant than using the scorers directly from Python.
Here are some examples of the usage of processors in RapidFuzz:

```console
> from rapidfuzz import process, fuzz
> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
> process.extract("new york jets", choices, scorer=fuzz.WRatio, limit=2)
[('New York Jets', 100, 1), ('New York Giants', 78.57142639160156, 2)]
> process.extractOne("cowboys", choices, scorer=fuzz.WRatio)
("Dallas Cowboys", 90, 3)
```

The full documentation of processors can be found [here](https://maxbachmann.github.io/RapidFuzz/process.html).
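
To make the shape of the results concrete, the following pure-Python sketch scores every choice and returns the best `limit` entries as `(choice, score, index)` tuples, like the output above. It is an illustration only: it uses a simple insertion/deletion ratio instead of `fuzz.WRatio`, assumes lowercasing as the only preprocessing step, and the names `indel_ratio` and `extract_sketch` are hypothetical. The real `process.extract` does this work in C++.

```python
import heapq


def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Normalized insertion/deletion similarity in [0, 100] (hypothetical helper)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def extract_sketch(query, choices, scorer=indel_ratio, processor=str.lower, limit=5):
    """Return the `limit` best matches as (choice, score, index) tuples."""
    processed_query = processor(query)
    scored = [
        (choice, scorer(processed_query, processor(choice)), index)
        for index, choice in enumerate(choices)
    ]
    # Keep only the highest-scoring entries, best first.
    return heapq.nlargest(limit, scored, key=lambda item: item[1])


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
```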

## Benchmark

The following benchmark gives a quick performance comparison between RapidFuzz and FuzzyWuzzy.
More detailed benchmarks for the string metrics can be found in the [documentation](https://maxbachmann.github.io/RapidFuzz/fuzz.html). For this simple comparison I generated a list of 10,000 strings of length 10, which is compared to a sample of 100 elements from this list:
```python
import random
import string

words = [
  ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))
  for _ in range(10_000)
]
samples = words[::len(words) // 100]
```

The first benchmark compares the performance of the scorers in FuzzyWuzzy and RapidFuzz when they are used directly
from Python in the following way:
```python3
for sample in samples:
  for word in words:
    scorer(sample, word)
```
The following graph shows how many elements are processed per second with each of the scorers. There are big performance differences between the different scorers. However, each of the scorers is faster in RapidFuzz.

<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/scorer.svg?sanitize=true" alt="Benchmark Scorer">
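
A loop like the one in the first benchmark can be timed with a small harness that reports scored pairs per second. This is a minimal sketch using only the standard library, not the script that produced the graph; `make_words`, `pairs_per_second` and the placeholder scorer are hypothetical names, and any scorer callable taking two strings can be passed in.

```python
import random
import string
import time


def make_words(count, length=10, seed=0):
    """Generate `count` random alphanumeric strings of the given length."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def pairs_per_second(scorer, samples, words):
    """Call scorer(sample, word) for every pair and return the throughput."""
    start = time.perf_counter()
    for sample in samples:
        for word in words:
            scorer(sample, word)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(samples) * len(words) / max(elapsed, 1e-9)


words = make_words(1000)
samples = words[::len(words) // 100]
# Placeholder scorer for illustration; substitute the scorer under test.
rate = pairs_per_second(lambda a, b: float(a == b), samples, words)
print(f"{rate:.0f} pairs per second")
```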

The second benchmark compares the performance when the scorers are used in combination with extractOne in the following
way:
```python3
for sample in samples:
  extractOne(sample, words, scorer=scorer)
```
The following graph shows how many elements are processed per second with each of the scorers. In RapidFuzz, using scorers through processors like `extractOne` is a lot faster than using the scorers directly. That's why processors should be used whenever possible.

<img src="https://raw.githubusercontent.com/maxbachmann/RapidFuzz/main/docs/img/extractOne.svg?sanitize=true" alt="Benchmark extractOne">

## License
RapidFuzz is licensed under the MIT license since I believe that everyone should be able to use it without being forced to adopt the GPL license. That's why the library is based on an older version of fuzzywuzzy that was MIT licensed as well.
This old version of fuzzywuzzy can be found [here](https://github.com/seatgeek/fuzzywuzzy/tree/4bf28161f7005f3aa9d4d931455ac55126918df7).