Name | Date | Size | #Lines | LOC
---|---|---|---|---
.github/workflows/ | 23-Mar-2020 | - | 31 | 29
fingerprints/ | 23-Mar-2020 | - | 178 | 125
tests/ | 23-Mar-2020 | - | 48 | 34
tools/ | 23-Mar-2020 | - | 2,431 | 2,421
.bumpversion.cfg | 23-Mar-2020 | 118 B | 9 | 6
.gitignore | 23-Mar-2020 | 74 B | 9 | 9
LICENSE | 23-Mar-2020 | 1.1 KiB | 22 | 17
MANIFEST.in | 23-Mar-2020 | 50 B | 3 | 2
Makefile | 23-Mar-2020 | 335 B | 16 | 11
README.md | 23-Mar-2020 | 1.7 KiB | 40 | 25
setup.cfg | 23-Mar-2020 | 61 B | 6 | 4
setup.py | 23-Mar-2020 | 1.1 KiB | 38 | 35

README.md
# fingerprints

![package](https://github.com/alephdata/fingerprints/workflows/package/badge.svg)

This library helps with the generation of fingerprints for entity data. A fingerprint
in this context is understood as a simplified entity identifier, derived from its
name or address and used for cross-referencing entities across different datasets.

## Usage

```python
import fingerprints

fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```

## Company type names

A significant part of what `fingerprints` does is to recognize company legal form
names. For example, `fingerprints` will be able to simplify `Общество с ограниченной ответственностью` to `ООО`, or `Aktiengesellschaft` to `AG`. The required database
is based on two different sources:

* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.
* The ISO 20275: [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list)

Wikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).

## See also

* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation discussing how to create collisions in data clustering.
* [probablepeople](https://github.com/datamade/probablepeople), parser for western names made by the brilliant folks at datamade.us.
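
## Sketch: legal form replacement

To illustrate the idea described under "Company type names", here is a minimal, standalone sketch of how long legal form names could be collapsed to short forms before the remaining tokens are sorted. It does not use or reproduce the internals of `fingerprints`: the `LEGAL_FORMS` mapping and the functions `replace_legal_forms` and `toy_fingerprint` are hypothetical names chosen for this example, and the two-entry mapping stands in for the much larger database built from the sources listed above. Unlike the library, this sketch does not strip titles such as "Mr.".

```python
import re

# Hypothetical, tiny mapping of long legal form names to short forms.
# This is an assumption for illustration, not the library's data.
LEGAL_FORMS = {
    "aktiengesellschaft": "ag",
    "общество с ограниченной ответственностью": "ооо",
}


def replace_legal_forms(text: str) -> str:
    """Case-fold the text and replace known long legal forms."""
    text = text.casefold()
    # Try longer names first so multi-word forms are matched as a whole.
    for long_form in sorted(LEGAL_FORMS, key=len, reverse=True):
        # Only match whole words, not fragments inside other words.
        pattern = r"(?<!\w)" + re.escape(long_form) + r"(?!\w)"
        text = re.sub(pattern, LEGAL_FORMS[long_form], text)
    return text


def toy_fingerprint(name: str) -> str:
    """Replace legal forms, drop punctuation, then sort the unique tokens."""
    text = replace_legal_forms(name)
    tokens = re.findall(r"\w+", text)
    return " ".join(sorted(set(tokens)))


if __name__ == "__main__":
    print(toy_fingerprint("Siemens Aktiengesellschaft"))
    print(toy_fingerprint("New York, New York"))
```

Sorting the de-duplicated tokens is what makes differently ordered or repeated spellings of the same name collide on one key, which is the clustering idea discussed in the OpenRefine article linked above.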