• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

.circleci/H20-Dec-2020-2824

v2/H20-Dec-2020-1,3841,011

.deepsource.tomlH A D20-Dec-2020156 107

README.mdH A D20-Dec-20204.3 KiB12987

codecov.ymlH A D20-Dec-2020113 128

go.modH A D20-Dec-202047 42

README.md

1
2[![GoDoc](https://godoc.org/github.com/holiman/bloomfilter?status.png)](https://godoc.org/github.com/holiman/bloomfilter)
3[![CircleCI](https://circleci.com/gh/holiman/bloomfilter.svg?style=svg)](https://app.circleci.com/pipelines/github/holiman/bloomfilter)
4[![codecov](https://codecov.io/gh/holiman/bloomfilter/branch/master/graph/badge.svg?token=O48l6LbHkL)](https://codecov.io/gh/holiman/bloomfilter)
5[![DeepSource](https://deepsource.io/gh/holiman/bloomfilter.svg/?label=active+issues&show_trend=true)](https://deepsource.io/gh/holiman/bloomfilter/?ref=repository-badge)
6
7# History
8
9This bloom filter implementation is a fork from [steakknife/bloomfilter](https://github.com/steakknife/bloomfilter) by Barry Allard.
10The upstream project is now archived, so this fork exists to fix some bugs and also
11make a few improvements. Below is the original description.
12
13The original implemenation is Copyright © 2014-2016,2018 Barry Allard
14[MIT license](MIT-LICENSE.txt)
15
16All recent changes are copyright © 2019-2020 Martin Holst Swende.
17
18## Installation
19
20```
21$ go get github.com/holiman/bloomfilter
22```
23
24## Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
25
26### WTF is a bloom filter
27
28**TL;DR:** Probabilistic, extra lookup table to track a set of elements kept elsewhere to reduce expensive, unnecessary set element retrieval and/or iterator operations **when an element is not present in the set.** It's a classic time-storage tradeoff algoritm.
29
30### Properties
31
32#### [See wikipedia](https://en.wikipedia.org/wiki/Bloom_filter) for algorithm details
33
34|Impact|What|Description|
35|---|---|---|
36|Good|No false negatives|know for certain if a given element is definitely NOT in the set|
37|Bad|False positives|uncertain if a given element is in the set|
38|Bad|Theoretical potential for hash collisions|in very large systems and/or badly hash.Hash64-conforming implementations|
39|Bad|Add only|Cannot remove an element, it would destroy information about other elements|
40|Good|Constant storage|uses only a fixed amount of memory|
41
42## Naming conventions
43
44(Similar to algorithm)
45
46|Variable/function|Description|Range|
47|---|---|---|
48|m/M()|number of bits in the bloom filter (memory representation is about m/8 bytes in size)|>=2|
49|n/N()|number of elements present|>=0|
50|k/K()|number of keys to use (keys are kept private to user code but are de/serialized to Marshal and file I/O)|>=0|
51|maxN|maximum capacity of intended structure|>0|
52|p|maximum allowed probability of collision (for computing m and k for optimal sizing)|>0..<1|
53
54- Memory representation should be exactly `24 + 8*(k + (m+63)/64) + unsafe.Sizeof(RWMutex)` bytes.
55- Serialized (`BinaryMarshaler`) representation should be exactly `72 + 8*(k + (m+63)/64)` bytes. (Disk format is less due to compression.)
56
57## Binary serialization format
58
59All values in Little-endian format
60
61|Offset|Offset (Hex)|Length (bytes)|Name|Type|
62|---|---|---|---|---|
63|0|00|12|magic + version number|`\0\0\0\0\0\0\0\0v02\n`|
64|12|0c|8|k|`uint64`|
65|20|14|8|n|`uint64`|
66|28|1c|8|m|`uint64`|
67|36|24|k|(keys)|`[k]uint64`|
68|36+8*k|...|(m+63)/64|(bloom filter)|`[(m+63)/64]uint64`|
69|36+8\*k+8\*((m+63)/64)|...|48|(SHA384 of all previous fields, hashed in order)|`[48]byte`|
70
71- `bloomfilter.Filter` conforms to `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler'
72
73## Usage
74
75```go
76
77import "github.com/holiman/bloomfilter"
78
79const (
80  maxElements = 100000
81  probCollide = 0.0000001
82)
83
84bf, err := bloomfilter.NewOptimal(maxElements, probCollide)
85if err != nil {
86  panic(err)
87}
88
89someValue := ... // must conform to hash.Hash64
90
91bf.Add(someValue)
92if bf.Contains(someValue) { // probably true, could be false
93  // whatever
94}
95
96anotherValue := ... // must also conform to hash.Hash64
97
98if bf.Contains(anotherValue) {
99  panic("This should never happen")
100}
101
102err := bf.WriteFile("1.bf.gz")  // saves this BF to a file
103if err != nil {
104  panic(err)
105}
106
107bf2, err := bloomfilter.ReadFile("1.bf.gz") // read the BF to another var
108if err != nil {
109  panic(err)
110}
111```
112
113
114## Design
115
116Where possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch-misses.
117
118## Contact
119
120- [Issues](https://github.com/holiman/bloomfilter/issues)
121
122## License
123
124[MIT license](MIT-LICENSE.txt)
125
126Copyright © 2014-2016 Barry Allard
127Copyright © 2019-2020 Martin Holst Swende
128
129