# Benchmarking scripts

This directory contains benchmarking scripts that can reproduce the
numbers reported in the two papers

```
@inproceedings{DJP16,
  Author = {Douze, Matthijs and J{\'e}gou, Herv{\'e} and Perronnin, Florent},
  Booktitle = "ECCV",
  Organization = {Springer},
  Title = {Polysemous codes},
  Year = {2016}
}
```
and

```
@article{JDJ17,
  Author = {Jeff Johnson and Matthijs Douze and Herv{\'e} J{\'e}gou},
  Journal = {arXiv:1702.08734},
  Title = {Billion-scale similarity search with GPUs},
  Year = {2017}
}
```

Note that the numbers (especially timings) change slightly due to changes in the implementation, different machines, etc.

The scripts are self-contained. They depend only on Faiss and external training data that should be stored in sub-directories.

## SIFT1M experiments

The script [`bench_polysemous_sift1m.py`](bench_polysemous_sift1m.py) reproduces the numbers in
Figure 3 from the "Polysemous" paper.

### Getting SIFT1M

To run it, please download the ANN_SIFT1M dataset from

http://corpus-texmex.irisa.fr/

and unzip it to the subdirectory sift1M.

### Result

The output looks like:

```
PQ training on 100000 points, remains 0 points: training polysemous on centroids
add vectors to index
PQ baseline        7.517 ms per query, R@1 0.4474
Polysemous 64      9.875 ms per query, R@1 0.4474
Polysemous 62      8.358 ms per query, R@1 0.4474
Polysemous 58      5.531 ms per query, R@1 0.4474
Polysemous 54      3.420 ms per query, R@1 0.4478
Polysemous 50      2.182 ms per query, R@1 0.4475
Polysemous 46      1.621 ms per query, R@1 0.4408
Polysemous 42      1.448 ms per query, R@1 0.4174
Polysemous 38      1.331 ms per query, R@1 0.3563
Polysemous 34      1.334 ms per query, R@1 0.2661
Polysemous 30      1.272 ms per query, R@1 0.1794
```


## Experiments on 1B elements dataset

The script [`bench_polysemous_1bn.py`](bench_polysemous_1bn.py) reproduces a few experiments on
two datasets of size 1B from the "Polysemous codes" paper.


### Getting BIGANN

Download the four files of ANN_SIFT1B from
http://corpus-texmex.irisa.fr/ to subdirectory bigann/

### Getting Deep1B

The ground-truth and queries are available here

https://yadi.sk/d/11eDCm7Dsn9GA

For the learning and database vectors, use the script

https://github.com/arbabenko/GNOIMI/blob/master/downloadDeep1B.py

to download the data to subdirectory deep1b/, then concatenate the
database files to base.fvecs and the training files to learn.fvecs.

### Running the experiments

These experiments are quite long. To support resuming, the script
stores the result of training to a temporary directory, `/tmp/bench_polysemous`.

The script `bench_polysemous_1bn.py` takes at least two arguments:

- the dataset name: SIFT1000M (aka SIFT1B, aka BIGANN) or Deep1B. SIFT1M, SIFT2M,... are also supported to make subsets for small experiments (note that SIFT1M as a subset of SIFT1B is not the same as the SIFT1M above)

- the type of index to build, which should be a valid [index_factory key](https://github.com/facebookresearch/faiss/wiki/High-level-interface-and-auto-tuning#index-factory) (see below for examples)

- the remaining arguments are parsed as search-time parameters.

### Experiments of Table 2

The `IMI*+PolyD+ADC` results in Table 2 can be reproduced with (for 16 bytes):

```
python bench_polysemous_1bn.py SIFT1000M IMI2x12,PQ16 nprobe=16,max_codes={10000,30000},ht={44..54}
```

Training takes about 2 minutes and adding vectors to the dataset
takes 3.1 h. These operations are multithreaded. Note that in the command
above, we use bash's [brace expansion](https://www.gnu.org/software/bash/manual/html_node/Brace-Expansion.html) to set a grid of parameters.

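If your shell lacks brace expansion, the same parameter grid can be generated explicitly, for example in Python:

```python
import itertools

# expand max_codes={10000,30000} x ht={44..54} into explicit
# search-parameter strings, one per combination
grid = [
    f"nprobe=16,max_codes={mc},ht={ht}"
    for mc, ht in itertools.product((10000, 30000), range(44, 55))
]
print(len(grid))   # 22 combinations
print(grid[0])     # nprobe=16,max_codes=10000,ht=44
```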
The search is *not* multithreaded, and the output looks like:

```
                                        R@1    R@10   R@100     time    %pass
nprobe=16,max_codes=10000,ht=44         0.1779 0.2994 0.3139    0.194   12.45
nprobe=16,max_codes=10000,ht=45         0.1859 0.3183 0.3339    0.197   14.24
nprobe=16,max_codes=10000,ht=46         0.1930 0.3366 0.3543    0.202   16.22
nprobe=16,max_codes=10000,ht=47         0.1993 0.3550 0.3745    0.209   18.39
nprobe=16,max_codes=10000,ht=48         0.2033 0.3694 0.3917    0.640   20.77
nprobe=16,max_codes=10000,ht=49         0.2070 0.3839 0.4077    0.229   23.36
nprobe=16,max_codes=10000,ht=50         0.2101 0.3949 0.4205    0.232   26.17
nprobe=16,max_codes=10000,ht=51         0.2120 0.4042 0.4310    0.239   29.21
nprobe=16,max_codes=10000,ht=52         0.2134 0.4113 0.4402    0.245   32.47
nprobe=16,max_codes=10000,ht=53         0.2157 0.4184 0.4482    0.250   35.96
nprobe=16,max_codes=10000,ht=54         0.2170 0.4240 0.4546    0.256   39.66
nprobe=16,max_codes=30000,ht=44         0.1882 0.3327 0.3555    0.226   11.29
nprobe=16,max_codes=30000,ht=45         0.1964 0.3525 0.3771    0.231   13.05
nprobe=16,max_codes=30000,ht=46         0.2039 0.3713 0.3987    0.236   15.01
nprobe=16,max_codes=30000,ht=47         0.2103 0.3907 0.4202    0.245   17.19
nprobe=16,max_codes=30000,ht=48         0.2145 0.4055 0.4384    0.251   19.60
nprobe=16,max_codes=30000,ht=49         0.2179 0.4198 0.4550    0.257   22.25
nprobe=16,max_codes=30000,ht=50         0.2208 0.4305 0.4681    0.268   25.15
nprobe=16,max_codes=30000,ht=51         0.2227 0.4402 0.4791    0.275   28.30
nprobe=16,max_codes=30000,ht=52         0.2241 0.4473 0.4884    0.284   31.70
nprobe=16,max_codes=30000,ht=53         0.2265 0.4544 0.4965    0.294   35.34
nprobe=16,max_codes=30000,ht=54         0.2278 0.4601 0.5031    0.303   39.20
```

The result reported in Table 2 is the one for which the %pass (the percentage of code comparisons that pass the Hamming check) is around 20%, which occurs for Hamming threshold `ht=48`.

The 8-byte results can be reproduced with the factory key `IMI2x12,PQ8`.

### Experiments of the appendix

The experiments in the appendix appear only in the arXiv version of the paper (Table 3).

```
python bench_polysemous_1bn.py SIFT1000M OPQ8_64,IMI2x13,PQ8 nprobe={1,2,4,8,16,32,64,128},ht={20,24,26,28,30}

               	R@1    R@10   R@100     time    %pass
nprobe=1,ht=20 	0.0351 0.0616 0.0751    0.158   19.01
...
nprobe=32,ht=28 	0.1256 0.3563 0.5026    0.561   52.61
...
```
Here again the runs are not exactly the same, but the original result was obtained with `nprobe=32,ht=28`.

For Deep1B, we used a simple version of [auto-tuning](https://github.com/facebookresearch/faiss/wiki/High-level-interface-and-auto-tuning#auto-tuning-the-runtime-parameters) to sweep through the set of operating points:

```
python bench_polysemous_1bn.py Deep1B OPQ20_80,IMI2x14,PQ20 autotune
...
Done in 4067.555 s, available OPs:
Parameters                                1-R@1     time
                                          0.0000    0.000
nprobe=1,ht=22,max_codes=256              0.0215    3.115
nprobe=1,ht=30,max_codes=256              0.0381    3.120
...
nprobe=512,ht=68,max_codes=524288         0.4478   36.903
nprobe=1024,ht=80,max_codes=131072        0.4557   46.363
nprobe=1024,ht=78,max_codes=262144        0.4616   61.939
...
```
The original results were obtained with `nprobe=1024,ht=66,max_codes=262144`.

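The auto-tuning evaluates many parameter combinations and keeps only the Pareto-optimal operating points, i.e. those for which no other combination is both more accurate and faster. A small sketch of that selection step, on hypothetical (recall, time) pairs (not the auto-tuner's actual implementation):

```python
def pareto_optimal(points):
    # keep (recall, time) points that no other point dominates,
    # where "dominates" means at least as accurate AND at least as fast
    kept = []
    for r, t in points:
        dominated = any(r2 >= r and t2 <= t and (r2, t2) != (r, t)
                        for r2, t2 in points)
        if not dominated:
            kept.append((r, t))
    return sorted(kept)

ops = [(0.40, 10.0), (0.45, 20.0), (0.42, 25.0), (0.50, 40.0)]
# (0.42, 25.0) is dominated by (0.45, 20.0): less accurate and slower
print(pareto_optimal(ops))  # [(0.4, 10.0), (0.45, 20.0), (0.5, 40.0)]
```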


## GPU experiments

The benchmarks below run on 1 or 4 Titan X GPUs and reproduce the results of the "GPU paper". They are also a good starting point on how to use GPU Faiss.

### Search on SIFT1M

See above on how to get SIFT1M into subdirectory sift1M/. The script [`bench_gpu_sift1m.py`](bench_gpu_sift1m.py) reproduces the "exact k-NN time" plot in the arXiv paper, and the SIFT1M numbers.

The output is:
```
============ Exact search
add vectors to index
warmup
benchmark
k=1 0.715 s, R@1 0.9914
k=2 0.729 s, R@1 0.9935
k=4 0.731 s, R@1 0.9935
k=8 0.732 s, R@1 0.9935
k=16 0.742 s, R@1 0.9935
k=32 0.737 s, R@1 0.9935
k=64 0.753 s, R@1 0.9935
k=128 0.761 s, R@1 0.9935
k=256 0.799 s, R@1 0.9935
k=512 0.975 s, R@1 0.9935
k=1024 1.424 s, R@1 0.9935
============ Approximate search
train
WARNING clustering 100000 points to 4096 centroids: please provide at least 159744 training points
add vectors to index
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 256000000 B, highwater 256000000 B)
warmup
benchmark
nprobe=   1 0.043 s recalls= 0.3909 0.4312 0.4312
nprobe=   2 0.040 s recalls= 0.5041 0.5636 0.5636
nprobe=   4 0.048 s recalls= 0.6048 0.6897 0.6897
nprobe=   8 0.064 s recalls= 0.6879 0.8028 0.8028
nprobe=  16 0.088 s recalls= 0.7534 0.8940 0.8940
nprobe=  32 0.134 s recalls= 0.7957 0.9549 0.9550
nprobe=  64 0.224 s recalls= 0.8125 0.9833 0.9834
nprobe= 128 0.395 s recalls= 0.8205 0.9953 0.9954
nprobe= 256 0.717 s recalls= 0.8227 0.9993 0.9994
nprobe= 512 1.348 s recalls= 0.8228 0.9999 1.0000
```
The run produces two warnings:

- the clustering complains that it does not have enough training data; there is not much we can do about this.

- the add() function complains about an inefficient memory allocation, but this is a concern only when it happens often, and we are not benchmarking the add time anyway.

To index small datasets, it is more efficient to use a `GpuIVFFlat`, which just stores the full vectors in the inverted lists. We did not mention this in the paper because it is not as scalable. To experiment with this setting, change the `index_factory` string from "IVF4096,PQ64" to "IVF16384,Flat". This gives:

```
nprobe=   1 0.025 s recalls= 0.4084 0.4105 0.4105
nprobe=   2 0.033 s recalls= 0.5235 0.5264 0.5264
nprobe=   4 0.033 s recalls= 0.6332 0.6367 0.6367
nprobe=   8 0.040 s recalls= 0.7358 0.7403 0.7403
nprobe=  16 0.049 s recalls= 0.8273 0.8324 0.8324
nprobe=  32 0.068 s recalls= 0.8957 0.9024 0.9024
nprobe=  64 0.104 s recalls= 0.9477 0.9549 0.9549
nprobe= 128 0.174 s recalls= 0.9760 0.9837 0.9837
nprobe= 256 0.299 s recalls= 0.9866 0.9944 0.9944
nprobe= 512 0.527 s recalls= 0.9907 0.9987 0.9987
```



### Clustering on MNIST8m

To get the "infinite MNIST dataset", follow the instructions on [Léon Bottou's website](http://leon.bottou.org/projects/infimnist). The script assumes the file `mnist8m-patterns-idx3-ubyte` is in subdirectory `mnist8m`.

The script [`kmeans_mnist.py`](kmeans_mnist.py) produces the following output:

```
python kmeans_mnist.py 1 256
...
Clustering 8100000 points in 784D to 256 clusters, redo 1 times, 20 iterations
  Preprocessing in 7.94526 s
  Iteration 19 (131.697 s, search 114.78 s): objective=1.44881e+13 imbalance=1.05963 nsplit=0
final objective: 1.449e+13
total runtime: 140.615 s
```

### Search on SIFT1B

The script [`bench_gpu_1bn.py`](bench_gpu_1bn.py) runs multi-GPU searches on the two 1-billion vector datasets we considered. It is more complex than the previous scripts, because it supports many search options and decomposes the dataset build process in Python to exploit the best possible CPU/GPU parallelism and GPU distribution.

Even on multiple GPUs, building the 1B datasets can last several hours. It is often a good idea to validate that everything is working fine on smaller datasets like SIFT1M, SIFT2M, etc.

The search results on SIFT1B in the "GPU paper" can be obtained with

```
python bench_gpu_1bn.py SIFT1000M OPQ8_32,IVF262144,PQ8 -nnn 10 -ngpu 1 -tempmem $[1536*1024*1024]
...
0/10000 (0.024 s)      probe=1  : 0.161 s 1-R@1: 0.0752 1-R@10: 0.1924
0/10000 (0.005 s)      probe=2  : 0.150 s 1-R@1: 0.0964 1-R@10: 0.2693
0/10000 (0.005 s)      probe=4  : 0.153 s 1-R@1: 0.1102 1-R@10: 0.3328
0/10000 (0.005 s)      probe=8  : 0.170 s 1-R@1: 0.1220 1-R@10: 0.3827
0/10000 (0.005 s)      probe=16 : 0.196 s 1-R@1: 0.1290 1-R@10: 0.4151
0/10000 (0.006 s)      probe=32 : 0.244 s 1-R@1: 0.1314 1-R@10: 0.4345
0/10000 (0.006 s)      probe=64 : 0.353 s 1-R@1: 0.1332 1-R@10: 0.4461
0/10000 (0.005 s)      probe=128: 0.587 s 1-R@1: 0.1341 1-R@10: 0.4502
0/10000 (0.006 s)      probe=256: 1.160 s 1-R@1: 0.1342 1-R@10: 0.4511
```

We use the `-tempmem` option to reduce the temporary memory allocation to 1.5 GiB, otherwise the dataset does not fit in GPU memory.

### Search on Deep1B

The same script generates the GPU search results on Deep1B.

```
python bench_gpu_1bn.py  Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -R 2 -ngpu 4 -altadd -noptables -tempmem $[1024*1024*1024]
...

0/10000 (0.115 s)      probe=1  : 0.239 s 1-R@1: 0.2387 1-R@10: 0.3420
0/10000 (0.006 s)      probe=2  : 0.103 s 1-R@1: 0.3110 1-R@10: 0.4623
0/10000 (0.005 s)      probe=4  : 0.105 s 1-R@1: 0.3772 1-R@10: 0.5862
0/10000 (0.005 s)      probe=8  : 0.116 s 1-R@1: 0.4235 1-R@10: 0.6889
0/10000 (0.005 s)      probe=16 : 0.133 s 1-R@1: 0.4517 1-R@10: 0.7693
0/10000 (0.005 s)      probe=32 : 0.168 s 1-R@1: 0.4713 1-R@10: 0.8281
0/10000 (0.005 s)      probe=64 : 0.238 s 1-R@1: 0.4841 1-R@10: 0.8649
0/10000 (0.007 s)      probe=128: 0.384 s 1-R@1: 0.4900 1-R@10: 0.8816
0/10000 (0.005 s)      probe=256: 0.736 s 1-R@1: 0.4933 1-R@10: 0.8912
```

Here we are a bit tight on memory, so we disable precomputed tables (`-noptables`) and restrict the amount of temporary memory. The `-altadd` option avoids GPU memory overflows during add.


### k-NN graph on Deep1B

The same script generates the k-NN graph on Deep1B. Note that the inverted file from above will not be re-used because the training sets are different. For the k-NN graph, the script first does a pass over the whole dataset to compute the ground-truth k-NN for a subset of 10k nodes, for evaluation.

```
python bench_gpu_1bn.py Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -altadd -knngraph  -R 2 -noptables -tempmem $[1<<30] -ngpu 4
...
CPU index contains 1000000000 vectors, move to GPU
Copy CPU index to 2 sharded GPU indexes
   dispatch to GPUs 0:2
IndexShards shard 0 indices 0:500000000
  IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 500000000:1000000000
  IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
   dispatch to GPUs 2:4
IndexShards shard 0 indices 0:500000000
  IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 500000000:1000000000
  IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
move to GPU done in 151.535 s
search...
999997440/1000000000 (8389.961 s, 0.3379)      probe=1  : 8389.990 s rank-10 intersection results: 0.3379
999997440/1000000000 (9205.934 s, 0.4079)      probe=2  : 9205.966 s rank-10 intersection results: 0.4079
999997440/1000000000 (9741.095 s, 0.4722)      probe=4  : 9741.128 s rank-10 intersection results: 0.4722
999997440/1000000000 (10830.420 s, 0.5256)      probe=8  : 10830.455 s rank-10 intersection results: 0.5256
999997440/1000000000 (12531.716 s, 0.5603)      probe=16 : 12531.758 s rank-10 intersection results: 0.5603
999997440/1000000000 (15922.519 s, 0.5825)      probe=32 : 15922.571 s rank-10 intersection results: 0.5825
999997440/1000000000 (22774.153 s, 0.5950)      probe=64 : 22774.220 s rank-10 intersection results: 0.5950
999997440/1000000000 (36717.207 s, 0.6015)      probe=128: 36717.309 s rank-10 intersection results: 0.6015
999997440/1000000000 (70616.392 s, 0.6047)      probe=256: 70616.581 s rank-10 intersection results: 0.6047
```
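
The "rank-10 intersection" above compares, for each evaluation node, the 10 returned neighbors with the 10 exact nearest neighbors. A brute-force sketch of both the ground-truth pass and the measure, on toy data (function names are ours; the script computes the ground truth with Faiss, not with this quadratic loop):

```python
import numpy as np

def exact_knn_graph(x, k):
    # exact k nearest neighbors of each vector among the others (self excluded)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)       # a node is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def rank_k_intersection(I, gt, k=10):
    # average fraction of the k exact neighbors found in the returned lists
    return np.mean([len(set(I[i, :k]) & set(gt[i, :k])) / k
                    for i in range(len(I))])

rs = np.random.RandomState(42)
x = rs.rand(200, 16).astype('float32')
gt = exact_knn_graph(x, 10)
print(rank_k_intersection(gt, gt))  # 1.0 against itself
```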