• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..04-Jun-2021-

COPYINGH A D26-Sep-20181.1 KiB2019

READMEH A D26-Sep-20187.5 KiB164124

config.hH A D26-Sep-2018373 153

farmhash.ccH A D26-Sep-2018276.5 KiB11,84111,204

farmhash.hH A D03-May-202211.6 KiB332160

README

1FarmHash, a family of hash functions.
2Version 1.1
3
4Introduction
5============
6
7A general overview of hash functions and their use is available in the file
8Understanding_Hash_Functions in this directory.  It may be helpful to read it
9before using FarmHash.
10
11FarmHash provides hash functions for strings and other data.  The functions
12mix the input bits thoroughly but are not suitable for cryptography.  See
13"Hash Quality," below, for details on how FarmHash was tested and so on.
14
15We provide reference implementations in C++, with a friendly MIT license.
16
17All members of the FarmHash family were designed with heavy reliance on
18previous work by Jyrki Alakuijala, Austin Appleby, Bob Jenkins, and others.
19
20
21Recommended Usage
22=================
23
24Our belief is that the typical hash function is mostly used for in-memory hash
25tables and similar.  That use case allows hash functions that differ on
26different platforms, and that change from time to time.  For this, I recommend
27using wrapper functions in a .h file with comments such as, "may change from
28time to time, may differ on different platforms, and may change depending on
29NDEBUG."
30
31Some projects may also require a forever-fixed, portable hash function.  Again
32we recommend using wrapper functions in a .h, but in this case the comments on
33them would be very different.
34
35We have provided a sample of these wrapper functions in src/farmhash.h.  Our
36hope is that most people will need nothing more than src/farmhash.h and
37src/farmhash.cc.  Those two files are a usable and relatively portable library.
38(One portability snag: if your compiler doesn't have __builtin_expect then
39you may need to define FARMHASH_NO_BUILTIN_EXPECT.)  For those that prefer
40using a configure script (perhaps because they want to "make install" later),
41FarmHash has one, but for many people it's best to ignore it.
42
43Note that the wrapper functions such as Hash() in src/farmhash.h can select
44one of several hash functions.  The selection is done at compile time, based
45on your machine architecture (e.g., sizeof(size_t)) and the availability of
46vector instructions (e.g., SSE4.1).
47
48To get the best performance from FarmHash, one will need to think a bit about
49when to use compiler flags that allow vector instructions and such: -maes,
50-msse4.2, -mavx, etc., or their equivalents for other compilers.  Those are
51the g++ flags that make g++ emit more types of machine instructions than it
52otherwise would.  For example, if you are confident that you will only be
53using FarmHash on systems with SSE4.2 and/or AES, you may communicate that to
54the compiler as explained in src/farmhash.cc.  If not, use -maes, -mavx, etc.,
55when you can, and the appropriate choices will be made by via conditional
56compilation in src/farmhash.cc.
57
58It may be beneficial to try -O3 or other compiler flags as well.  I also have
59found feedback-directed optimization (FDO) to improve the speed of FarmHash.
60
61The "configure" script: creating config.h
62=========================================
63
64We provide reference implementations of several FarmHash functions, written in
65C++.  The build system is based on autoconf.  It defaults the C++ compiler
66flags to "-g -O2", which may or may not be best.
67
68If you are planning to use the configure script, I generally recommend
69trying this first, unless you know that your system lacks AVX and/or AESNI:
70
71  ./configure CXXFLAGS="-g -mavx -maes -O3"
72  make all check
73
74If that fails, you can retry with -mavx and/or -maes removed, or with -mavx replaced by
75-msse4.1 or -msse4.2.
76
77Please see below for thoughts on cross-platform testing, if that is a concern.
78Finally, if you want to install a library, you may use
79
80  make install
81
82Some useful flags for configure include:
83
84  --enable-optional-builtin-expect: This causes __builtin_expect to be optional.
85    If you don't use this flag, the assumption is that FarmHash will be compiled
86    with compilers that provide __builtin_expect.  In practice, some FarmHash
87    variants may be slightly faster if __builtin_expect is available, but it
88    isn't very important and affects speed only.
89
90Further Details
91===============
92
93The above instructions will produce a single source-level library that
94includes multiple hash functions.  It will use conditional compilation, and
95perhaps GCC's multiversioning, to select among the functions.  In addition,
96"make all check" will create an object file using your chosen compiler, and
97test it.  The object file won't necessarily contain all the code that would be
98used if you were to compile the code on other platforms.  The downside of this
99is obvious: the paths not tested may not actually work if and when you try
100them.  The FarmHash developers try hard to prevent such problems; please let
101us know if you find bugs.
102
103To aid your cross-platform testing, for each relevant platform you may
104compile your program that uses farmhash.cc with the preprocessor flag
105FARMHASHSELFTEST equal to 1.  This causes a FarmHash self test to run
106at program startup; the self test writes output to stdout and then
107calls std::exit().  You can see this in action by running "make check":
108see src/farm-test.cc for details.
109
110There's also a trivial workaround to force particular functions to be used:
111modify the wrapper functions in hash.h.  You can prevent choices being made via
112conditional compilation or multiversioning by choosing FarmHash variants with
113names like farmhashaa::Hash32, farmhashab::Hash64, etc.: those compute the same
114hash function regardless of conditional compilation, multiversioning, or
115endianness.  Consult their comments and ifdefs to learn their requirements: for
116example, they are not all guaranteed to work on all platforms.
117
118Known Issues
119============
120
1211) FarmHash was developed with little-endian architectures in mind.  It should
122work on big-endian too, but less work has gone into optimizing for those
123platforms.  To make FarmHash work properly on big-endian platforms you may
124need to modify the wrapper .h file and/or your compiler flags to arrange for
125FARMHASH_BIG_ENDIAN to be defined, though there is logic that tries to figure
126it out automatically.
127
1282) FarmHash's implementation is fairly complex.
129
1303) The techniques described in dev/INSTRUCTIONS to let hash function
131developers regenerate src/*.cc from dev/* are hacky and not so portable.
132
133Hash Quality
134============
135
136We like to test hash functions with SMHasher, among other things.
137SMHasher isn't perfect, but it seems to find almost any significant flaw.
138SMHasher is available at http://code.google.com/p/smhasher/
139
140SMHasher is designed to pass a 32-bit seed to the hash functions it tests.
141For our functions that accept a seed, we use the given seed directly (padded
142with zeroes as needed); for our functions that don't accept a seed, we hash
143the concatenation of the given seed and the input string.
144
145Some minor flaws in 32-bit and 64-bit functions are harmless, as we
146expect the primary use of these functions will be in hash tables.  We
147may have gone slightly overboard in trying to please SMHasher and other
148similar tests, but we don't want anyone to choose a different hash function
149because of some minor issue reported by a quality test.
150
151If your setup is similar enough to mine, it's easy to use SMHasher and other
152tools yourself via the "builder" in the dev directory.  See dev/INSTRUCTIONS.
153(Improvements to that directory are a relatively low priority, and code
154there is never going to be as portable as the other parts of FarmHash.)
155
156For more information
157====================
158
159http://code.google.com/p/farmhash/
160
161farmhash-discuss@googlegroups.com
162
163Please feel free to send us comments, questions, bug reports, or patches.
164