• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

docs/H07-Feb-2018-

examples/H07-Feb-2018-

sparsepp/H03-May-2022-

tests/H07-Feb-2018-

.gitignoreH A D07-Feb-2018649

.travis.ymlH A D07-Feb-2018131

CHANGELOG.mdH A D07-Feb-2018640

LICENSEH A D07-Feb-20181.8 KiB

README.mdH A D07-Feb-201813.3 KiB

bench.mdH A D07-Feb-201815.2 KiB

spp.natvisH A D07-Feb-20181.6 KiB

README.md

1[![Build Status](https://travis-ci.org/greg7mdp/sparsepp.svg?branch=master)](https://travis-ci.org/greg7mdp/sparsepp)
2
3# Sparsepp: A fast, memory efficient hash map for C++
4
5Sparsepp is derived from Google's excellent [sparsehash](https://github.com/sparsehash/sparsehash) implementation. It aims to achieve the following objectives:
6
7- A drop-in alternative for unordered_map and unordered_set.
8- **Extremely low memory usage** (typically about one byte overhead per entry), and most important very small memory overhead when resizing.
9- **Very efficient**, typically faster than your compiler's unordered map/set or Boost's.
10- **C++11 support** (if supported by compiler).
11- ~~Single header~~ not anymore
12- **Tested** on Windows (vs2010-2015, g++), linux (g++, clang++) and MacOS (clang++).
13
14We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's `dense_hash_map` is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance, and insertions cause a resizing).
15
16This hash map. like Google's `dense_hash_map`, uses open addressing (meaning that the hash table entries are conceptually stored in a big array, and collisions are not resolved using a linked list, but by probing). A big drawback of open addressing is that when the array needs to be resized larger, the high mark for memory usage is typically 3 times the current map size (as we allocate a new array twice as big as the current one, and copy from the old array to the new one). Sparsepp's implementation resolves this memory usage issue, and the memory usage high mark shows only a small bump when resizing.
17
18For a detailed comparison of various hash implementations, including Sparsepp, please see our [write-up](bench.md).
19
20## Example
21
22```c++
23#include <iostream>
24#include <string>
25#include <sparsepp/spp.h>
26
27using spp::sparse_hash_map;
28
29int main()
30{
31    // Create an unordered_map of three strings (that map to strings)
32    sparse_hash_map<std::string, std::string> email =
33    {
34        { "tom",  "tom@gmail.com"},
35        { "jeff", "jk@gmail.com"},
36        { "jim",  "jimg@microsoft.com"}
37    };
38
39    // Iterate and print keys and values
40    for (const auto& n : email)
41        std::cout << n.first << "'s email is: " << n.second << "\n";
42
43    // Add a new entry
44    email["bill"] = "bg@whatever.com";
45
46    // and print it
47    std::cout << "bill's email is: " << email["bill"] << "\n";
48
49    return 0;
50}
51```
52
53## Installation
54
55No compilation is needed, as this is a header-only library. The installation consist in copying the sparsepp directory wherever it will be convenient to include in your project(s).
56
57## Warning - iterator and reference invalidation on erase/insert
58
591. erasing elements is likely to invalidate iterators (for example when calling `erase()`)
60
612. inserting new elements is likely to invalidate iterators (iterator invalidation can also happen with std::unordered_map if rehashing occurs due to the insertion)
62
633. references to values stored in a sparse_hash_map or set are likely to be invalidated on insert()/erase(). This is not the case for std::unordered_map or set.
64
65## Usage
66
67As shown in the example above, you need to include the header file: `#include <sparsepp/spp.h>`
68
69This provides the implementation for the following classes:
70
71```c++
72namespace spp
73{
74    template <class Key,
75              class T,
76              class HashFcn  = spp_hash<Key>,
77              class EqualKey = std::equal_to<Key>,
78              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
79    class sparse_hash_map;
80
81    template <class Value,
82              class HashFcn  = spp_hash<Value>,
83              class EqualKey = std::equal_to<Value>,
84              class Alloc    = libc_allocator_with_realloc<Value>>
85    class sparse_hash_set;
86};
87```
88
89These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:
90
91- Calls to `erase()` may invalidate iterators. However, conformant to the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example to delete odd values:
92
93   ```c++
94   for (auto it = c.begin(); it != c.end(); )
95       if (it->first % 2 == 1)
96          it = c.erase(it);
97       else
98          ++it;
99   ```
100
101   As for std::unordered_map, the order of the elements that are not erased is preserved.
102
103- Since items are not grouped into buckets, Bucket APIs have been adapted: `max_bucket_count` is equivalent to `max_size`, and `bucket_count` returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.
104
105## Memory allocator on Windows (when building with Visual Studio)
106
107When building with the Microsoft compiler, we provide a custom allocator because the default one (from the Visual C++ runtime) fragments memory when reallocating.
108
109This is desirable *only* when creating large sparsepp hash maps. If you create lots of small hash_maps, memory usage may increase instead of decreasing as expected.  The reason is that, for each instance of a hash_map, the custom memory allocator creates a new memory space to allocate from, which is typically 4K, so it may be a big waste if just a few items are allocated.
110
111In order to use the custom spp allocator, define the following preprocessor variable before including `<spp/spp.h>`:
112
113`#define SPP_USE_SPP_ALLOC 1`
114
115## Integer keys, and other hash function considerations.
116
1171. For basic integer types, sparsepp provides a default hash function which does some mixing of the bits of the keys (see [Integer Hashing](http://burtleburtle.net/bob/hash/integer.html)). This prevents a pathological case where inserted keys are sequential (1, 2, 3, 4, ...), and the lookup on non-present keys becomes very slow.
118
119   Of course, the user of sparsepp may provide its own hash function,  as shown below:
120
121   ```c++
122   #include <sparsepp/spp.h>
123
124   struct Hash64 {
125       size_t operator()(uint64_t k) const { return (k ^ 14695981039346656037ULL) * 1099511628211ULL; }
126   };
127
128   struct Hash32 {
129       size_t operator()(uint32_t k) const { return (k ^ 2166136261U)  * 16777619UL; }
130   };
131
132   int main()
133   {
134       spp::sparse_hash_map<uint64_t, double, Hash64> map;
135       ...
136   }
137
138   ```
139
1402. When the user provides its own hash function, for example when inserting custom classes into a hash map, sometimes the resulting hash keys have similar low order bits and cause many collisions, decreasing the efficiency of the hash map. To address this use case, sparsepp provides an optional 'mixing' of the hash key (see [Integer Hash Function](https://gist.github.com/badboy/6267743) which can be enabled by defining the proprocessor macro: SPP_MIX_HASH.
141
142## Example 2 - providing a hash function for a user-defined class
143
144In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Even though a the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of `std::hash` for the class into the "std" namespace. For example:
145
146```c++
147#include <iostream>
148#include <functional>
149#include <string>
150#include <sparsepp/spp.h>
151
152using std::string;
153
154struct Person
155{
156    bool operator==(const Person &o) const
157    { return _first == o._first && _last == o._last; }
158
159    string _first;
160    string _last;
161};
162
163namespace std
164{
165    // inject specialization of std::hash for Person into namespace std
166    // ----------------------------------------------------------------
167    template<>
168    struct hash<Person>
169    {
170        std::size_t operator()(Person const &p) const
171        {
172            std::size_t seed = 0;
173            spp::hash_combine(seed, p._first);
174            spp::hash_combine(seed, p._last);
175            return seed;
176        }
177    };
178}
179
180int main()
181{
182    // As we have defined a specialization of std::hash() for Person,
183    // we can now create sparse_hash_set or sparse_hash_map of Persons
184    // ----------------------------------------------------------------
185    spp::sparse_hash_set<Person> persons = { { "John", "Galt" },
186                                             { "Jane", "Doe" } };
187    for (auto& p: persons)
188        std::cout << p._first << ' ' << p._last << '\n';
189}
190```
191
192The `std::hash` specialization for `Person` combines the hash values for both first and last name using the convenient spp::hash_combine function, and returns the combined hash value.
193
194spp::hash_combine is provided by the header `sparsepp/spp.h`. However, class definitions often appear in header files, and it is desirable to limit the size of headers included in such header files, so we provide the very small header `sparsepp/spp_utils.h` for that purpose:
195
196```c++
197#include <string>
198#include <sparsepp/spp_utils.h>
199
200using std::string;
201
202struct Person
203{
204    bool operator==(const Person &o) const
205    {
206        return _first == o._first && _last == o._last && _age == o._age;
207    }
208
209    string _first;
210    string _last;
211    int    _age;
212};
213
214namespace std
215{
216    // inject specialization of std::hash for Person into namespace std
217    // ----------------------------------------------------------------
218    template<>
219    struct hash<Person>
220    {
221        std::size_t operator()(Person const &p) const
222        {
223            std::size_t seed = 0;
224            spp::hash_combine(seed, p._first);
225            spp::hash_combine(seed, p._last);
226            spp::hash_combine(seed, p._age);
227            return seed;
228        }
229    };
230}
231```
232
233## Example 3 - serialization
234
235sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection.
236This support is implemented in the following APIs:
237
238```c++
239    template <typename Serializer, typename OUTPUT>
240    bool serialize(Serializer serializer, OUTPUT *stream);
241
242    template <typename Serializer, typename INPUT>
243    bool unserialize(Serializer serializer, INPUT *stream);
244```
245
246The following example demonstrates how a simple sparse_hash_map can be written to a file, and then read back. The serializer we use read and writes to a file using the stdio APIs, but it would be equally simple to write a serialized using the stream APIS:
247
248```c++
249#include <cstdio>
250
251#include <sparsepp/spp.h>
252
253using spp::sparse_hash_map;
254using namespace std;
255
256class FileSerializer
257{
258public:
259    // serialize basic types to FILE
260    // -----------------------------
261    template <class T>
262    bool operator()(FILE *fp, const T& value)
263    {
264        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
265    }
266
267    template <class T>
268    bool operator()(FILE *fp, T* value)
269    {
270        return fread((void *)value, sizeof(*value), 1, fp) == 1;
271    }
272
273    // serialize std::string to FILE
274    // -----------------------------
275    bool operator()(FILE *fp, const string& value)
276    {
277        const size_t size = value.size();
278        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
279    }
280
281    bool operator()(FILE *fp, string* value)
282    {
283        size_t size;
284        if (!(*this)(fp, &size))
285            return false;
286        char* buf = new char[size];
287        if (fread(buf, size, 1, fp) != 1)
288        {
289            delete [] buf;
290            return false;
291        }
292        new (value) string(buf, (size_t)size);
293        delete[] buf;
294        return true;
295    }
296
297    // serialize std::pair<const A, B> to FILE - needed for maps
298    // ---------------------------------------------------------
299    template <class A, class B>
300    bool operator()(FILE *fp, const std::pair<const A, B>& value)
301    {
302        return (*this)(fp, value.first) && (*this)(fp, value.second);
303    }
304
305    template <class A, class B>
306    bool operator()(FILE *fp, std::pair<const A, B> *value)
307    {
308        return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
309    }
310};
311
312int main(int argc, char* argv[])
313{
314    sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };
315
316    // serialize age hash_map to "ages.dmp" file
317    FILE *out = fopen("ages.dmp", "wb");
318    age.serialize(FileSerializer(), out);
319    fclose(out);
320
321    sparse_hash_map<string, int> age_read;
322
323    // read from "ages.dmp" file into age_read hash_map
324    FILE *input = fopen("ages.dmp", "rb");
325    age_read.unserialize(FileSerializer(), input);
326    fclose(input);
327
328    // print out contents of age_read to verify correct serialization
329    for (auto& v : age_read)
330        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
331}
332```
333
334## Thread safety
335
336Sparsepp follows the thread safety rules of the Standard C++ library. In Particular:
337
338- A single sparsepp hash table is thread safe for reading from multiple threads. For example, given a hash table A, it is safe to read A from thread 1 and from thread 2 simultaneously.
339
340- If a single hash table is being written to by one thread, then all reads and writes to that hash table on the same or other threads must be protected. For example, given a hash table A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
341
342- It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given hash tables A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.
343