# Concurrent trie-hash map

[![Build Status](https://travis-ci.com/rmind/thmap.svg?branch=master)](https://travis-ci.com/rmind/thmap)

Concurrent trie-hash map library -- a general purpose associative array,
combining the elements of hashing and radix trie.  Highlights:
- Very competitive performance, with logarithmic time complexity on average.
- Lookups are lock-free and inserts/deletes use fine-grained locking.
- Incremental growth of the data structure (no large resizing/rehashing).
- Optional support for use with shared memory, e.g. memory-mapped file.

The implementation is written in C11 and distributed under the 2-clause
BSD license.

NOTE: Delete operations (the key/data destruction) must be synchronised with
the readers using some reclamation mechanism.  You can use the Epoch-based
Reclamation (EBR) library provided [HERE](https://github.com/rmind/libqsbr).

References (some, but not all, key ideas are based on these papers):

- [W. Litwin. Trie Hashing. Proceedings of the 1981 ACM SIGMOD, p. 19-29](https://dl.acm.org/citation.cfm?id=582322)

- [P. L. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM TODS, 6(4):650-670, 1981](https://www.csd.uoc.gr/~hy460/pdf/p650-lehman.pdf)

## API

* `thmap_t *thmap_create(uintptr_t baseptr, const thmap_ops_t *ops, unsigned flags)`
  * Construct a new trie-hash map.  The optional `ops` parameter can be
  used to set the custom allocate/free operations (see the description
  of `thmap_ops_t` below).  In that case, the `baseptr` is the base (start)
  address of the address space mapping (it must be word-aligned).  If `ops`
  is set to `NULL`, then _malloc(3)_ and _free(3)_ will be used as the
  default operations and `baseptr` should be set to zero.  Currently, the
  supported `flags` are:
    * `THMAP_NOCOPY`: the keys on insert will not be copied and the given
    pointers to them will be expected to be valid and the values constant
    until the key is deleted; by default, the put operation will make a
    copy of the key.
    * `THMAP_SETROOT`: indicate that the root of the map will be manually
    set using the `thmap_setroot` routine; by default, the map is initialised
    and the root node is set on `thmap_create`.

* `void thmap_destroy(thmap_t *hmap)`
  * Destroy the map, freeing the memory it uses.

* `void *thmap_get(thmap_t *hmap, const void *key, size_t len)`
  * Look up the key (of a given length) and return the value associated with
  it.  Return `NULL` if the key is not found (see the caveats section).

* `void *thmap_put(thmap_t *hmap, const void *key, size_t len, void *val)`
  * Insert the key with an arbitrary value.  If the key is already present,
  return the already existing associated value without changing it.
  Otherwise, on a successful insert, return the given value.  Just compare
  the result against `val` to test whether the insert was successful.

* `void *thmap_del(thmap_t *hmap, const void *key, size_t len)`
  * Remove the given key.  If the key was present, return the associated
  value; otherwise return `NULL`.  The memory associated with the entry is
  not released immediately, because in a concurrent environment (e.g. a
  multi-threaded application) the caller may need to ensure it is safe to
  do so.  It is managed using the `thmap_stage_gc` and `thmap_gc` routines.

* `void *thmap_stage_gc(thmap_t *hmap)`
  * Stage the currently pending entries (the memory not yet released after
  the deletion) for reclamation (G/C).  This operation should be called
  **before** the synchronisation barrier.
  * Returns a reference which must be passed to `thmap_gc`.  Not calling the
  G/C function for the returned reference would result in a memory leak.

* `void thmap_gc(thmap_t *hmap, void *ref)`
  * Reclaim (G/C) the staged entries, i.e. release any memory associated
  with the deleted keys.  The reference must be the value returned by the
  call to `thmap_stage_gc`.
  * This function must be called **after** the synchronisation barrier which
  guarantees that there are no active readers referencing the staged entries.
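
The ordering contract of `thmap_stage_gc` and `thmap_gc` can be illustrated
with a toy model.  The sketch below is not the library's implementation:
`toy_retire`, `toy_stage_gc` and `toy_gc` are hypothetical names, and it only
assumes that staging detaches the entries retired so far, so that entries
retired after staging wait for the next cycle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: an entry removed from the map but not yet freed. */
struct retired {
	struct retired *next;
	void *mem;
};

/* Lock-free list of entries awaiting reclamation. */
static _Atomic(struct retired *) pending = NULL;

/* Writer side: defer the release of 'mem' instead of freeing it now. */
int
toy_retire(void *mem)
{
	struct retired *r = malloc(sizeof(*r));

	if (r == NULL)
		return -1;
	r->mem = mem;
	r->next = atomic_load(&pending);
	/* On failure, r->next is refreshed with the current list head. */
	while (!atomic_compare_exchange_weak(&pending, &r->next, r))
		;
	return 0;
}

/* Detach everything retired so far; to be called BEFORE the barrier. */
void *
toy_stage_gc(void)
{
	return atomic_exchange(&pending, NULL);
}

/* Free a staged batch; to be called AFTER the barrier.  Returns the count. */
size_t
toy_gc(void *ref)
{
	struct retired *r = ref;
	size_t n = 0;

	while (r != NULL) {
		struct retired *next = r->next;

		free(r->mem);
		free(r);
		r = next;
		n++;
	}
	return n;
}
```

In a real program, the synchronisation barrier (for example, an EBR wait)
would sit between the `toy_stage_gc` and `toy_gc` calls.  Staging first
matters because the barrier only covers readers that could have seen entries
retired before it began; anything retired later has to wait for the next
barrier.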

If the map is created using the `THMAP_SETROOT` flag, then the following
functions are applicable:

* `int thmap_setroot(thmap_t *thmap, uintptr_t root_offset)`
  * Set the root node.  The address must be relative to the base address,
  as if allocated by the `thmap_ops_t::alloc` routine.  Return 0 on success
  and -1 on failure (if already set).

* `uintptr_t thmap_getroot(const thmap_t *thmap)`
  * Get the root node address.  The returned address will be relative to
  the base address.

The `thmap_ops_t` structure has the following members:
* `uintptr_t (*alloc)(size_t len)`
  * Function to allocate the memory.  Must return an address to the
  memory area of the size `len`.  The address must be relative to the
  base address specified during map creation and must be word-aligned.
* `void (*free)(uintptr_t addr, size_t len)`
  * Function to release the memory.  Must take a previously allocated
  address (relative to the base) and release the memory area.  The `len`
  is guaranteed to match the original allocation length.
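
As an illustration of the `alloc`/`free` contract above, here is a minimal
bump allocator over a static buffer that returns word-aligned offsets rather
than pointers.  The names `arena`, `arena_alloc`, `arena_free` and
`arena_ptr` are hypothetical, the buffer merely stands in for a shared memory
mapping, and the sketch is not thread-safe.  It also assumes that offset 0
can be reserved to signal allocation failure, which the description above
does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096

/* Hypothetical arena standing in for a shared memory mapping. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];

/* Next free offset; the first word is skipped so that 0 means failure. */
static size_t arena_used = sizeof(uintptr_t);

/* Round 'len' up to a multiple of the word size. */
static size_t
word_roundup(size_t len)
{
	const size_t w = sizeof(uintptr_t);

	return (len + w - 1) & ~(w - 1);
}

/* Matches the 'alloc' member: returns an offset relative to the base. */
uintptr_t
arena_alloc(size_t len)
{
	const size_t alen = word_roundup(len);
	uintptr_t off;

	/* Reject overflow in the round-up and requests that do not fit. */
	if (alen < len || alen > ARENA_SIZE - arena_used)
		return 0;
	off = arena_used;
	arena_used += alen;
	return off;
}

/* Matches the 'free' member: a bump allocator never reuses memory. */
void
arena_free(uintptr_t addr, size_t len)
{
	(void)addr;
	(void)len;
}

/* Convert an offset back into a pointer within the arena. */
void *
arena_ptr(uintptr_t off)
{
	return arena + off;
}
```

Under these assumptions, the two functions could be assigned to the `alloc`
and `free` members of a `thmap_ops_t`, with `(uintptr_t)arena` passed as
`baseptr` to `thmap_create`.  Concurrent writers would additionally need a
lock or an atomic update of `arena_used`.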

## Notes

Internally, offsets from the base pointer are used to organise the access
to the data structure.  This allows the user to store the data structure in
shared memory, using the allocation/free functions.  The keys will also be
copied using the custom functions; if `THMAP_NOCOPY` is set, then the keys
must belong to the same shared memory object.

The implementation was extensively tested on a 24-core x86 machine,
see [the stress test](src/t_stress.c) for the details on the technique.

## Caveats

* The implementation uses pointer tagging and atomic operations.  This
requires the base address and the allocations to provide at least word
alignment.

* While `NULL` values may be inserted, `thmap_get` and `thmap_del`
cannot indicate whether the key was not found or a key with a `NULL` value
was found.  If the caller needs to indicate an "empty" value, it can use a
special pointer value, such as `(void *)(uintptr_t)0x1`.
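
The sentinel approach from the last caveat could be wrapped in a pair of
helpers.  This is only a sketch: `VAL_EMPTY`, `val_encode` and `val_decode`
are hypothetical names, and it assumes that no real value pointer equals
`0x1`, which holds for pointers to objects with an alignment greater than one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "key present, but no value". */
#define VAL_EMPTY ((void *)(uintptr_t)0x1)

/* Encode a possibly-NULL value before passing it to thmap_put(). */
void *
val_encode(void *val)
{
	return val != NULL ? val : VAL_EMPTY;
}

/* Decode a non-NULL result of thmap_get() or thmap_del(). */
void *
val_decode(void *stored)
{
	return stored == VAL_EMPTY ? NULL : stored;
}
```

With this convention, a `NULL` returned by `thmap_get` or `thmap_del` always
means that the key was absent, while `val_decode` maps the sentinel back to
`NULL` for keys that were found.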

## Performance

The library has been benchmarked using different key profiles (8 to 256
bytes), set sizes (hundreds, thousands, millions) and ratios between readers
and writers (from 60:40 to 90:10).  In all cases it demonstrated nearly
linear scalability (up to the number of cores).  Here is an example result
compared with the C++ libcuckoo library:

![](misc/thmap_lookup_80_64bit_keys_intel_4980hq.svg)

Disclaimer: benchmark results, however, depend on many aspects (workload,
hardware characteristics, methodology, etc).  Ultimately, readers are
encouraged to perform their own benchmarks.

## Example

Simple case backed by _malloc(3)_, which could be used in a multi-threaded
environment:
```c
thmap_t *kvmap;
struct obj *obj;

/* Default malloc(3)/free(3) operations: zero base, NULL ops, no flags. */
kvmap = thmap_create(0, NULL, 0);
assert(kvmap != NULL);
...
obj = obj_create();
/* The key length excludes the terminating NUL byte. */
thmap_put(kvmap, "test", sizeof("test") - 1, obj);
...
obj = thmap_get(kvmap, "test", sizeof("test") - 1);
...
thmap_destroy(kvmap);
```

## Packages

Just build the package, install it and link the library using the
`-lthmap` flag.
* RPM (tested on RHEL/CentOS 7): `cd pkg && make rpm`
* DEB (tested on Debian 9): `cd pkg && make deb`