README.md
1[![CI][1]][0]
2[![Security Audit][2]][0]
3[![Coverage][3]][4]
4[![LoC][5]][0]
5[![Docs.rs][6]][7]
6[![Crates.io][8]][9]
7
8A multithreaded and single threaded string interner that allows strings to be cached with a minimal memory footprint,
9associating them with a unique [key] that can be used to retrieve them at any time. A [`Rodeo`] allows `O(1)`
10internment and resolution and can be turned into a [`RodeoReader`] to allow for contention-free resolutions
11with both key to str and str to key operations. It can also be turned into a [`RodeoResolver`] with only
12key to str operations for the lowest possible memory usage.
13
14## Which interner do I use?
15
16For single-threaded workloads [`Rodeo`] is encouraged, while multi-threaded applications should use [`ThreadedRodeo`].
17Both of these are the only way to intern strings, but most applications will hit a stage where they are done interning
18strings, and at that point is where the choice between [`RodeoReader`] and [`RodeoResolver`]. If the user needs to get
19keys for strings still, then they must use the [`RodeoReader`] (although they can still transfer into a [`RodeoResolver`])
20at this point. For users who just need key to string resolution, the [`RodeoResolver`] gives contention-free access at the
21minimum possible memory usage. Note that to gain access to [`ThreadedRodeo`] the `multi-threaded` feature is required.
22
23| Interner | Thread-safe | Intern String | str to key | key to str | Contention Free | Memory Usage |
24|-------------------|:-----------:|:-------------:|:----------:|:----------:|:---------------:|:------------:|
25| [`Rodeo`] | ❌ | ✅ | ✅ | ✅ | N/A | Medium |
26| [`ThreadedRodeo`] | ✅ | ✅ | ✅ | ✅ | ❌ | Most |
27| [`RodeoReader`] | ✅ | ❌ | ✅ | ✅ | ✅ | Medium |
28| [`RodeoResolver`] | ✅ | ❌ | ❌ | ✅ | ✅ | Least |
29
30## Cargo Features
31
32By default `lasso` has one dependency, `hashbrown`, and only [`Rodeo`] is exposed. Hashbrown is used since the [`raw_entry` api] is currently unstable in the standard library's hashmap.
33The raw hashmap API is used for custom hashing within the hashmaps, which works to dramatically reduce memory usage
34To make use of [`ThreadedRodeo`], you must enable the `multi-threaded` feature.
35
36* `multi-threaded` - Enables [`ThreadedRodeo`], the interner for multi-threaded tasks
37* `ahasher` - Use [`ahash`]'s `RandomState` as the default hasher
38* `no-std` - Enables `no_std` + `alloc` support for [`Rodeo`] and [`ThreadedRodeo`]
39 * Automatically enables the following required features:
40 * `ahasher` - `no_std` hashing function
41* `serialize` - Implements `Serialize` and `Deserialize` for all `Spur` types and all interners
42* `inline-more` - Annotate external apis with `#[inline]`
43
44## Example: Using Rodeo
45
46```rust
47use lasso::Rodeo;
48
49let mut rodeo = Rodeo::default();
50let key = rodeo.get_or_intern("Hello, world!");
51
52// Easily retrieve the value of a key and find the key for values
53assert_eq!("Hello, world!", rodeo.resolve(&key));
54assert_eq!(Some(key), rodeo.get("Hello, world!"));
55
56// Interning the same string again will yield the same key
57let key2 = rodeo.get_or_intern("Hello, world!");
58
59assert_eq!(key, key2);
60```
61
62## Example: Using ThreadedRodeo
63
64```rust
65use lasso::ThreadedRodeo;
66use std::{thread, sync::Arc};
67
68let rodeo = Arc::new(ThreadedRodeo::default());
69let key = rodeo.get_or_intern("Hello, world!");
70
71// Easily retrieve the value of a key and find the key for values
72assert_eq!("Hello, world!", rodeo.resolve(&key));
73assert_eq!(Some(key), rodeo.get("Hello, world!"));
74
75// Interning the same string again will yield the same key
76let key2 = rodeo.get_or_intern("Hello, world!");
77
78assert_eq!(key, key2);
79
80// ThreadedRodeo can be shared across threads
81let moved = Arc::clone(&rodeo);
82let hello = thread::spawn(move || {
83 assert_eq!("Hello, world!", moved.resolve(&key));
84 moved.get_or_intern("Hello from the thread!")
85})
86.join()
87.unwrap();
88
89assert_eq!("Hello, world!", rodeo.resolve(&key));
90assert_eq!("Hello from the thread!", rodeo.resolve(&hello));
91```
92
93## Example: Creating a RodeoReader
94
95```rust
96use lasso::Rodeo;
97
98// Rodeo and ThreadedRodeo are interchangeable here
99let mut rodeo = Rodeo::default();
100
101let key = rodeo.get_or_intern("Hello, world!");
102assert_eq!("Hello, world!", rodeo.resolve(&key));
103
104let reader = rodeo.into_reader();
105
106// Reader keeps all the strings from the parent
107assert_eq!("Hello, world!", reader.resolve(&key));
108assert_eq!(Some(key), reader.get("Hello, world!"));
109
110// The Reader can now be shared across threads, no matter what kind of Rodeo created it
111```
112
113## Example: Creating a RodeoResolver
114
115```rust
116use lasso::Rodeo;
117
118// Rodeo and ThreadedRodeo are interchangeable here
119let mut rodeo = Rodeo::default();
120
121let key = rodeo.get_or_intern("Hello, world!");
122assert_eq!("Hello, world!", rodeo.resolve(&key));
123
124let resolver = rodeo.into_resolver();
125
126// Resolver keeps all the strings from the parent
127assert_eq!("Hello, world!", resolver.resolve(&key));
128
129// The Resolver can now be shared across threads, no matter what kind of Rodeo created it
130```
131
132## Example: Making a custom-ranged key
133
134Sometimes you want your keys to only inhabit (or *not* inhabit) a certain range of values so that you can have custom [niches].
135This allows you to pack more data into what would otherwise be unused space, which can be critical for memory-sensitive applications.
136
137```rust
138use lasso::{Key, Rodeo};
139
140// First make our key type, this will be what we use as handles into our interner
141#[derive(Copy, Clone, PartialEq, Eq)]
142struct NicheKey(u32);
143
144// This will reserve the upper 255 values for us to use as niches
145const NICHE: usize = 0xFF000000;
146
147// Implementing `Key` is unsafe and requires that anything given to `try_from_usize` must produce the
148// same `usize` when `into_usize` is later called
149unsafe impl Key for NicheKey {
150 fn into_usize(self) -> usize {
151 self.0 as usize
152 }
153
154 fn try_from_usize(int: usize) -> Option<Self> {
155 if int < NICHE {
156 // The value isn't in our niche range, so we're good to go
157 Some(Self(int as u32))
158 } else {
159 // The value interferes with our niche, so we return `None`
160 None
161 }
162 }
163}
164
165// To make sure we're upholding `Key`'s safety contract, let's make two small tests
166#[test]
167fn value_in_range() {
168 let key = NicheKey::try_from_usize(0).unwrap();
169 assert_eq!(key.into_usize(), 0);
170
171 let key = NicheKey::try_from_usize(NICHE - 1).unwrap();
172 assert_eq!(key.into_usize(), NICHE - 1);
173}
174
175#[test]
176fn value_out_of_range() {
177 let key = NicheKey::try_from_usize(NICHE);
178 assert!(key.is_none());
179
180 let key = NicheKey::try_from_usize(u32::max_value() as usize);
181 assert!(key.is_none());
182}
183
184// And now we're done and can make `Rodeo`s or `ThreadedRodeo`s that use our custom key!
185let mut rodeo: Rodeo<NicheKey> = Rodeo::new();
186let key = rodeo.get_or_intern("It works!");
187assert_eq!(rodeo.resolve(&key), "It works!");
188```
189
190## Example: Creation using `FromIterator`
191
192```rust
193use lasso::Rodeo;
194use core::iter::FromIterator;
195
196// Works for both `Rodeo` and `ThreadedRodeo`
197let rodeo = Rodeo::from_iter(vec![
198 "one string",
199 "two string",
200 "red string",
201 "blue string",
202]);
203
204assert!(rodeo.contains("one string"));
205assert!(rodeo.contains("two string"));
206assert!(rodeo.contains("red string"));
207assert!(rodeo.contains("blue string"));
208```
209
210```rust
211use lasso::Rodeo;
212use core::iter::FromIterator;
213
214// Works for both `Rodeo` and `ThreadedRodeo`
215let rodeo: Rodeo = vec!["one string", "two string", "red string", "blue string"]
216 .into_iter()
217 .collect();
218
219assert!(rodeo.contains("one string"));
220assert!(rodeo.contains("two string"));
221assert!(rodeo.contains("red string"));
222assert!(rodeo.contains("blue string"));
223```
224
225## Benchmarks
226
227Benchmarks were gathered with [Criterion.rs](https://github.com/bheisler/criterion.rs)
228OS: Windows 10
229CPU: Ryzen 9 3900X at 3800Mhz
230RAM: 3200Mhz
231Rustc: Stable 1.44.1
232
233### Rodeo
234
235#### STD RandomState
236
237| Method | Time | Throughput |
238|:-----------------------------|:---------:|:------------:|
239| `resolve` | 1.9251 μs | 13.285 GiB/s |
240| `try_resolve` | 1.9214 μs | 13.311 GiB/s |
241| `resolve_unchecked` | 1.4356 μs | 17.816 GiB/s |
242| `get_or_intern` (empty) | 60.350 μs | 433.96 MiB/s |
243| `get_or_intern` (filled) | 57.415 μs | 456.15 MiB/s |
244| `try_get_or_intern` (empty) | 58.978 μs | 444.06 MiB/s |
245| `try_get_or_intern (filled)` | 57.421 μs | 456.10 MiB/s |
246| `get` (empty) | 37.288 μs | 702.37 MiB/s |
247| `get` (filled) | 55.095 μs | 475.36 MiB/s |
248
249#### AHash
250
251| Method | Time | Throughput |
252|:-----------------------------|:---------:|:------------:|
253| `try_resolve` | 1.9282 μs | 13.264 GiB/s |
254| `resolve` | 1.9404 μs | 13.181 GiB/s |
255| `resolve_unchecked` | 1.4328 μs | 17.851 GiB/s |
256| `get_or_intern` (empty) | 38.029 μs | 688.68 MiB/s |
257| `get_or_intern` (filled) | 33.650 μs | 778.30 MiB/s |
258| `try_get_or_intern` (empty) | 39.392 μs | 664.84 MiB/s |
259| `try_get_or_intern (filled)` | 33.435 μs | 783.31 MiB/s |
260| `get` (empty) | 12.565 μs | 2.0356 GiB/s |
261| `get` (filled) | 26.545 μs | 986.61 MiB/s |
262
263#### FXHash
264
265| Method | Time | Throughput |
266|:-----------------------------|:---------:|:------------:|
267| `resolve` | 1.9014 μs | 13.451 GiB/s |
268| `try_resolve` | 1.9278 μs | 13.267 GiB/s |
269| `resolve_unchecked` | 1.4449 μs | 17.701 GiB/s |
270| `get_or_intern` (empty) | 32.523 μs | 805.27 MiB/s |
271| `get_or_intern` (filled) | 30.281 μs | 864.88 MiB/s |
272| `try_get_or_intern` (empty) | 31.630 μs | 828.00 MiB/s |
273| `try_get_or_intern (filled)` | 31.002 μs | 844.78 MiB/s |
274| `get` (empty) | 12.699 μs | 2.0141 GiB/s |
275| `get` (filled) | 29.220 μs | 896.28 MiB/s |
276
277
278### ThreadedRodeo
279
280#### STD RandomState
281
282| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
283|:-----------------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
284| `resolve` | 54.336 μs | 482.00 MiB/s | 364.27 μs | 71.897 MiB/s |
285| `try_resolve` | 54.582 μs | 479.82 MiB/s | 352.67 μs | 74.261 MiB/s |
286| `get_or_intern` (empty) | 266.03 μs | 98.447 MiB/s | N\A | N\A |
287| `get_or_intern` (filled) | 103.04 μs | 254.17 MiB/s | 441.42 μs | 59.331 MiB/s |
288| `try_get_or_intern` (empty) | 261.80 μs | 100.04 MiB/s | N\A | N\A |
289| `try_get_or_intern` (filled) | 102.61 μs | 255.25 MiB/s | 447.42 μs | 58.535 MiB/s |
290| `get` (empty) | 80.346 μs | 325.96 MiB/s | N\A | N\A |
291| `get` (filled) | 92.669 μs | 282.62 MiB/s | 439.24 μs | 59.626 MiB/s |
292
293#### AHash
294
295| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
296|:-----------------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
297| `resolve` | 22.261 μs | 1.1489 GiB/s | 265.46 μs | 98.658 MiB/s |
298| `try_resolve` | 22.378 μs | 1.1429 GiB/s | 268.58 μs | 97.513 MiB/s |
299| `get_or_intern` (empty) | 157.86 μs | 165.91 MiB/s | N\A | N\A |
300| `get_or_intern` (filled) | 56.320 μs | 465.02 MiB/s | 357.13 μs | 73.335 MiB/s |
301| `try_get_or_intern` (empty) | 161.46 μs | 162.21 MiB/s | N\A | N\A |
302| `try_get_or_intern` (filled) | 55.874 μs | 468.73 MiB/s | 360.25 μs | 72.698 MiB/s |
303| `get` (empty) | 43.520 μs | 601.79 MiB/s | N\A | N\A |
304| `get` (filled) | 53.720 μs | 487.52 MiB/s | 360.66 μs | 72.616 MiB/s |
305
306#### FXHash
307
308| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
309|:-----------------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
310| `try_resolve` | 17.289 μs | 1.4794 GiB/s | 238.29 μs | 109.91 MiB/s |
311| `resolve` | 19.833 μs | 1.2896 GiB/s | 237.05 μs | 110.48 MiB/s |
312| `get_or_intern` (empty) | 130.97 μs | 199.97 MiB/s | N\A | N\A |
313| `get_or_intern` (filled) | 42.630 μs | 614.35 MiB/s | 301.60 μs | 86.837 MiB/s |
314| `try_get_or_intern` (empty) | 129.30 μs | 202.55 MiB/s | N\A | N\A |
315| `try_get_or_intern` (filled) | 42.508 μs | 616.12 MiB/s | 337.29 μs | 77.648 MiB/s |
316| `get` (empty) | 28.001 μs | 935.30 MiB/s | N\A | N\A |
317| `get` (filled) | 37.700 μs | 694.68 MiB/s | 292.15 μs | 89.645 MiB/s |
318
319### RodeoReader
320
321#### STD RandomState
322
323| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
324|:--------------------|:---------------:|:---------------------:|:-----------------:|:------------------------:|
325| `resolve` | 1.9398 μs | 13.185 GiB/s | 4.3153 μs | 5.9269 GiB/s |
326| `try_resolve` | 1.9315 μs | 13.242 GiB/s | 4.1956 μs | 6.0959 GiB/s |
327| `resolve_unchecked` | 1.4416 μs | 17.741 GiB/s | 3.1204 μs | 8.1964 GiB/s |
328| `get` (empty) | 38.886 μs | 673.50 MiB/s | N\A | N\A |
329| `get` (filled) | 56.271 μs | 465.42 MiB/s | 105.12 μs | 249.14 MiB/s |
330
331#### AHash
332
333| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
334|:--------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
335| `resolve` | 1.9404 μs | 13.181 GiB/s | 4.1881 μs | 6.1069 GiB/s |
336| `try_resolve` | 1.8932 μs | 13.509 GiB/s | 4.2410 μs | 6.0306 GiB/s |
337| `resolve_unchecked` | 1.4128 μs | 18.103 GiB/s | 3.1691 μs | 8.0703 GiB/s |
338| `get` (empty) | 11.952 μs | 2.1399 GiB/s | N\A | N\A |
339| `get` (filled) | 27.093 μs | 966.65 MiB/s | 56.269 μs | 465.44 MiB/s |
340
341#### FXHash
342
343| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
344|:--------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
345| `resolve` | 1.8987 μs | 13.471 GiB/s | 4.2117 μs | 6.0727 GiB/s |
346| `try_resolve` | 1.9103 μs | 13.389 GiB/s | 4.2254 μs | 6.0529 GiB/s |
347| `resolve_unchecked` | 1.4469 μs | 17.677 GiB/s | 3.0923 μs | 8.2709 GiB/s |
348| `get` (empty) | 12.994 μs | 1.9682 GiB/s | N\A | N\A |
349| `get` (filled) | 29.745 μs | 880.49 MiB/s | 52.387 μs | 499.93 MiB/s |
350
351### RodeoResolver
352
353| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
354|:--------------------|:---------------:|:---------------------:|:-----------------:|:-----------------------:|
355| `resolve` | 1.9416 μs | 13.172 GiB/s | 3.9114 μs | 6.5387 GiB/s |
356| `try_resolve` | 1.9264 μs | 13.277 GiB/s | 3.9289 μs | 6.5097 GiB/s |
357| `resolve_unchecked` | 1.6638 μs | 15.372 GiB/s | 3.1741 μs | 8.0578 GiB/s |
358
359[0]: https://github.com/Kixiron/lasso
360[1]: https://github.com/Kixiron/lasso/workflows/CI/badge.svg
361[2]: https://github.com/Kixiron/lasso/workflows/Security%20Audit/badge.svg
362[3]: https://coveralls.io/repos/github/Kixiron/lasso/badge.svg?branch=master
363[4]: https://coveralls.io/github/Kixiron/lasso?branch=master
364[5]: https://tokei.rs/b1/github/Kixiron/lasso
365[6]: https://docs.rs/lasso/badge.svg
366[7]: https://docs.rs/lasso
367[8]: https://img.shields.io/crates/v/lasso.svg
368[9]: https://crates.io/crates/lasso
369[key]: crate::Key
370[`Rodeo`]: crate::Rodeo
371[`ThreadedRodeo`]: crate::ThreadedRodeo
372[`RodeoResolver`]: crate::RodeoResolver
373[`RodeoReader`]: crate::RodeoReader
374[`hashbrown`]: https://crates.io/crates/hashbrown
375[`ahash`]: https://crates.io/crates/ahash
376[`raw_entry` api]: https://github.com/rust-lang/rust/issues/56167
377[niches]: https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#niche
378