Sonic
=====

[![Build Status](https://travis-ci.org/valeriansaliou/sonic.svg?branch=master)](https://travis-ci.org/valeriansaliou/sonic) [![Dependency Status](https://deps.rs/repo/github/valeriansaliou/sonic/status.svg)](https://deps.rs/repo/github/valeriansaliou/sonic) [![Buy Me A Coffee](https://img.shields.io/badge/buy%20me%20a%20coffee-donate-yellow.svg)](https://www.buymeacoffee.com/valeriansaliou)

**Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time.**

Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database.

Strong attention was given to performance and code cleanliness when designing Sonic. It aims to be crash-free and super-fast, and to put minimal strain on server resources (our measurements have shown that Sonic, when under load, responds to search queries in the μs range, eats ~30MB RAM and has a low CPU footprint; [see our benchmarks](https://github.com/valeriansaliou/sonic#how-fast--lightweight-is-it)).

_Tested at Rust version: `rustc 1.44.1 (c7087fe00 2020-06-17)`_

**Crafted in Nantes, France.**

**:newspaper: The Sonic project was initially announced in [a post on my personal journal](https://journal.valeriansaliou.name/announcing-sonic-a-super-light-alternative-to-elasticsearch/).**

![Sonic](https://valeriansaliou.github.io/sonic/images/banner.jpg)

> _« Sonic » is the mascot of the Sonic project. I drew it to look like a psychedelic hipster hedgehog._

## Who uses it?

<table>
<tr>
<td align="center"><a href="https://crisp.chat/"><img src="https://valeriansaliou.github.io/sonic/images/logo-crisp.png" height="64" /></a></td>
<td align="center"><a href="https://scrumpy.io/"><img src="https://valeriansaliou.github.io/sonic/images/logo-scrumpy.png" height="64" /></a></td>
</tr>
<tr>
<td align="center">Crisp</td>
<td align="center">Scrumpy</td>
</tr>
</table>

_Do you use Sonic and want to be listed here? [Contact me](https://valeriansaliou.name/)._

## Demo

Sonic is integrated in all Crisp search products on the [Crisp](https://crisp.chat/) platform. It is used to index half a billion objects on a $5/month 1-vCPU SSD cloud server (as of 2019). Crisp users use it to search in their messages, conversations, contacts, helpdesk articles and more.

**You can test Sonic live on [Crisp Helpdesk](https://help.crisp.chat/) and get an idea of the speed and relevance of Sonic search results. You can also test search suggestions from there: start typing at least 2 characters of a word, and a full word will be suggested (press the Tab key to expand the suggestion). _Both search and suggestions are powered by Sonic._**

![Demo on Crisp Helpdesk search](https://valeriansaliou.github.io/sonic/images/crisp-search-demo.gif)

> _Sonic fuzzy search in helpdesk articles at its best. Look up any word or group of terms, get results instantly._

## Features

* **Search terms are stored in collections, organized in buckets**; you may use a single bucket, or a bucket per user on your platform if you need to search in separate indexes.
* **Search results return object identifiers**, which can be resolved from an external database if you need to enrich the search results. This makes Sonic a simple word index that points to identifier results. Sonic doesn't store any direct textual data in its index, but it still holds a word graph for auto-completion and typo corrections.
* **Search query typos are corrected**; if there are not enough exact-match results for a given word in a search query, Sonic tries to correct the word and retries with alternate words. You're allowed to make mistakes when searching.
* **Insert and remove items in the index**; index-altering operations are light and can be committed to the server while it is running. A background tasker handles the job of consolidating the index so that the entries you have pushed or popped are quickly made available for search.
* **Auto-complete any word** in real-time via the suggest operation. This helps build a snappy word suggestion feature in your end-user search interface.
* **Full Unicode compatibility** for the 80+ most spoken languages in the world. Sonic removes useless stop words from any text (e.g. 'the' in English), after guessing the text language. This ensures any searched or ingested text is clean before it hits the index; [see languages](https://github.com/valeriansaliou/sonic#which-text-languages-are-supported).
* **Simple protocol (Sonic Channel)**, which lets you search your index, manage data ingestion (push in the index, pop from the index, flush a collection, flush a bucket, etc.) and perform administrative actions. Sonic Channel was designed to be lightweight on resources and simple to integrate with; [read protocol specification](https://github.com/valeriansaliou/sonic/blob/master/PROTOCOL.md).
* **Easy-to-use libraries**, which let you connect to Sonic from your apps; [see libraries](https://github.com/valeriansaliou/sonic#-sonic-channel-libraries).
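
The collection, bucket and object-identifier model described in the features above can be pictured with a small in-memory sketch. This is only a toy illustration in Python, not how Sonic stores or ranks data; the `TinyIdentifierIndex` class, its method names and the sample identifiers are all hypothetical.

```python
from collections import defaultdict


class TinyIdentifierIndex:
    """Toy word -> object-ID index, scoped by collection and bucket (hypothetical)."""

    def __init__(self):
        # (collection, bucket) -> word -> set of object identifiers
        self._index = defaultdict(lambda: defaultdict(set))

    def push(self, collection, bucket, object_id, text):
        """Index every word of `text` under `object_id`; the text itself is not kept."""
        for word in text.lower().split():
            self._index[(collection, bucket)][word].add(object_id)

    def query(self, collection, bucket, terms):
        """Return the IDs of objects that contain all of the given terms."""
        words = terms.lower().split()
        if not words:
            return set()
        scope = self._index.get((collection, bucket), {})
        matches = [scope.get(word, set()) for word in words]
        return set.intersection(*matches)

    def suggest(self, collection, bucket, prefix):
        """Return indexed words starting with `prefix`, sorted alphabetically."""
        scope = self._index.get((collection, bucket), {})
        return sorted(word for word in scope if word.startswith(prefix.lower()))


index = TinyIdentifierIndex()
index.push("messages", "user:1", "conversation:a", "Hello from Nantes")
index.push("messages", "user:1", "conversation:b", "Hello again")
index.push("messages", "user:2", "conversation:c", "Hello from another bucket")

hello_ids = index.query("messages", "user:1", "hello")
nantes_ids = index.query("messages", "user:1", "hello nantes")
suggestions = index.suggest("messages", "user:1", "he")
```

Note that only identifiers come back from `query`; resolving them to full documents is left to an external database, as with Sonic itself.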

## How to use it?

### Installation

Sonic is built in Rust. To install it, either download a version from the [Sonic releases](https://github.com/valeriansaliou/sonic/releases) page, use `cargo install` or pull the source code from `master`.

**Install from source:**

If you pulled the source code from Git, you can build it using `cargo`:

```bash
cargo build --release
```

You can find the built binaries in the `./target/release` directory.

_Install `clang`, `clang-dev`, `g++` and `llvm-dev` to be able to compile the required RocksDB dependency._

**Install from Cargo:**

You can install Sonic directly with `cargo install`:

```bash
cargo install sonic-server
```

Ensure that your `$PATH` is properly configured to source the Crates binaries, and then run Sonic using the `sonic` command.

**Install from Docker Hub:**

You might find it convenient to run Sonic via Docker. You can find the pre-built Sonic image on Docker Hub as [valeriansaliou/sonic](https://hub.docker.com/r/valeriansaliou/sonic/).

First, pull the `valeriansaliou/sonic` image:

```bash
docker pull valeriansaliou/sonic:v1.3.0
```

Then, pass it a configuration file and run it (replace `/path/to/your/sonic/config.cfg` with the path to your configuration file):

```bash
docker run -p 1491:1491 -v /path/to/your/sonic/config.cfg:/etc/sonic.cfg -v /path/to/your/sonic/store/:/var/lib/sonic/store/ valeriansaliou/sonic:v1.3.0
```

In the configuration file, ensure that:

* `channel.inet` is set to `0.0.0.0:1491` (this lets Sonic be reached from outside the container)
* `store.kv.path` is set to `/var/lib/sonic/store/kv/` (this lets the external KV store directory be reached by Sonic)
* `store.fst.path` is set to `/var/lib/sonic/store/fst/` (this lets the external FST store directory be reached by Sonic)

Sonic will be reachable from `tcp://localhost:1491`.

**Install from another source (non-official):**

Other installation sources are available:

* **Homebrew (macOS)**: `brew install sonic` ([see formula](https://formulae.brew.sh/formula/sonic))

_Note that those sources are non-official, meaning that they are neither owned nor maintained by the Sonic project owners. The latest Sonic version available on those sources might be outdated compared to the latest version available through the Sonic project._

### Configuration

Use the sample [config.cfg](https://github.com/valeriansaliou/sonic/blob/master/config.cfg) configuration file and adjust it to your own environment.

_If you are looking to fine-tune your configuration, you may read our [detailed configuration documentation](https://github.com/valeriansaliou/sonic/blob/master/CONFIGURATION.md)._

### Run Sonic

Sonic can be run as such:

`./sonic -c /path/to/config.cfg`

## Perform searches and manage objects

Both searches and object management (i.e. data ingestion) are handled via the Sonic Channel protocol only. As we want to keep things simple with Sonic (similarly to how Redis does it), Sonic does not offer an HTTP endpoint or similar; connecting via Sonic Channel is the way to go when you need to interact with the Sonic search database.

Sonic distributes official libraries that let you integrate Sonic into your apps easily. Click on a library below to see library integration documentation and code.

_If you are looking for details on the raw Sonic Channel TCP-based protocol, you can read our [detailed protocol documentation](https://github.com/valeriansaliou/sonic/blob/master/PROTOCOL.md). It can prove handy if you are looking to code your own Sonic Channel library._
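
To give a feel for what travels over a Sonic Channel connection, the Python sketch below builds a few raw command lines. The command shapes (`START`, `PUSH`, `QUERY`, `SUGGEST`, `TRIGGER`, `QUIT`) are assumed from the protocol documentation linked above, so check it before relying on them. The helper names, the password and the sample collection, bucket and object names are hypothetical, and nothing here opens a network connection.

```python
def quote_text(text):
    """Wrap text in double quotes, escaping backslashes and quotes.

    Newlines are flattened to spaces so each command stays on one line
    (a simplification made for this sketch).
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", " ")
    return f'"{escaped}"'


def start(mode, password):
    """Open a channel in the given mode: search, ingest or control."""
    return f"START {mode} {password}"


def push(collection, bucket, object_id, text):
    """Ingest `text` for `object_id` (ingest mode)."""
    return f"PUSH {collection} {bucket} {object_id} {quote_text(text)}"


def query(collection, bucket, terms, limit=None):
    """Search for `terms`, optionally capping the number of results (search mode)."""
    command = f"QUERY {collection} {bucket} {quote_text(terms)}"
    if limit is not None:
        command += f" LIMIT({limit})"
    return command


def suggest(collection, bucket, word):
    """Ask for completions of a partial word (search mode)."""
    return f"SUGGEST {collection} {bucket} {quote_text(word)}"


# Each session is the list of lines a client would send, one per line.
ingest_session = [
    start("ingest", "SecretPassword"),
    push("messages", "user:1", "conversation:a", 'Say "hi" to Sonic'),
    "QUIT",
]
search_session = [
    start("search", "SecretPassword"),
    query("messages", "user:1", "hi sonic", limit=10),
    suggest("messages", "user:1", "son"),
    "QUIT",
]
# Control mode: force an index consolidation.
control_session = [start("control", "SecretPassword"), "TRIGGER consolidate", "QUIT"]
```

In practice, one of the libraries listed below takes care of this framing, plus reading the server replies, for you.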

### Sonic Channel Libraries

#### 1️⃣ Official Libraries

Sonic distributes official Sonic integration libraries for your programming language (official means that those libraries have been reviewed and validated by a core maintainer):

* **NodeJS**:
  * **[node-sonic-channel](https://www.npmjs.com/package/sonic-channel)** by [@valeriansaliou](https://github.com/valeriansaliou)
* **PHP**:
  * **[psonic](https://github.com/ppshobi/psonic)** by [@ppshobi](https://github.com/ppshobi)

#### 2️⃣ Community Libraries

You can find below a list of Sonic integrations provided by the community (many thanks to them!):

* **Rust**:
  * **[sonic_client](https://github.com/FrontMage/sonic_client)** by [@FrontMage](https://github.com/FrontMage)
* **Python**:
  * **[asonic](https://github.com/moshe/asonic)** by [@moshe](https://github.com/moshe)
  * **[python-sonic-client](https://github.com/xmonader/python-sonic-client)** by [@xmonader](https://github.com/xmonader)
* **Ruby**:
  * **[sonic-ruby](https://github.com/atipugin/sonic-ruby)** by [@atipugin](https://github.com/atipugin)
* **Go**:
  * **[go-sonic](https://github.com/expectedsh/go-sonic)** by [@alexisvisco](https://github.com/alexisvisco)
  * **[go-sonic](https://github.com/OGKevin/go-sonic)** by [@OGKevin](https://github.com/OGKevin)
* **PHP**:
  * **[php-sonic](https://github.com/php-sonic/php-sonic)** by [@touhonoob](https://github.com/touhonoob)
  * **[laravel-scout-sonic](https://github.com/james2doyle/laravel-scout-sonic)** by [@james2doyle](https://github.com/james2doyle)
* **Java**:
  * **[java-sonic](https://github.com/twohou/java-sonic)** by [@touhonoob](https://github.com/touhonoob)
  * **[jsonic](https://github.com/alohaking/jsonic)** by [@alohaking](https://github.com/alohaking)
* **Elixir**:
  * **[sonix](https://github.com/imerkle/sonix)** by [@imerkle](https://github.com/imerkle)
* **Crystal**:
  * **[sonic-crystal](https://github.com/babelian/sonic-crystal)** by [@babelian](https://github.com/babelian)
* **Nim**:
  * **[nim-sonic-client](https://github.com/xmonader/nim-sonic-client)** by [@xmonader](https://github.com/xmonader)
* **.NET**:
  * **[nsonic](https://github.com/spikensbror-dotnet/nsonic)** by [@spikensbror](https://github.com/spikensbror)

_ℹ️ Cannot find the library for your programming language? Build your own and be referenced here! ([contact me](https://valeriansaliou.name/))_

## Which text languages are supported?

Sonic supports a wide range of languages in its lexing system. If a language is not in this list, you will still be able to push text in this language to the search index, but stop-words will not be elided, which could lead to lower-quality search results.

**The languages supported by the lexing system are:**

* Afrikaans
* Arabic
* Azerbaijani
* Bengali
* Bulgarian
* Burmese
* Chinese (Simplified)
* Chinese (Traditional)
* Croatian
* Czech
* Danish
* Dutch
* English
* Esperanto
* Estonian
* Finnish
* French
* German
* Greek
* Hausa
* Hebrew
* Hindi
* Hungarian
* Indonesian
* Italian
* Japanese
* Kannada
* Khmer
* Korean
* Kurdish
* Latin
* Latvian
* Lithuanian
* Marathi
* Nepali
* Persian
* Polish
* Portuguese
* Punjabi
* Russian
* Slovak
* Slovene
* Somali
* Spanish
* Swedish
* Tagalog
* Tamil
* Thai
* Turkish
* Ukrainian
* Urdu
* Vietnamese
* Yiddish
* Yoruba
* Zulu
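
As a rough illustration of the stop-word removal discussed in this section, here is a minimal Python sketch. The stop-word set is a tiny hypothetical sample for English and the function name is made up; Sonic's real lexer guesses the language first and relies on its own per-language lists.

```python
# Tiny, hypothetical stop-word sample; not Sonic's actual list.
ENGLISH_STOP_WORDS = {"the", "a", "an", "is", "of", "in", "to", "and"}


def strip_stop_words(text, stop_words=ENGLISH_STOP_WORDS):
    """Lowercase the text, split it on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


cleaned = strip_stop_words("The mascot of the Sonic project is a hedgehog")

# Text in a language without a matching list keeps all of its words.
untouched = strip_stop_words("le hérisson de la forêt")
```

The second call shows the situation of an unsupported language: the text is still indexable, but filler words such as "le" and "de" stay in.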

## How fast & lightweight is it?

Sonic was built for [Crisp](https://crisp.chat/) from the start. As Crisp was growing and indexing more and more search data into a full-text search SQL database, we decided it was time to switch to a proper search backend system. When reviewing Elasticsearch (ELS) and others, we found those were full-featured heavyweight systems that did not scale well with Crisp's freemium-based cost structure.

In the end, we decided to build our own search backend, designed to be simple and lightweight on resources.

You can run function-level benchmarks with the command: `cargo bench --features benchmark`

### Benchmark #1

#### ➡️ Scenario

We extracted all the messages that the Crisp team exchanged for [Crisp](https://crisp.chat/)'s own customer support.

We want to import all those messages into a clean Sonic instance, and then perform searches on the index we built. We will measure the time that Sonic spent executing each operation (i.e. each `PUSH` and `QUERY` command over Sonic Channel), and group results per 1,000 operations (this outputs a mean time per 1,000 operations).
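
The measurement approach described above (time each operation, then report one mean per 1,000 operations) can be sketched as follows. This is a hedged Python illustration rather than the benchmark code itself; `mean_time_per_batch` and `fake_operation` are hypothetical names, and the fake operation stands in for a real `PUSH` or `QUERY` round-trip.

```python
import time


def mean_time_per_batch(operation, items, batch_size=1000):
    """Run `operation` on each item; return the mean seconds per operation for each batch."""
    means = []
    batch_elapsed = 0.0
    batch_count = 0
    for item in items:
        started = time.perf_counter()
        operation(item)
        batch_elapsed += time.perf_counter() - started
        batch_count += 1
        if batch_count == batch_size:
            means.append(batch_elapsed / batch_count)
            batch_elapsed, batch_count = 0.0, 0
    if batch_count:  # report the final, partial batch as well
        means.append(batch_elapsed / batch_count)
    return means


def fake_operation(item):
    """Stand-in for a real PUSH or QUERY round-trip (hypothetical)."""
    return len(str(item))


# 2,500 operations yield two full batches and one partial batch.
report = mean_time_per_batch(fake_operation, range(2500), batch_size=1000)
```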

#### ➡️ Context

**Our benchmark was run on the following computer:**

* **Device**: MacBook Pro (Retina, 15-inch, Mid 2014)
* **OS**: macOS 10.14.3
* **Disk**: 512GB SSD (formatted with the APFS file system)
* **CPU**: 2.5 GHz Intel Core i7
* **RAM**: 16 GB 1600 MHz DDR3

**Sonic was compiled as follows:**

* **Sonic version**: 1.0.1
* **Rustc version**: `rustc 1.35.0-nightly (719b0d984 2019-03-13)`
* **Compiler flags**: `release` profile (`-O3` with `lto`)

**Our dataset is as such:**

* **Number of objects**: ~1,000,000 messages
* **Total size**: ~100MB of raw message text (this does not account for identifiers and other metas)

#### ➡️ Scripts

**The scripts we used to perform the benchmark are:**

1. **PUSH script**: [sonic-benchmark_batch-push.js](https://gist.github.com/valeriansaliou/e5ab737b28601ebd70483f904d21aa09)
2. **QUERY script**: [sonic-benchmark_batch-query.js](https://gist.github.com/valeriansaliou/3ef8315d7282bd173c2cb9eba64fa739)

#### ⏬ Results

**Our findings:**

* We imported ~1,000,000 messages of dynamic length (some very long, e.g. emails);
* Once imported, the search index weighs 20MB (KV) + 1.4MB (FST) on disk;
* CPU usage during import averaged 75% of a single CPU core;
* RAM usage for the Sonic process peaked at 28MB during our benchmark;
* We used a single Sonic Channel TCP connection, which limits the import to a single thread (we could have load-balanced this across as many Sonic Channel connections as there are CPUs);
* We get an import RPS approaching 4,000 operations per second (per thread);
* We get a search query RPS approaching 1,000 operations per second (per thread);
* On the hyper-threaded 4-core CPU used, we could have parallelized operations to 8 virtual cores, thus theoretically increasing the import RPS to 32,000 operations / second, while the search query RPS would be increased to 8,000 operations / second (we may be SSD-bound at some point though).

**Compared results per operation (on a single object):**

We took a sample of 8 results from our batched operations, which produced a total of 1,000 results (1,000,000 items, with 1,000 items batched per measurement report).

_This is not very scientific, but it should give you a clear idea of Sonic's performance._

**Time spent per operation:**

Operation | Average | Best  | Worst
--------- | ------- | ----- | -----
PUSH      | 275μs   | 190μs | 363μs
QUERY     | 880μs   | 852μs | 1ms
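
The average latencies in the table translate into per-thread throughput with a simple division, sketched below in Python. The function name is hypothetical, and the 8-way scaling assumes ideal linear speed-up, so it is an upper bound rather than a measurement. The results land in the same range as the rounded RPS figures quoted in the findings.

```python
def ops_per_second(mean_latency_us):
    """Convert a mean per-operation latency in microseconds to operations per second."""
    return 1_000_000 / mean_latency_us


push_rps = ops_per_second(275)   # average PUSH latency from the table
query_rps = ops_per_second(880)  # average QUERY latency from the table

# Ideal linear scaling over 8 virtual cores; an upper bound, not a measurement.
virtual_cores = 8
push_rps_scaled = push_rps * virtual_cores
query_rps_scaled = query_rps * virtual_cores

print(f"PUSH:  {push_rps:,.0f} ops/s per thread, {push_rps_scaled:,.0f} ops/s on {virtual_cores} cores")
print(f"QUERY: {query_rps:,.0f} ops/s per thread, {query_rps_scaled:,.0f} ops/s on {virtual_cores} cores")
```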

**Batch PUSH results as seen from our terminal (from initial index of: 0 objects):**

![Batch PUSH benchmark](https://valeriansaliou.github.io/sonic/images/benchmark-batch-push.png)

**Batch QUERY results as seen from our terminal (on index of: 1,000,000 objects):**

![Batch QUERY benchmark](https://valeriansaliou.github.io/sonic/images/benchmark-batch-query.png)

## Limitations

* **Indexed data limits**: Sonic is designed for large search indexes split over thousands of search buckets per collection. An IID (i.e. Internal-ID) is stored in the index as a 32-bit number, which theoretically allows up to ~4.3 billion objects (i.e. OIDs) to be indexed per bucket. We've observed storage savings of 30% to 40%, which justifies the trade-off on large databases (versus Sonic using 64-bit IIDs). Also, Sonic only keeps the N most recently pushed results for a given word, in a sliding-window fashion (the sliding window width can be configured).
* **Search query limits**: Sonic's Natural Language Processing (NLP) system does not work at the sentence level, for storage compactness reasons (we keep the FST graph shallow so as to reduce time and space complexity). It works at the word level, and is thus able to search per-word and can predict a word based on user input, though it is unable to predict the next word in a sentence.
* **Real-time limits**: the FST needs to be rebuilt every time a word is pushed or popped from the bucket graph. As this is quite heavy, Sonic batches rebuild cycles. If you have just pushed a new word to the index and you are not seeing it in the `SUGGEST` command yet, wait for the next rebuild cycle to kick in, or force it with `TRIGGER consolidate` in a `control` channel.
* **Interoperability limits**: The Sonic Channel protocol is the only way to read and write search entries to the Sonic search index. Sonic does not expose any HTTP API. Sonic Channel has been designed with performance and minimal network footprint in mind. If you need to access Sonic from an unsupported programming language, you can either [open an issue](https://github.com/valeriansaliou/sonic/issues/new) or look at the reference [node-sonic-channel](https://github.com/valeriansaliou/node-sonic-channel) implementation and build it in your target programming language.
* **Hardware limits**: Sonic performs the search on the file-system directly; i.e. it does not fit the index in RAM. A search query results in a lot of random accesses on the disk, which means that it will be quite slow on old-school HDDs and super-fast on newer SSDs. Do store the Sonic database on SSD-backed file systems only.
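
Two of the limits above lend themselves to a quick worked example: the capacity of a 32-bit IID, and the "N most recent results" sliding window. The Python sketch below is only illustrative; the window width of 3 and the object names are hypothetical, and the raw byte counts cover IIDs alone, so they are not expected to match the 30% to 40% overall savings reported above.

```python
from collections import deque

# Number of distinct identifiers that fit in a 32-bit versus a 64-bit IID.
iid_32_capacity = 2 ** 32
iid_64_capacity = 2 ** 64

# Raw bytes needed to store one million IIDs at each width.
bytes_32 = 1_000_000 * 4
bytes_64 = 1_000_000 * 8


def push_with_window(window, object_id):
    """Append an object ID; a full deque silently drops its oldest entry."""
    window.append(object_id)
    return list(window)


# Hypothetical window of N = 3 results kept for one word.
recent = deque(maxlen=3)
for oid in ["obj:1", "obj:2", "obj:3", "obj:4"]:
    kept = push_with_window(recent, oid)
```

After the fourth push, `obj:1` has fallen out of the window, which mirrors how older results for a very common word eventually stop being returned.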

## :fire: Report A Vulnerability

If you find a vulnerability in Sonic, you are more than welcome to report it directly to [@valeriansaliou](https://github.com/valeriansaliou) by sending an encrypted email to [valerian@valeriansaliou.name](mailto:valerian@valeriansaliou.name). Do not report vulnerabilities in public GitHub issues, as they may be exploited by malicious people to target production servers running an unpatched Sonic instance.

**:warning: You must encrypt your email using [@valeriansaliou](https://github.com/valeriansaliou)'s GPG public key: [:key:valeriansaliou.gpg.pub.asc](https://valeriansaliou.name/files/keys/valeriansaliou.gpg.pub.asc).**

**:gift: Based on the severity of the vulnerability, I may offer a $100 (US) bounty to whoever reported it.**