# What is Beansdb?

Beansdb is a distributed key-value storage system designed for large-scale
online systems, aiming for high availability and easy management. It takes
its ideas from Amazon's Dynamo, with some simplifications in the spirit of
Keep It Simple, Stupid (KISS).

Clients write to N Beansdb nodes, then read from R of them, resolving any
conflicts between the replies. Data on different nodes is synced through a
hash tree, run as a cron job.

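The write-to-N, read-from-R idea can be sketched with in-memory stand-ins for nodes. This is only an illustration: the `Node` class, the function names, and the "highest version wins" conflict rule are assumptions made for the example, not Beansdb's actual client code.

```python
class Node:
    """In-memory stand-in for a single storage node (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def set(self, key, version, value):
        current = self.data.get(key)
        # Keep only the newest version of each key.
        if current is None or version > current[0]:
            self.data[key] = (version, value)
        return True

    def get(self, key):
        return self.data.get(key)


def quorum_write(nodes, key, version, value, w):
    """Write to every node; report success if at least `w` nodes acknowledge."""
    acks = sum(1 for node in nodes if node.set(key, version, value))
    return acks >= w


def quorum_read(nodes, key, r):
    """Read from the first `r` nodes and resolve conflicts by highest version."""
    replies = [node.get(key) for node in nodes[:r]]
    replies = [reply for reply in replies if reply is not None]
    if not replies:
        return None
    return max(replies, key=lambda reply: reply[0])[1]


nodes = [Node(), Node(), Node()]            # N = 3
quorum_write(nodes, "hello", 1, "world", w=2)
nodes[0].set("hello", 2, "there")           # one replica is ahead of the others
result = quorum_read(nodes, "hello", r=2)   # sees versions 2 and 1, picks 2
```
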
It conforms to the memcache protocol (not fully supported, see below), so any
memcached client can interact with it without modification.

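Since the wire format is the memcached text protocol, requests and responses are plain text lines. The helpers below are a small sketch of that standard format for `set` and `get`; the function names are made up for the example, and the `flags` and `exptime` fields are simply left at 0 because how Beansdb interprets them is not covered here.

```python
def build_set(key, value, flags=0, exptime=0):
    """Format a memcached text-protocol `set` request for a bytes value."""
    header = "set %s %d %d %d\r\n" % (key, flags, exptime, len(value))
    return header.encode("ascii") + value + b"\r\n"


def build_get(key):
    """Format a memcached text-protocol `get` request."""
    return ("get %s\r\n" % key).encode("ascii")


def parse_get_response(raw):
    """Parse a single-key `get` response; return the value or None on a miss."""
    if raw == b"END\r\n":
        return None
    header, _, rest = raw.partition(b"\r\n")
    # Header looks like: VALUE <key> <flags> <bytes>
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected response: %r" % header)
    size = int(parts[3])
    return rest[:size]


request = build_set("hello", b"world")
value = parse_get_response(b"VALUE hello 0 5\r\nworld\r\nEND\r\n")
```
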
Beansdb is heavily used at http://www.douban.com/ to store images, MP3s,
text fields and so on; see the benchmark below.

Any suggestions or feedback are welcome.

# Features

* Highly available data storage with multiple readable and writable replicas

* Soft state and eventual consistency, synced with a hash tree

* Easy scaling out without interrupting online service

* High-performance reads and writes for key-value objects

* Configurable availability/consistency via N, W, R

* Memcache protocol compatibility

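To illustrate how a hash tree can help replicas find out what needs syncing, here is a one-level sketch: keys are grouped into 16 buckets, each bucket is summarized by a digest, and only buckets whose digests differ need a closer look. The use of MD5, the single level, the bucket rule, and all names are assumptions for this example; Beansdb's real tree layout is not described here.

```python
import hashlib


def bucket_of(key):
    """Map a key to one of 16 buckets using the first hex digit of its MD5."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()[0]


def bucket_hashes(store):
    """Summarize a {key: (version, value)} store as one digest per bucket."""
    buckets = {}
    for key in sorted(store):
        version, _ = store[key]
        digest = buckets.setdefault(bucket_of(key), hashlib.md5())
        digest.update(("%s:%d;" % (key, version)).encode("utf-8"))
    return {bucket: digest.hexdigest() for bucket, digest in buckets.items()}


def differing_buckets(store_a, store_b):
    """Return the buckets whose digests differ between two replicas."""
    a, b = bucket_hashes(store_a), bucket_hashes(store_b)
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


replica_a = {"hello": (1, "world"), "foo": (1, "bar")}
replica_b = {"hello": (2, "there"), "foo": (1, "bar")}
# Only the bucket holding "hello" should be reported as out of sync.
out_of_sync = differing_buckets(replica_a, replica_b)
```
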
## Supported memcache commands

* get
* set (with version support)
* append
* incr
* delete
* stats
* flush_all

## Private commands

* get @xxx: list the content of the hash tree, such as @0f
* get ?xxx: get the metadata of the key xxx

# Python Example
```
from dbclient import Beansdb

# three beansdb nodes on localhost, each listed with all 16 buckets
BEANSDBCFG = {
    "localhost:7901": range(16),
    "localhost:7902": range(16),
    "localhost:7903": range(16),
}

# the second argument matches the 16 buckets configured above
db = Beansdb(BEANSDBCFG, 16)

db.set('hello', 'world')   # store a value
db.get('hello')            # read it back
db.delete('hello')         # remove it
```

# Benchmark
```
 $ beansdb -d
 $ memstorm -s localhost:7900 -n 1000000 -k 10 -l 100

 ----
 Num of Records : 1000000
 Non-Blocking IO : 0
 TCP No-Delay : 0

 Successful [SET] : 1000000
 Failed [SET] : 0
 Total Time [SET] : 51.77594s
 Average Time [SET] : 0.00005s

 Successful [GET] : 1000000
 Failed [GET] : 0
 Total Time [GET] : 40.93667s
 Average Time [GET] : 0.00004s
```

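The totals in the benchmark output can be turned into per-operation averages and throughput with a little arithmetic. The snippet below only reuses the numbers printed above; the variable names are made up for the example.

```python
records = 1000000
set_total_s = 51.77594
get_total_s = 40.93667

set_avg_s = set_total_s / records      # average seconds per SET
get_avg_s = get_total_s / records      # average seconds per GET
set_per_s = records / set_total_s      # roughly 19,300 SETs per second
get_per_s = records / get_total_s      # roughly 24,400 GETs per second

print("SET: %.5fs avg, %.0f ops/s" % (set_avg_s, set_per_s))
print("GET: %.5fs avg, %.0f ops/s" % (get_avg_s, get_per_s))
```
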
# Real performance in production

* cluster 1: 1.1B records, 55TB data, 48 nodes, 1100 gets and 25 sets per second,
  median/average/90th/99th percentile time is 12/20/37/186 ms.
* cluster 2: 3.3B records, 3.5TB data, 15 nodes, 1000 gets and 500 sets per second,
  median/average/90th/99th percentile time is 1/11/15/123 ms.