| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| LICENSE | 02-Mar-2017 | 15.6 KiB | 355 | 256 |
| Makefile | 02-Mar-2017 | 378 | 18 | 12 |
| README.md | 02-Mar-2017 | 4.6 KiB | 90 | 68 |
| api.go | 02-Mar-2017 | 33.1 KiB | 1,008 | 601 |
| commands.go | 02-Mar-2017 | 3.5 KiB | 152 | 71 |
| commitment.go | 02-Mar-2017 | 3.2 KiB | 102 | 66 |
| config.go | 02-Mar-2017 | 11.5 KiB | 259 | 90 |
| configuration.go | 02-Mar-2017 | 10.8 KiB | 344 | 231 |
| discard_snapshot.go | 02-Mar-2017 | 1.2 KiB | 50 | 32 |
| file_snapshot.go | 02-Mar-2017 | 12 KiB | 495 | 343 |
| fsm.go | 02-Mar-2017 | 3.5 KiB | 137 | 81 |
| future.go | 02-Mar-2017 | 6.7 KiB | 290 | 178 |
| inmem_snapshot.go | 02-Mar-2017 | 2.5 KiB | 107 | 78 |
| inmem_store.go | 02-Mar-2017 | 2.7 KiB | 126 | 97 |
| inmem_transport.go | 02-Mar-2017 | 7.5 KiB | 323 | 244 |
| log.go | 02-Mar-2017 | 2.2 KiB | 73 | 24 |
| log_cache.go | 02-Mar-2017 | 1.7 KiB | 80 | 53 |
| membership.md | 02-Mar-2017 | 7 KiB | 84 | 67 |
| net_transport.go | 02-Mar-2017 | 14.6 KiB | 623 | 426 |
| observer.go | 02-Mar-2017 | 3.3 KiB | 116 | 66 |
| peersjson.go | 02-Mar-2017 | 1.4 KiB | 47 | 30 |
| raft.go | 02-Mar-2017 | 42.8 KiB | 1,457 | 999 |
| replication.go | 02-Mar-2017 | 16.5 KiB | 562 | 367 |
| snapshot.go | 02-Mar-2017 | 7.5 KiB | 240 | 136 |
| stable.go | 02-Mar-2017 | 443 | 16 | 7 |
| state.go | 02-Mar-2017 | 3.6 KiB | 168 | 113 |
| tcp_transport.go | 02-Mar-2017 | 2.6 KiB | 106 | 78 |
| transport.go | 02-Mar-2017 | 4.4 KiB | 125 | 52 |
| util.go | 02-Mar-2017 | 2.9 KiB | 134 | 97 |

README.md

raft [![Build Status](https://travis-ci.org/hashicorp/raft.png)](https://travis-ci.org/hashicorp/raft)
====

raft is a [Go](http://www.golang.org) library that manages a replicated
log and can be used with an FSM to manage replicated state machines. It
is a library for providing [consensus](http://en.wikipedia.org/wiki/Consensus_(computer_science)).

The use cases for such a library are far-reaching, as replicated state
machines are a key component of many distributed systems. They enable
building Consistent, Partition Tolerant (CP) systems, with limited
fault tolerance as well.

## Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

```
go version
```

## Documentation

For complete documentation, see the associated [Godoc](http://godoc.org/github.com/hashicorp/raft).

To prevent complications with cgo, the primary backend `MDBStore` is in a separate repository,
called [raft-mdb](http://github.com/hashicorp/raft-mdb). That is the recommended implementation
for the `LogStore` and `StableStore`.

A pure Go backend using [BoltDB](https://github.com/boltdb/bolt) is also available, called
[raft-boltdb](https://github.com/hashicorp/raft-boltdb). It can also be used as a `LogStore`
and `StableStore`.
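
As a rough sketch of how these pieces fit together, a node might be constructed along the lines below. This is illustrative only: the `raftexample` package name, the `newRaftNode` helper, the server ID, paths, and addresses are assumptions of this example rather than anything prescribed by the library.

```
package raftexample

import (
	"net"
	"os"
	"time"

	"github.com/hashicorp/raft"
	raftboltdb "github.com/hashicorp/raft-boltdb"
)

// newRaftNode wires a single Raft node together using raft-boltdb for
// the LogStore and StableStore. All paths, addresses, and the server
// ID below are placeholders.
func newRaftNode(fsm raft.FSM) (*raft.Raft, error) {
	config := raft.DefaultConfig()
	config.LocalID = raft.ServerID("node1")

	// BoltStore satisfies both LogStore and StableStore.
	store, err := raftboltdb.NewBoltStore("/tmp/raft/raft.db")
	if err != nil {
		return nil, err
	}

	// Keep the two most recent snapshots on disk.
	snapshots, err := raft.NewFileSnapshotStore("/tmp/raft", 2, os.Stderr)
	if err != nil {
		return nil, err
	}

	// TCP transport used for replication and voting RPCs.
	addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:7000")
	if err != nil {
		return nil, err
	}
	transport, err := raft.NewTCPTransport("127.0.0.1:7000", addr, 3, 10*time.Second, os.Stderr)
	if err != nil {
		return nil, err
	}

	return raft.NewRaft(config, fsm, store, store, snapshots, transport)
}
```

For a brand-new cluster you would also bootstrap an initial configuration (for example with `BootstrapCluster`) before the node can elect a leader.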

## Protocol

raft is based on ["Raft: In Search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)

A high-level overview of the Raft protocol is described below, but for details please read the full
[Raft paper](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)
followed by the raft source. Any questions about the raft protocol should be sent to the
[raft-dev mailing list](https://groups.google.com/forum/#!forum/raft-dev).

### Protocol Description

Raft nodes are always in one of three states: follower, candidate or leader. All
nodes initially start out as a follower. In this state, nodes can accept log entries
from a leader and cast votes. If no entries are received for some time, nodes
self-promote to the candidate state. In the candidate state, nodes request votes from
their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
The leader must accept new log entries and replicate them to all the other followers.
In addition, if stale reads are not acceptable, all queries must also be performed on
the leader.
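
For instance, an application that cannot tolerate stale reads would route queries through the leader and confirm leadership first. A minimal sketch, assuming a `*raft.Raft` handle and a hypothetical `readConsistently` helper:

```
package raftexample

import (
	"errors"

	"github.com/hashicorp/raft"
)

// readConsistently serves a read only from the current leader.
// VerifyLeader checks with a quorum, guarding against a deposed
// leader that has not yet noticed it lost leadership.
func readConsistently(r *raft.Raft, read func() ([]byte, error)) ([]byte, error) {
	if r.State() != raft.Leader {
		return nil, errors.New("not the leader; try " + string(r.Leader()))
	}
	if err := r.VerifyLeader().Error(); err != nil {
		return nil, err
	}
	return read()
}
```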

Once a cluster has a leader, it is able to accept new log entries. A client can
request that a leader append a new log entry, which is an opaque binary blob to
Raft. The leader then writes the entry to durable storage and attempts to replicate it
to a quorum of followers. Once the log entry is considered *committed*, it can be
*applied* to a finite state machine. The finite state machine is application specific,
and is implemented using an interface.
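
That interface is `raft.FSM` (`Apply`, `Snapshot`, and `Restore`). Below is a minimal, illustrative key/value FSM; the `kvFSM` type and the JSON command encoding are assumptions of this sketch, since Raft itself treats log data as opaque bytes. The snapshotting half is sketched in the next section.

```
package raftexample

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// command is an illustrative wire format for log entries.
type command struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// kvFSM is a minimal key/value state machine.
type kvFSM struct {
	mu   sync.Mutex
	data map[string]string
}

// Apply is invoked once a log entry has been committed.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
	var c command
	if err := json.Unmarshal(l.Data, &c); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.data == nil {
		f.data = make(map[string]string)
	}
	f.data[c.Key] = c.Value
	return nil
}

// Snapshot supports log compaction; see the sketch in the next section.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, nil }

// Restore rebuilds the FSM state from a previously persisted snapshot.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	var data map[string]string
	if err := json.NewDecoder(rc).Decode(&data); err != nil {
		return err
	}
	f.mu.Lock()
	f.data = data
	f.mu.Unlock()
	return nil
}
```

On the leader, a write is then submitted with something like `r.Apply(data, timeout)`, whose returned future resolves once the entry has been committed and applied.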

An obvious question relates to the unbounded nature of a replicated log. Raft provides
a mechanism by which the current state is snapshotted and the log is compacted. Because
of the FSM abstraction, restoring the state of the FSM must result in the same state
as a replay of old logs. This allows Raft to capture the FSM state at a point in time
and then remove all the logs that were used to reach that state. This is performed automatically
without user intervention, and it prevents unbounded disk usage while also minimizing
the time spent replaying logs.
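
Concretely, the FSM's `Snapshot` method returns a `raft.FSMSnapshot` that is later asked to persist itself to a sink. A hedged companion to the key/value sketch above might look like this:

```
package raftexample

import (
	"encoding/json"

	"github.com/hashicorp/raft"
)

// kvSnapshot is a point-in-time copy of the kvFSM's state.
type kvSnapshot struct {
	data map[string]string
}

// Persist writes the captured state to the snapshot sink. Once the
// snapshot is durably stored, Raft can truncate the older logs.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	b, err := json.Marshal(s.data)
	if err != nil {
		sink.Cancel()
		return err
	}
	if _, err := sink.Write(b); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

// Release is called when Raft is finished with the snapshot.
func (s *kvSnapshot) Release() {}
```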

Lastly, there is the issue of updating the peer set when new servers are joining
or existing servers are leaving. As long as a quorum of nodes is available, this
is not an issue as Raft provides mechanisms to dynamically update the peer set.
If a quorum of nodes is unavailable, then this becomes a very challenging issue.
For example, suppose there are only 2 peers, A and B. The quorum size is also
2, meaning both nodes must agree to commit a log entry. If either A or B fails,
it is now impossible to reach quorum. This means the cluster is unable to add
or remove a node, or commit any additional log entries. This results in *unavailability*.
At this point, manual intervention would be required to remove either A or B,
and to restart the remaining node in bootstrap mode.
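
In this library, those mechanisms surface as configuration-change calls issued on the leader, each returning a future that resolves once the new configuration is committed. The sketch below is illustrative; the server IDs, address, and timeouts are placeholders.

```
package raftexample

import (
	"time"

	"github.com/hashicorp/raft"
)

// reconfigure adds one voting server and removes another. Both calls
// must be made on the current leader.
func reconfigure(r *raft.Raft) error {
	if err := r.AddVoter(raft.ServerID("node4"), raft.ServerAddress("10.0.0.4:7000"), 0, 30*time.Second).Error(); err != nil {
		return err
	}
	return r.RemoveServer(raft.ServerID("node2"), 0, 30*time.Second).Error()
}
```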

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster
of 5 can tolerate 2 node failures. The recommended configuration is to
run either 3 or 5 raft servers. This maximizes availability without
greatly sacrificing performance.
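
The arithmetic behind that recommendation: with N servers the quorum size is N/2 + 1 (integer division), so the cluster tolerates N minus that many failures. A small illustrative helper, not part of the library:

```
// quorumSize returns how many servers must agree to commit an entry,
// and how many failures an n-server cluster can therefore tolerate.
func quorumSize(n int) (quorum, tolerated int) {
	quorum = n/2 + 1
	return quorum, n - quorum
}

// quorumSize(3) == 2, 1    quorumSize(5) == 3, 2    quorumSize(7) == 4, 3
```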

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership,
committing a log entry requires a single round trip to half of the cluster.
Thus performance is bound by disk I/O and network latency.