• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

.circleci/H22-Apr-2021-

.github/H22-Apr-2021-

bench/H22-Apr-2021-

fuzzy/H22-Apr-2021-

.gitignoreH A D22-Apr-2021259

.golangci-lint.ymlH A D22-Apr-20211.8 KiB

.travis.ymlH A D22-Apr-2021548

CHANGELOG.mdH A D22-Apr-20216 KiB

LICENSEH A D22-Apr-202115.6 KiB

MakefileH A D22-Apr-20212.1 KiB

README.mdH A D22-Apr-20215.7 KiB

api.goH A D22-Apr-202140.1 KiB

commands.goH A D22-Apr-20214.3 KiB

commitment.goH A D22-Apr-20213.2 KiB

commitment_test.goH A D22-Apr-20215.9 KiB

config.goH A D22-Apr-202114.6 KiB

configuration.goH A D22-Apr-202111.8 KiB

configuration_test.goH A D22-Apr-20219 KiB

discard_snapshot.goH A D22-Apr-20212 KiB

discard_snapshot_test.goH A D22-Apr-2021426

file_snapshot.goH A D22-Apr-202113.1 KiB

file_snapshot_test.goH A D22-Apr-20217.4 KiB

fsm.goH A D22-Apr-20216.9 KiB

future.goH A D22-Apr-20217.5 KiB

future_test.goH A D22-Apr-2021923

go.modH A D22-Apr-2021233

go.sumH A D22-Apr-20213.7 KiB

inmem_snapshot.goH A D22-Apr-20212.7 KiB

inmem_snapshot_test.goH A D22-Apr-20214 KiB

inmem_store.goH A D22-Apr-20212.8 KiB

inmem_transport.goH A D22-Apr-20218.7 KiB

inmem_transport_test.goH A D22-Apr-20212 KiB

integ_test.goH A D22-Apr-20218.7 KiB

log.goH A D22-Apr-20215.5 KiB

log_cache.goH A D22-Apr-20211.8 KiB

log_cache_test.goH A D22-Apr-20212.7 KiB

log_test.goH A D22-Apr-20213.1 KiB

membership.mdH A D22-Apr-20217 KiB

net_transport.goH A D22-Apr-202119.3 KiB

net_transport_test.goH A D22-Apr-202118.7 KiB

observer.goH A D22-Apr-20213.9 KiB

peersjson.goH A D22-Apr-20212.8 KiB

peersjson_test.goH A D22-Apr-20213 KiB

raft.goH A D22-Apr-202154.5 KiB

raft_test.goH A D22-Apr-202161.6 KiB

replication.goH A D22-Apr-202118.9 KiB

snapshot.goH A D22-Apr-20217.7 KiB

stable.goH A D22-Apr-2021443

state.goH A D22-Apr-20213.8 KiB

tag.shH A D22-Apr-2021397

tcp_transport.goH A D22-Apr-20213 KiB

tcp_transport_test.goH A D22-Apr-2021928

testing.goH A D22-Apr-202121.2 KiB

testing_batch.goH A D22-Apr-2021598

transport.goH A D22-Apr-20214.7 KiB

transport_test.goH A D22-Apr-20216.8 KiB

util.goH A D22-Apr-20213.3 KiB

util_test.goH A D22-Apr-20212.7 KiB

README.md

1raft [![CircleCI](https://circleci.com/gh/hashicorp/raft.svg?style=svg)](https://circleci.com/gh/hashicorp/raft)
2====
3
4raft is a [Go](http://www.golang.org) library that manages a replicated
5log and can be used with an FSM to manage replicated state machines. It
6is a library for providing [consensus](http://en.wikipedia.org/wiki/Consensus_(computer_science)).
7
8The use cases for such a library are far-reaching, such as replicated state
9machines which are a key component of many distributed systems. They enable
10building Consistent, Partition Tolerant (CP) systems, with limited
11fault tolerance as well.
12
13## Building
14
15If you wish to build raft you'll need Go version 1.2+ installed.
16
17Please check your installation with:
18
19```
20go version
21```
22
23## Documentation
24
25For complete documentation, see the associated [Godoc](http://godoc.org/github.com/hashicorp/raft).
26
27To prevent complications with cgo, the primary backend `MDBStore` is in a separate repository,
28called [raft-mdb](http://github.com/hashicorp/raft-mdb). That is the recommended implementation
29for the `LogStore` and `StableStore`.
30
31A pure Go backend using [Bbolt](https://github.com/etcd-io/bbolt) is also available called
32[raft-boltdb](https://github.com/hashicorp/raft-boltdb). It can also be used as a `LogStore`
33and `StableStore`.
34
35
36## Community Contributed Examples
37[Raft gRPC Example](https://github.com/Jille/raft-grpc-example) - Utilizing the Raft repository with gRPC
38
39
40## Tagged Releases
41
42As of September 2017, HashiCorp will start using tags for this library to clearly indicate
43major version updates. We recommend you vendor your application's dependency on this library.
44
45* v0.1.0 is the original stable version of the library that was in main and has been maintained
46with no breaking API changes. This was in use by Consul prior to version 0.7.0.
47
48* v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version
49manages server identities using a UUID, so introduces some breaking API changes. It also versions
50the Raft protocol, and requires some special steps when interoperating with Raft servers running
51older versions of the library (see the detailed comment in config.go about version compatibility).
52You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required
53to port Consul to these new interfaces.
54
55    This version includes some new features as well, including non voting servers, a new address
56    provider abstraction in the transport layer, and more resilient snapshots.
57
58## Protocol
59
60raft is based on ["Raft: In Search of an Understandable Consensus Algorithm"](https://raft.github.io/raft.pdf)
61
62A high level overview of the Raft protocol is described below, but for details please read the full
63[Raft paper](https://raft.github.io/raft.pdf)
64followed by the raft source. Any questions about the raft protocol should be sent to the
65[raft-dev mailing list](https://groups.google.com/forum/#!forum/raft-dev).
66
67### Protocol Description
68
69Raft nodes are always in one of three states: follower, candidate or leader. All
70nodes initially start out as a follower. In this state, nodes can accept log entries
71from a leader and cast votes. If no entries are received for some time, nodes
72self-promote to the candidate state. In the candidate state nodes request votes from
73their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
74The leader must accept new log entries and replicate to all the other followers.
75In addition, if stale reads are not acceptable, all queries must also be performed on
76the leader.
77
78Once a cluster has a leader, it is able to accept new log entries. A client can
79request that a leader append a new log entry, which is an opaque binary blob to
80Raft. The leader then writes the entry to durable storage and attempts to replicate
81to a quorum of followers. Once the log entry is considered *committed*, it can be
82*applied* to a finite state machine. The finite state machine is application specific,
83and is implemented using an interface.
84
85An obvious question relates to the unbounded nature of a replicated log. Raft provides
86a mechanism by which the current state is snapshotted, and the log is compacted. Because
87of the FSM abstraction, restoring the state of the FSM must result in the same state
88as a replay of old logs. This allows Raft to capture the FSM state at a point in time,
89and then remove all the logs that were used to reach that state. This is performed automatically
90without user intervention, and prevents unbounded disk usage as well as minimizing
91time spent replaying logs.
92
93Lastly, there is the issue of updating the peer set when new servers are joining
94or existing servers are leaving. As long as a quorum of nodes is available, this
95is not an issue as Raft provides mechanisms to dynamically update the peer set.
96If a quorum of nodes is unavailable, then this becomes a very challenging issue.
97For example, suppose there are only 2 peers, A and B. The quorum size is also
982, meaning both nodes must agree to commit a log entry. If either A or B fails,
99it is now impossible to reach quorum. This means the cluster is unable to add,
100or remove a node, or commit any additional log entries. This results in *unavailability*.
101At this point, manual intervention would be required to remove either A or B,
102and to restart the remaining node in bootstrap mode.
103
104A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster
105of 5 can tolerate 2 node failures. The recommended configuration is to either
106run 3 or 5 raft servers. This maximizes availability without
107greatly sacrificing performance.
108
109In terms of performance, Raft is comparable to Paxos. Assuming stable leadership,
110committing a log entry requires a single round trip to half of the cluster.
111Thus performance is bound by disk I/O and network latency.
112