github.com/hashicorp/raft

raft [![CircleCI](https://circleci.com/gh/hashicorp/raft.svg?style=svg)](https://circleci.com/gh/hashicorp/raft)
====

raft is a [Go](http://www.golang.org) library that manages a replicated
log and can be used with an FSM to manage replicated state machines. It
is a library for providing [consensus](http://en.wikipedia.org/wiki/Consensus_(computer_science)).

The use cases for such a library are far-reaching, such as replicated state
machines which are a key component of many distributed systems. They enable
building Consistent, Partition Tolerant (CP) systems, with limited
fault tolerance as well.

## Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

```
go version
```

## Documentation

For complete documentation, see the associated [Godoc](http://godoc.org/github.com/hashicorp/raft).

To prevent complications with cgo, the primary backend `MDBStore` is in a separate repository,
called [raft-mdb](http://github.com/hashicorp/raft-mdb). That is the recommended implementation
for the `LogStore` and `StableStore`.

A pure Go backend using [Bbolt](https://github.com/etcd-io/bbolt) is also available called
[raft-boltdb](https://github.com/hashicorp/raft-boltdb). It can also be used as a `LogStore`
and `StableStore`.


## Community Contributed Examples
[Raft gRPC Example](https://github.com/Jille/raft-grpc-example) - Utilizing the Raft repository with gRPC


## Tagged Releases

As of September 2017, HashiCorp will start using tags for this library to clearly indicate
major version updates. We recommend you vendor your application's dependency on this library.

* v0.1.0 is the original stable version of the library that was in main and has been maintained
with no breaking API changes. This was in use by Consul prior to version 0.7.0.

* v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version
manages server identities using a UUID, so introduces some breaking API changes. It also versions
the Raft protocol, and requires some special steps when interoperating with Raft servers running
older versions of the library (see the detailed comment in config.go about version compatibility).
You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required
to port Consul to these new interfaces.

    This version includes some new features as well, including non voting servers, a new address
    provider abstraction in the transport layer, and more resilient snapshots.

## Protocol

raft is based on ["Raft: In Search of an Understandable Consensus Algorithm"](https://raft.github.io/raft.pdf)

A high level overview of the Raft protocol is described below, but for details please read the full
[Raft paper](https://raft.github.io/raft.pdf)
followed by the raft source. Any questions about the raft protocol should be sent to the
[raft-dev mailing list](https://groups.google.com/forum/#!forum/raft-dev).

### Protocol Description

Raft nodes are always in one of three states: follower, candidate or leader. All
nodes initially start out as a follower. In this state, nodes can accept log entries
from a leader and cast votes. If no entries are received for some time, nodes
self-promote to the candidate state. In the candidate state nodes request votes from
their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
The leader must accept new log entries and replicate to all the other followers.
In addition, if stale reads are not acceptable, all queries must also be performed on
the leader.

Once a cluster has a leader, it is able to accept new log entries. A client can
request that a leader append a new log entry, which is an opaque binary blob to
Raft. The leader then writes the entry to durable storage and attempts to replicate
to a quorum of followers. Once the log entry is considered *committed*, it can be
*applied* to a finite state machine. The finite state machine is application specific,
and is implemented using an interface.

An obvious question relates to the unbounded nature of a replicated log. Raft provides
a mechanism by which the current state is snapshotted, and the log is compacted. Because
of the FSM abstraction, restoring the state of the FSM must result in the same state
as a replay of old logs. This allows Raft to capture the FSM state at a point in time,
and then remove all the logs that were used to reach that state. This is performed automatically
without user intervention, and prevents unbounded disk usage as well as minimizing
time spent replaying logs.

Lastly, there is the issue of updating the peer set when new servers are joining
or existing servers are leaving. As long as a quorum of nodes is available, this
is not an issue as Raft provides mechanisms to dynamically update the peer set.
If a quorum of nodes is unavailable, then this becomes a very challenging issue.
For example, suppose there are only 2 peers, A and B. The quorum size is also
2, meaning both nodes must agree to commit a log entry. If either A or B fails,
it is now impossible to reach quorum. This means the cluster is unable to add,
or remove a node, or commit any additional log entries. This results in *unavailability*.
At this point, manual intervention would be required to remove either A or B,
and to restart the remaining node in bootstrap mode.

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster
of 5 can tolerate 2 node failures. The recommended configuration is to either
run 3 or 5 raft servers. This maximizes availability without
greatly sacrificing performance.

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership,
committing a log entry requires a single round trip to half of the cluster.
Thus performance is bound by disk I/O and network latency.
Name		Date	Size	#Lines	LOC
..		15-Jul-2021	-
.gitignore	H A D	15-Jul-2021	259	24	18
.golangci-lint.yml	H A D	15-Jul-2021	1.8 KiB	50	40
.travis.yml	H A D	15-Jul-2021	548	22	16
CHANGELOG.md	H A D	15-Jul-2021	6 KiB	105	67
LICENSE	H A D	15-Jul-2021	15.6 KiB	355	256
Makefile	H A D	15-Jul-2021	2.1 KiB	58	42
README.md	H A D	15-Jul-2021	5.7 KiB	112	83
api.go	H A D	15-Jul-2021	40.2 KiB	1,184	673
commands.go	H A D	15-Jul-2021	4.3 KiB	178	84
commitment.go	H A D	15-Jul-2021	3.2 KiB	102	66
config.go	H A D	15-Jul-2021	14.6 KiB	327	110
configuration.go	H A D	15-Jul-2021	11.8 KiB	362	235
discard_snapshot.go	H A D	15-Jul-2021	2 KiB	65	32
file_snapshot.go	H A D	15-Jul-2021	13.1 KiB	550	386
fsm.go	H A D	15-Jul-2021	6.9 KiB	247	152
future.go	H A D	15-Jul-2021	7.5 KiB	312	190
go.mod	H A D	15-Jul-2021	233	11	8
go.sum	H A D	15-Jul-2021	3.7 KiB	40	39
inmem_snapshot.go	H A D	15-Jul-2021	2.7 KiB	112	79
inmem_store.go	H A D	15-Jul-2021	2.8 KiB	131	102
inmem_transport.go	H A D	15-Jul-2021	8.7 KiB	360	266
log.go	H A D	15-Jul-2021	5.5 KiB	177	88
log_cache.go	H A D	15-Jul-2021	1.8 KiB	83	57
membership.md	H A D	15-Jul-2021	7 KiB	84	67
net_transport.go	H A D	15-Jul-2021	19.3 KiB	777	529
observer.go	H A D	15-Jul-2021	3.9 KiB	139	78
peersjson.go	H A D	15-Jul-2021	2.8 KiB	99	63
raft.go	H A D	15-Jul-2021	54.8 KiB	1,861	1,283
replication.go	H A D	15-Jul-2021	18.9 KiB	614	398
snapshot.go	H A D	15-Jul-2021	7.7 KiB	249	142
stable.go	H A D	15-Jul-2021	443	16	7
state.go	H A D	15-Jul-2021	3.8 KiB	172	113
tag.sh	H A D	15-Jul-2021	397	17	9
tcp_transport.go	H A D	15-Jul-2021	3 KiB	117	86
testing.go	H A D	15-Jul-2021	21.2 KiB	806	598
testing_batch.go	H A D	15-Jul-2021	598	30	19
transport.go	H A D	15-Jul-2021	4.7 KiB	128	53
util.go	H A D	15-Jul-2021	3.3 KiB	153	108