**This is the documentation for etcd2 releases. Read [etcd3 doc][v3-docs] for etcd3 releases.**

[v3-docs]: ../docs.md#documentation


# Tuning

The default settings in etcd should work well for installations on a local network where the average network latency is low.
However, when using etcd across multiple data centers or over networks with high latency, you may need to tweak the heartbeat interval and election timeout settings.

The network isn't the only source of latency. Each request and response may also be delayed by slow disks on both the leader and the followers. Each of these timeouts represents the total time from a request to a successful response from the other machine.

## Time Parameters

The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can hand off leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader notifies followers that it is still the leader.
As a best practice, set this parameter to roughly the round-trip time between members.
By default, etcd uses a `100ms` heartbeat interval.

The second parameter is the *Election Timeout*.
This timeout is how long a follower node will go without hearing a heartbeat before attempting to become leader itself.
By default, etcd uses a `1000ms` election timeout.

Adjusting these values is a trade-off.
The recommended heartbeat interval is around the maximum of the average round-trip times (RTT) between members, normally 0.5-1.5x the round-trip time.
If the heartbeat interval is too low, etcd sends unnecessary messages that increase CPU and network usage.
On the other hand, a heartbeat interval that is too high forces a high election timeout, and a higher election timeout means it takes longer to detect a leader failure.
The easiest way to measure round-trip time is to use the [PING utility][ping].

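The 0.5-1.5x guidance above can be turned into a quick calculation. The sketch below is only an illustration: `heartbeat_range` is a hypothetical helper, not part of etcd, and it assumes you have already read an average RTT in whole milliseconds from a tool such as `ping`. The 1ms floor is this sketch's own choice.

```sh
#!/bin/sh
# Hypothetical helper (not part of etcd): print the 0.5x and 1.5x bounds,
# in whole milliseconds, for a measured average RTT given in milliseconds.
heartbeat_range() {
    rtt_ms=$1
    low=$(( rtt_ms / 2 ))
    high=$(( rtt_ms * 3 / 2 ))
    # Assumption of this sketch: never suggest an interval below 1ms.
    if [ "$low" -lt 1 ]; then low=1; fi
    if [ "$high" -lt 1 ]; then high=1; fi
    echo "$low $high"
}

# Example: an average RTT of 80ms, for instance taken from the summary
# line of `ping -c 10 <member-address>` (placeholder address).
heartbeat_range 80
```
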
The election timeout should be set based on the heartbeat interval and the average round-trip time between members.
The election timeout must be at least 10 times the round-trip time so that it can account for variance in your network.
For example, if the round-trip time between your members is 10ms, then you should have at least a 100ms election timeout.

You should also set your election timeout to at least 5 to 10 times your heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms, you should set your election timeout to at least 250ms to 500ms.

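The two rules above (at least 10 times the round-trip time, and at least 5 to 10 times the heartbeat interval) can be combined by taking the larger bound. The following is a minimal sketch under that reading; `election_timeout_bounds` is a hypothetical helper, not an etcd command, and all values are whole milliseconds.

```sh
#!/bin/sh
# Hypothetical helper (not part of etcd): print two election timeout
# values in milliseconds, a minimum (5x heartbeat) and a more
# conservative one (10x heartbeat), each raised to 10x RTT if needed.
election_timeout_bounds() {
    rtt_ms=$1
    heartbeat_ms=$2
    by_rtt=$(( rtt_ms * 10 ))
    min=$(( heartbeat_ms * 5 ))
    safe=$(( heartbeat_ms * 10 ))
    # Whichever rule asks for more wins.
    if [ "$by_rtt" -gt "$min" ]; then min=$by_rtt; fi
    if [ "$by_rtt" -gt "$safe" ]; then safe=$by_rtt; fi
    echo "$min $safe"
}

# Example from the text: 10ms RTT and a 50ms heartbeat interval.
election_timeout_bounds 10 50
```
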
The upper limit of the election timeout is 50000ms (50s), which should only be used when deploying a globally distributed etcd cluster.
A reasonable round-trip time for the continental United States is 130ms, and the round-trip time between the US and Japan is around 350-400ms.
If your network has uneven performance or regular packet delays or loss, a couple of retries may be necessary to successfully send a packet, so 5s is a safe upper limit for global round-trip time.
Because the election timeout should be an order of magnitude larger than the broadcast time, a broadcast time of ~5s for a globally distributed cluster makes 50 seconds a reasonable maximum.

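Before starting a member, a pair of values can be checked against the limits described in this section. This is a rough sketch only: `check_timings` is a hypothetical function, it encodes the 5x heartbeat lower bound and the 50000ms upper limit as stated in this document, and it does not claim to mirror etcd's own validation.

```sh
#!/bin/sh
# Hypothetical check (not part of etcd). Arguments are the heartbeat
# interval and the election timeout, both in whole milliseconds.
# Prints "ok" and returns 0, or prints a reason and returns 1.
check_timings() {
    heartbeat_ms=$1
    election_ms=$2
    max_election_ms=50000   # upper limit stated in this document
    if [ "$election_ms" -gt "$max_election_ms" ]; then
        echo "election timeout ${election_ms}ms exceeds ${max_election_ms}ms"
        return 1
    fi
    if [ "$election_ms" -lt $(( heartbeat_ms * 5 )) ]; then
        echo "election timeout ${election_ms}ms is below 5x heartbeat interval"
        return 1
    fi
    echo "ok"
}

# Example: the globally distributed case, a 5s heartbeat interval
# with a 50s election timeout.
check_timings 5000 50000
```
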
The heartbeat interval and election timeout values should be the same for all members in a cluster. Setting different values on different etcd members may disrupt cluster stability.

You can override the default values on the command line or with environment variables:

```sh
# Command line arguments:
$ etcd -heartbeat-interval=100 -election-timeout=500

# Environment variables:
$ ETCD_HEARTBEAT_INTERVAL=100 ETCD_ELECTION_TIMEOUT=500 etcd
```

The values are specified in milliseconds.

## Snapshots

etcd appends all key changes to a log file.
This log grows forever and is a complete linear history of every change made to the keys.
A complete history works well for lightly used clusters, but heavily used clusters would carry around a large log.

To avoid having a huge log, etcd makes periodic snapshots.
These snapshots provide a way for etcd to compact the log by saving the current state of the system and removing old logs.

### Snapshot Tuning

Creating snapshots can be expensive, so they're only created after a given number of changes to etcd.
By default, a snapshot is made after every 10,000 changes.
If etcd's memory usage and disk usage are too high, you can lower the snapshot threshold by setting the following on the command line or with an environment variable:

```sh
# Command line arguments:
$ etcd -snapshot-count=5000

# Environment variables:
$ ETCD_SNAPSHOT_COUNT=5000 etcd
```

[ping]: https://en.wikipedia.org/wiki/Ping_(networking_utility)