**This is the documentation for etcd2 releases. Read [etcd3 doc][v3-docs] for etcd3 releases.**

[v3-docs]: ../docs.md#documentation


# Runtime Reconfiguration

etcd comes with support for incremental runtime reconfiguration, which allows users to update the membership of the cluster at run time.

Reconfiguration requests can only be processed when a majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two-member cluster: the majority of a two-member cluster is also two, so if there is a failure during the removal process, the cluster might not be able to make progress and would need to [restart from majority failure][majority failure].

To better understand the design behind runtime reconfiguration, we suggest reading [the runtime reconfiguration document][runtime-reconf].

## Reconfiguration Use Cases

Let's walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations][cluster-reconf].

### Cycle or Upgrade Multiple Machines

If you need to move multiple members of your cluster due to planned maintenance (hardware upgrades, network downtime, etc.), it is recommended to modify members one at a time.

It is safe to remove the leader, however there is a brief period of downtime while the election process takes place. If your cluster holds more than 50MB of data, it is recommended to [migrate the member's data directory][member migration].

### Change the Cluster Size

Increasing the cluster size can enhance [failure tolerance][fault tolerance table] and provide better read performance. Since clients can read from any member, increasing the number of members increases the overall read throughput.
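
Because any member can serve reads, the extra read capacity is easy to observe: the same key can be fetched from any member's client URL. A minimal sketch using the v2 keys HTTP API (the key `message` is hypothetical, and the client URLs are illustrative, matching the examples later in this document):

```sh
# Fetch the same (hypothetical) key from two different members.
# By default, v2 reads are served from the contacted member's local store;
# append ?quorum=true to force a linearized read through the leader.
$ curl http://10.0.1.10:2379/v2/keys/message
$ curl http://10.0.1.11:2379/v2/keys/message
```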

Decreasing the cluster size can improve the write performance of a cluster, with a trade-off of decreased resilience. Writes to the cluster are replicated to a majority of members before they are considered committed. Decreasing the cluster size lowers the majority, so each write is committed more quickly.

### Replace a Failed Machine

If a machine fails due to hardware failure, data directory corruption, or some other fatal situation, it should be replaced as soon as possible. Machines that have failed but haven't been removed adversely affect your quorum and reduce the tolerance for an additional failure.

To replace the machine, follow the instructions for [removing the member][remove member] from the cluster, and then [add a new member][add member] in its place. If your cluster holds more than 50MB of data, it is recommended to [migrate the failed member's data directory][member migration] if you can still access it.

### Restart Cluster from Majority Failure

If the majority of your cluster is lost or all of your nodes have changed IP addresses, then you need to take manual action in order to recover safely. The basic steps in the recovery process include [creating a new cluster using the old data][disaster recovery], forcing a single member to act as the leader, and finally using runtime reconfiguration to [add new members][add member] to this new cluster one at a time.

## Cluster Reconfiguration Operations

Now that we have the use cases in mind, let us lay out the operations involved in each.

Before making any change, a simple majority (quorum) of etcd members must be available. This is essentially the same requirement as for any other write to etcd.
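
Before issuing a reconfiguration request, it can be worth confirming that a quorum is actually available. One way to do this is with `etcdctl cluster-health` (a sketch; the member IDs and URLs below reuse the example cluster shown later in this document):

```sh
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://127.0.0.1:23792
member 924e2e83e93f2560 is healthy: got healthy result from http://127.0.0.1:23793
member a8266ecf031671f3 is healthy: got healthy result from http://127.0.0.1:23791
cluster is healthy
```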

All changes to the cluster are done one at a time:

* To update a single member's peerURLs, make an update operation
* To replace a single member, make an add operation and then a remove operation
* To increase from 3 to 5 members, make two add operations
* To decrease from 5 to 3 members, make two remove operations

All of these examples use the `etcdctl` command line tool that ships with etcd. If you want to use the members API directly, you can find the documentation [here][member-api].

### Update a Member

#### Update advertise client URLs

To update the advertise client URLs of a member, simply restart that member with an updated client URLs flag (`--advertise-client-urls`) or environment variable (`ETCD_ADVERTISE_CLIENT_URLS`). The restarted member will self-publish the updated URLs. A wrongly updated client URL will not affect the health of the etcd cluster.

#### Update advertise peer URLs

To update the advertise peer URLs of a member, you have to first update them explicitly via the member command and then restart the member. The additional step is required because updating peer URLs changes the cluster-wide configuration and can affect the health of the etcd cluster.

To update the peer URLs, first we need to find the target member's ID.
You can list all members with `etcdctl`:

```sh
$ etcdctl member list
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
924e2e83e93f2560: name=node3 peerURLs=http://localhost:23803 clientURLs=http://127.0.0.1:23793
a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791
```

In this example, let's `update` member a8266ecf031671f3 and change its peerURLs value to http://10.0.1.10:2380:

```sh
$ etcdctl member update a8266ecf031671f3 http://10.0.1.10:2380
Updated member with ID a8266ecf031671f3 in cluster
```

### Remove a Member

Let us say the member ID we want to remove is a8266ecf031671f3. We then use the `remove` command to perform the removal:

```sh
$ etcdctl member remove a8266ecf031671f3
Removed member a8266ecf031671f3 from cluster
```

The target member will stop itself at this point and print out the removal in the log:

```
etcd: this member has been permanently removed from the cluster. Exiting.
```

It is safe to remove the leader, however the cluster will be inactive while a new leader is elected. This duration is normally the period of the election timeout plus the voting process.

### Add a New Member

Adding a member is a two step process:

 * Add the new member to the cluster via the [members API][member-api] or the `etcdctl member add` command.
 * Start the new member with the new cluster configuration, including a list of the updated members (existing members plus the new member).

Using `etcdctl`, let's add the new member to the cluster by specifying its [name][conf-name] and [advertised peer URLs][conf-adv-peer]:

```sh
$ etcdctl member add infra3 http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster

ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE=existing
```

`etcdctl` has informed the cluster about the new member and printed out the environment variables needed to successfully start it. Now start the new etcd process with the relevant flags for the new member:

```sh
$ export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380 -data-dir %data_dir%
```

The new member will run as a part of the cluster and immediately begin catching up with the rest of the cluster.

If you are adding multiple members, the best practice is to configure a single member at a time and verify it starts correctly before adding more new members. If you add a new member to a one-node cluster, the cluster cannot make progress until the new member starts, because it needs two members as a majority to agree on consensus. This behavior only occurs between the time `etcdctl member add` informs the cluster about the new member and the time the new member successfully establishes a connection to the existing one.

#### Error Cases When Adding Members

In the following case we have not included our new host in the list of enumerated nodes.
If this is a new cluster, the node must be added to the list of initial cluster members.

```sh
$ etcd -name infra3 \
  -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  -initial-cluster-state existing
etcdserver: assign ids error: the member count is unequal
exit 1
```

In this case we gave a different address (10.0.1.14:2380) from the one used to join the cluster (10.0.1.13:2380):

```sh
$ etcd -name infra4 \
  -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra4=http://10.0.1.14:2380 \
  -initial-cluster-state existing
etcdserver: assign ids error: unmatched member while checking PeerURLs
exit 1
```

When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any active member in the cluster:

```sh
$ etcd
etcd: this member has been permanently removed from the cluster. Exiting.
exit 1
```

### Strict Reconfiguration Check Mode (`-strict-reconfig-check`)

As described above, the best practice for adding new members is to configure a single member at a time and verify it starts correctly before adding more new members. This step-by-step approach is very important because if a newly added member is not configured correctly (for example, the peer URLs are incorrect), the cluster can lose quorum. Quorum is lost because the newly added member is counted in the quorum even if that member is not reachable from the other existing members. Quorum loss can also happen if there is a connectivity issue or there are operational issues.

To avoid this problem, etcd provides the `-strict-reconfig-check` option. If this option is passed to etcd, etcd rejects reconfiguration requests if the number of started members would be less than a quorum of the reconfigured cluster.
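
The flag is simply added to the etcd command line at startup. A sketch, shown here for the `infra0` member from the earlier examples with its other flags unchanged:

```sh
# Enable strict reconfiguration checking on this member at startup.
$ etcd -name infra0 -strict-reconfig-check \
  -listen-client-urls http://10.0.1.10:2379 -advertise-client-urls http://10.0.1.10:2379 \
  -listen-peer-urls http://10.0.1.10:2380 -initial-advertise-peer-urls http://10.0.1.10:2380
```

With the flag enabled, a member add request that would drop the number of started members below quorum is rejected instead of being applied.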

It is recommended to enable this option. However, it is disabled by default for backward compatibility.

[add member]: #add-a-new-member
[cluster-reconf]: #cluster-reconfiguration-operations
[conf-adv-peer]: configuration.md#-initial-advertise-peer-urls
[conf-name]: configuration.md#-name
[disaster recovery]: admin_guide.md#disaster-recovery
[fault tolerance table]: admin_guide.md#fault-tolerance-table
[majority failure]: #restart-cluster-from-majority-failure
[member-api]: members_api.md
[member migration]: admin_guide.md#member-migration
[remove member]: #remove-a-member
[runtime-reconf]: runtime-reconf-design.md