<chapter id="repmgrd-automatic-failover" xreflabel="Automatic failover with repmgrd">

 <title>Automatic failover with repmgrd</title>

 <indexterm>
   <primary>repmgrd</primary>
   <secondary>automatic failover</secondary>
 </indexterm>

 <para>
  &repmgrd; is a management and monitoring daemon which runs
  on each node in a replication cluster. It can automate actions such as
  failover and updating standbys to follow the new primary, as well as
  providing monitoring information about the state of each standby.
 </para>

 <sect1 id="repmgrd-witness-server" xreflabel="Using a witness server with repmgrd">
  <title>Using a witness server</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>witness server</secondary>
  </indexterm>

  <indexterm>
    <primary>witness server</primary>
    <secondary>repmgrd</secondary>
  </indexterm>

  <para>
    A <xref linkend="witness-server"/> is a normal PostgreSQL instance which
    is not part of the streaming replication cluster; its purpose is, if a
    failover situation occurs, to provide proof that it is the primary server
    itself which is unavailable, rather than e.g. a network split between
    different physical locations.
  </para>

  <para>
    A typical use case for a witness server is a two-node streaming replication
    setup, where the primary and standby are in different locations (data centres).
    By creating a witness server in the same location (data centre) as the primary,
    if the primary becomes unavailable it's possible for the standby to decide whether
    it can promote itself without risking a "split brain" scenario: if it can't see either the
    witness or the primary server, it's likely there's a network-level interruption
    and it should not promote itself.
    If it can see the witness but not the primary,
    this proves there is no network interruption and the primary itself is unavailable,
    and it can therefore promote itself (and ideally take action to fence the
    former primary).
  </para>
  <note>
    <para>
      <emphasis>Never</emphasis> install a witness server on the same physical host
      as another node in the replication cluster managed by &repmgr; - it's essential
      the witness is not affected in any way by failure of another node.
    </para>
  </note>
  <para>
    For more complex replication scenarios, e.g. with multiple datacentres, it may
    be preferable to use location-based failover, which ensures that only nodes
    in the same location as the primary will ever be promotion candidates;
    see <xref linkend="repmgrd-network-split"/> for more details.
  </para>

  <note>
    <simpara>
      A witness server will only be useful if &repmgrd;
      is in use.
    </simpara>
  </note>

  <sect2 id="creating-witness-server">
    <title>Creating a witness server</title>
    <para>
      To create a witness server, set up a normal PostgreSQL instance on a server
      in the same physical location as the cluster's primary server.
    </para>
    <para>
      This instance should <emphasis>not</emphasis> be on the same physical host as the primary server,
      as otherwise if the primary server fails due to hardware issues, the witness
      server will be lost too.
    </para>
    <note>
      <simpara>
        &repmgr; 3.3 and earlier provided a <command>repmgr create witness</command>
        command, which would automatically create a PostgreSQL instance. However
        this often resulted in an unsatisfactory, hard-to-customise instance.
      </simpara>
    </note>
    <para>
      The witness server should be configured in the same way as a normal
      &repmgr; node; see section <xref linkend="configuration"/>.
    </para>
    <para>
      Register the witness server with <xref linkend="repmgr-witness-register"/>.
      This will create the &repmgr; extension on the witness server, and make
      a copy of the &repmgr; metadata.
    </para>
    <note>
      <simpara>
        As the witness server is not part of the replication cluster, further
        changes to the &repmgr; metadata will be synchronised by
        &repmgrd;.
      </simpara>
    </note>
    <para>
      Once the witness server has been configured, &repmgrd;
      should be started.
    </para>

    <para>
      To unregister a witness server, use <xref linkend="repmgr-witness-unregister"/>.
    </para>

  </sect2>

</sect1>


<sect1 id="repmgrd-network-split" xreflabel="Handling network splits with repmgrd">
  <title>Handling network splits with repmgrd</title>
  <indexterm>
    <primary>repmgrd</primary>
    <secondary>network splits</secondary>
  </indexterm>

  <indexterm>
    <primary>network splits</primary>
  </indexterm>

  <para>
    A common pattern for replication cluster setups is to spread servers over
    more than one datacentre. This can provide benefits such as
    geographically-distributed read replicas and DR (disaster recovery) capability. However
    this also means there is a risk of disconnection at network level between
    datacentre locations, which would result in a split-brain scenario if
    servers in a secondary data centre were no longer able to see the primary
    in the main data centre and promoted a standby among themselves.
  </para>
  <para>
    &repmgr; enables provision of "<xref linkend="witness-server"/>" to
    artificially create a quorum of servers in a particular location, ensuring
    that nodes in another location will not elect a new primary if they
    are unable to see the majority of nodes. However this approach does not
    scale well, particularly with more complex replication setups, e.g.
    where the majority of nodes are located outside of the primary datacentre.
    It also means the <literal>witness</literal> node needs to be managed as an
    extra PostgreSQL instance outside of the main replication cluster, which
    adds administrative and programming complexity.
  </para>
  <para>
    <literal>repmgr4</literal> introduces the concept of <literal>location</literal>:
    each node is associated with an arbitrary location string (default is
    <literal>default</literal>); this is set in <filename>repmgr.conf</filename>, e.g.:
    <programlisting>
    node_id=1
    node_name=node1
    conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/var/lib/postgresql/data'
    location='dc1'</programlisting>
  </para>
  <para>
    In a failover situation, &repmgrd; will check if any servers in the
    same location as the current primary node are visible. If not, &repmgrd;
    will assume a network interruption and not promote any node in any
    other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
    mode until a primary becomes visible).
  </para>

</sect1>


<sect1 id="repmgrd-primary-visibility-consensus" xreflabel="Primary visibility consensus">
  <title>Primary visibility consensus</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>primary visibility consensus</secondary>
  </indexterm>

  <indexterm>
    <primary>primary_visibility_consensus</primary>
  </indexterm>

  <para>
    In more complex replication setups, particularly where replication occurs between
    multiple datacentres, it's possible that some but not all standbys get cut off from the
    primary (but not from the other standbys).
  </para>
  <para>
    In this situation, normally it's not desirable for any of the standbys which have been
    cut off to initiate a failover, as the primary is still functioning and standbys are
    connected.
    Beginning with <link linkend="release-4.4">&repmgr; 4.4</link>
    it is now possible for the affected standbys to build a consensus about whether
    the primary is still available to some standbys ("primary visibility consensus").
    This is done by polling each standby (and the witness, if present) for the time it last saw the
    primary; if any have seen the primary very recently, it's reasonable
    to infer that the primary is still available and a failover should not be started.
  </para>

  <para>
    The time the primary was last seen by each node can be checked by executing
    <link linkend="repmgr-service-status"><command>repmgr service status</command></link>
    (&repmgr; 4.2 - 4.4: <link linkend="repmgr-service-status"><command>repmgr daemon status</command></link>)
    which includes this in its output, e.g.:
    <programlisting>$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 27259 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 27272 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 27282 | no      | 0 second(s) ago
 4  | node4 | witness | * running | node1    | running | 27298 | no      | 1 second(s) ago</programlisting>

  </para>

  <para>
    To enable this functionality, in <filename>repmgr.conf</filename> set:
    <programlisting>
      primary_visibility_consensus=true</programlisting>
  </para>
  <note>
    <para>
      <option>primary_visibility_consensus</option> <emphasis>must</emphasis> be set to
      <literal>true</literal> on all nodes for it to be effective.
    </para>
  </note>

  <para>
    The following sample &repmgrd; log output demonstrates the behaviour in a situation
    where one of three standbys is no longer able to connect to the primary, but <emphasis>can</emphasis>
    connect to the two other standbys ("sibling nodes"):
    <programlisting>
    [2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
    [2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
    [2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
    [2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
    [2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
     - node "node3" (ID: 3): 1 second(s) ago
     - node "node4" (ID: 4): 0 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
    [2019-05-17 05:36:12] [NOTICE] election cancelled
    [2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state</programlisting>
    In this situation it will cancel the failover and enter degraded monitoring mode,
    waiting for the primary to reappear.
  </para>
</sect1>

<sect1 id="repmgrd-standby-disconnection-on-failover" xreflabel="Standby disconnection on failover">
  <title>Standby disconnection on failover</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>standby disconnection on failover</secondary>
  </indexterm>

  <indexterm>
    <primary>standby disconnection on failover</primary>
  </indexterm>

  <para>
    If <option>standby_disconnect_on_failover</option> is set to <literal>true</literal> in
    <filename>repmgr.conf</filename>, in a failover situation &repmgrd; will forcibly disconnect
    the local node's WAL receiver, and wait for the WAL receiver on all sibling nodes to be
    disconnected, before making a failover decision.
  </para>
  <note>
    <para>
      <option>standby_disconnect_on_failover</option> is available with PostgreSQL 9.5 and later.
      Additionally this requires that the <literal>repmgr</literal> database user is a superuser.
    </para>
  </note>
  <para>
    By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
    are receiving data from the primary and their LSN location will be static.
  </para>
  <important>
    <para>
      <option>standby_disconnect_on_failover</option> <emphasis>must</emphasis> be set to the same value on
      all nodes.
    </para>
  </important>
  <para>
    Note that when using <option>standby_disconnect_on_failover</option> there will be a delay of 5 seconds
    plus however many seconds it takes to confirm the WAL receiver is disconnected before
    &repmgrd; proceeds with the failover decision.
  </para>
  <para>
    &repmgrd; will wait up to <option>sibling_nodes_disconnect_timeout</option> seconds (default:
    <literal>30</literal>) to confirm that the WAL receiver on all sibling nodes has been
    disconnected before proceeding with the failover operation. If the timeout is reached, the
    failover operation will go ahead anyway.
  </para>
  <para>
    Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
  </para>
  <para>
    If using <option>standby_disconnect_on_failover</option>, we recommend that the
    <option>primary_visibility_consensus</option> option is also used.
  </para>

</sect1>

<sect1 id="repmgrd-failover-validation" xreflabel="Failover validation">
  <title>Failover validation</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>failover validation</secondary>
  </indexterm>

  <indexterm>
    <primary>failover validation</primary>
  </indexterm>

  <para>
    From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
    to &repmgrd; which, in a failover situation,
    will be executed by the promotion candidate (the node which has been selected
    to be the new primary) to confirm whether the node should actually be promoted.
  </para>
  <para>
    To use this, set <option>failover_validation_command</option> in <filename>repmgr.conf</filename>
    to a script executable by the <literal>postgres</literal> system user, e.g.:
    <programlisting>
      failover_validation_command=/path/to/script.sh %n</programlisting>
  </para>
  <para>
    The <literal>%n</literal> parameter will be replaced with the node ID when the script is
    executed. A number of other parameters are also available; see section
    "<xref linkend="repmgrd-automatic-failover-configuration-optional"/>" for details.
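  </para>
  <para>
    As a rough sketch only, the two settings involved in failover validation might sit
    together in <filename>repmgr.conf</filename> as shown here. The script path is the one
    that appears in the sample log output in this section and is purely illustrative;
    the interval of 15 seconds is likewise taken from that log rather than being a
    recommendation:
    <programlisting>
      # illustrative path only; substitute your own validation script
      failover_validation_command='/usr/local/bin/failover-validation.sh %n'
      # pause (in seconds) before a rejected election is rerun
      election_rerun_interval=15</programlisting>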
  </para>
  <para>
    This script must return an exit code of <literal>0</literal> to indicate the node should promote itself.
    Any other value will result in the promotion being aborted and the election rerun.
    There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
  </para>
  <para>
    Sample &repmgrd; log file output during which the failover validation
    script rejects the proposed promotion candidate:
    <programlisting>
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2

[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
INFO:  node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")</programlisting>
  </para>


</sect1>

 <sect1 id="cascading-replication" xreflabel="Cascading replication">
  <title>repmgrd and cascading replication</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>cascading replication</secondary>
  </indexterm>

  <indexterm>
    <primary>cascading replication</primary>
    <secondary>repmgrd</secondary>
  </indexterm>

  <para>
    Cascading replication - where a standby can connect to an upstream node and not
    the primary server itself - was introduced in
    PostgreSQL 9.2. &repmgr; and
    &repmgrd; support cascading replication by keeping track of the relationship
    between standby servers - each node record is stored with the node id of its
    upstream ("parent") server (except of course the primary server).
  </para>
  <para>
    In a failover situation where the primary node fails and a top-level standby
    is promoted, a standby connected to another standby will not be affected
    and will continue working as normal (even if the upstream standby it's connected
    to becomes the primary node). If however the node's direct upstream fails,
    the "cascaded standby" will attempt to reconnect to that node's parent
    (unless <varname>failover</varname> is set to <literal>manual</literal> in
    <filename>repmgr.conf</filename>).
  </para>

 </sect1>

<sect1 id="repmgrd-primary-child-disconnection" xreflabel="Monitoring standby disconnections on the primary">
  <title>Monitoring standby disconnections on the primary node</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>standby disconnection</secondary>
  </indexterm>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>child node disconnection</secondary>
  </indexterm>

  <note>
    <para>
      This functionality is available in <link linkend="release-4.4">&repmgr; 4.4</link> and later.
    </para>
  </note>
  <para>
    When running on the primary node, &repmgrd; can
    monitor connections and in particular disconnections by its attached
    child nodes (standbys, and if in use, the witness server), and optionally
    execute a custom command if certain criteria are met (such as the number of
    attached nodes falling to zero following a failover to a new primary); this
    command can be used for example to "fence" the node and ensure it
    is isolated from any applications attempting to access the replication cluster.
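  </para>
  <para>
    As a rough illustration, a primary's <filename>repmgr.conf</filename> might enable this
    monitoring along the following lines. This is a sketch rather than a recommended
    configuration: the check interval and timeout are simply the documented defaults,
    the minimum count of <literal>2</literal> matches the worked example later in this
    section, and the script path is the placeholder that appears in that example's log
    output, standing in for whatever fencing script is actually used. Each parameter is
    described in the configuration section below.
    <programlisting>
      # how often (seconds) to compare pg_stat_replication with the registered child nodes
      child_nodes_check_interval=5
      # how long (seconds) to wait after the last disconnection before acting
      child_nodes_disconnect_timeout=30
      # act if fewer than this many child nodes remain connected
      child_nodes_connected_min_count=2
      # placeholder path; substitute your own fencing script
      child_nodes_disconnect_command='/usr/bin/fence-all-the-things.sh'</programlisting>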
  </para>

  <note>
    <para>
      Currently &repmgrd; can only detect disconnections
      of streaming replication standbys and cannot determine whether a standby
      has disconnected and fallen back to archive recovery.
    </para>
    <para>
      See section <link linkend="repmgrd-primary-child-disconnection-caveats">caveats</link> below.
    </para>
  </note>

  <sect2 id="repmgrd-primary-child-disconnection-monitoring-process">
    <title>Standby disconnections monitoring process and criteria</title>
    <para>
      &repmgrd; monitors attached child nodes and decides
      whether to invoke the user-defined command based on the following process
      and criteria:
      <itemizedlist>

        <listitem>
          <para>
            Every few seconds (defined by the configuration parameter <varname>child_nodes_check_interval</varname>;
            default: <literal>5</literal> seconds, a value of <literal>0</literal> disables this altogether), &repmgrd; queries
            the <literal>pg_stat_replication</literal> system view and compares
            the nodes present there against the list of nodes registered with &repmgr; which
            should be attached to the primary.
          </para>
          <para>
            If a witness server is in use, &repmgrd; connects to it and checks which upstream node
            it is following.
          </para>
        </listitem>

        <listitem>
          <para>
            If a child node (standby) is no longer present in <literal>pg_stat_replication</literal>,
            &repmgrd; notes the time it detected the node's absence, and additionally generates a
            <literal>child_node_disconnect</literal> event.
          </para>
          <para>
            If a witness server is in use, and it is no longer following the primary, or not
            reachable at all, &repmgrd; notes the time it detected the node's absence, and additionally generates a
            <literal>child_node_disconnect</literal> event.
          </para>
        </listitem>

        <listitem>
          <para>
            If a child node (standby) which was absent from <literal>pg_stat_replication</literal> reappears,
            &repmgrd; clears the time it detected the node's absence, and additionally generates a
            <literal>child_node_reconnect</literal> event.
          </para>
          <para>
            If a witness server is in use, and it was previously not reachable or not following the
            primary node but has become reachable and is following the primary node, &repmgrd; clears the
            time it detected the node's absence, and additionally generates a
            <literal>child_node_reconnect</literal> event.
          </para>
        </listitem>

        <listitem>
          <para>
            If an entirely new child node (standby or witness) is detected, &repmgrd; adds it to its internal list
            and additionally generates a <literal>child_node_new_connect</literal> event.
          </para>
        </listitem>

        <listitem>
          <para>
            If the <varname>child_nodes_disconnect_command</varname> parameter is set in
            <filename>repmgr.conf</filename>, &repmgrd; will then loop through all child nodes.
            If it determines that insufficient child nodes are connected, and a
            minimum of <varname>child_nodes_disconnect_timeout</varname> seconds (default: <literal>30</literal>)
            has elapsed since the last node became disconnected, &repmgrd; will then execute the
            <varname>child_nodes_disconnect_command</varname> script.
          </para>
          <para>
            By default, the <varname>child_nodes_disconnect_command</varname> will only be executed
            if all child nodes are disconnected. If <varname>child_nodes_connected_min_count</varname>
            is set, the <varname>child_nodes_disconnect_command</varname> script will be triggered
            if the number of connected child nodes falls below the specified value (e.g.
            if set to <literal>2</literal>, the script will be triggered if only one child node
            is connected).
            Alternatively, if <varname>child_nodes_disconnect_min_count</varname> is set
            and more than that number of child nodes disconnects, the script will be triggered.
          </para>
          <note>
            <para>
              By default, a witness node, if in use, will <emphasis>not</emphasis> be counted as a
              child node for the purposes of determining whether to execute
              <varname>child_nodes_disconnect_command</varname>.
            </para>
            <para>
              To enable the witness node to be counted as a child node, set
              <varname>child_nodes_connected_include_witness</varname> in <filename>repmgr.conf</filename>
              to <literal>true</literal>
              (and <link linkend="repmgrd-reloading-configuration">reload the configuration</link> if &repmgrd;
              is running).
            </para>
          </note>
        </listitem>

        <listitem>
          <para>
            Note that child nodes which are not attached when &repmgrd;
            starts will <emphasis>not</emphasis> be considered as missing, as &repmgrd;
            cannot know why they are not attached.
          </para>
        </listitem>

      </itemizedlist>
    </para>
  </sect2>

  <sect2 id="repmgrd-primary-child-disconnection-example">
    <title>Standby disconnections monitoring process example</title>
    <para>
      This example shows typical &repmgrd; log output from a three-node cluster
      (primary and two child nodes), with <varname>child_nodes_connected_min_count</varname>
      set to <literal>2</literal>.
    </para>
    <para>
      &repmgrd; on the primary has started up, while two child
      nodes are being provisioned:
      <programlisting>
[2019-04-24 15:25:33] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:25:35] [NOTICE] new node "node2" (ID: 2) has connected
[2019-04-24 15:25:35] [NOTICE] 1 (of 1) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:25:35] [INFO] no child nodes have detached since repmgrd startup
(...)
[2019-04-24 15:25:44] [NOTICE] new node "node3" (ID: 3) has connected
[2019-04-24 15:25:46] [INFO] monitoring primary node "node1" (ID: 1) in normal state
(...)</programlisting>
    </para>
    <para>
      One of the child nodes has disconnected; &repmgrd;
      is now waiting <varname>child_nodes_disconnect_timeout</varname> seconds
      before executing <varname>child_nodes_disconnect_command</varname>:
      <programlisting>
[2019-04-24 15:28:11] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:17] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
[2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set To 30 seconds
(...)</programlisting>
    </para>
    <para>
      <varname>child_nodes_disconnect_command</varname> is executed once:
      <programlisting>
[2019-04-24 15:28:49] [INFO] most recently detached child node was 3 (ca. 30 seconds ago), triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:49] [INFO] "child_nodes_disconnect_command" is:
    "/usr/bin/fence-all-the-things.sh"
[2019-04-24 15:28:51] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:51] [INFO] "child_nodes_disconnect_command" was previously executed, taking no action</programlisting>
    </para>

  </sect2>

  <sect2 id="repmgrd-primary-child-disconnection-caveats">
    <title>Standby disconnections monitoring caveats</title>
    <para>
      The following caveats should be considered if you are intending to use this functionality.
    </para>
    <para>
      <itemizedlist mark="bullet">
        <listitem>
          <para>
            If a child node is configured to use archive recovery, it's possible that
            the child node will disconnect from the primary node and fall back to
            archive recovery. In this case &repmgrd;
            will nevertheless register a node disconnection.
          </para>
        </listitem>

        <listitem>
          <para>
            &repmgr; relies on <varname>application_name</varname> in the child node's
            <varname>primary_conninfo</varname> string to be the same as the node name
            defined in the node's <filename>repmgr.conf</filename> file. Furthermore,
            this <varname>application_name</varname> must be unique across the replication
            cluster.
          </para>
          <para>
            If a custom <varname>application_name</varname> is used, or the
            <varname>application_name</varname> is not unique across the replication
            cluster, &repmgr; will not be able to reliably monitor child node connections.
          </para>
        </listitem>

      </itemizedlist>
    </para>
  </sect2>


  <sect2 id="repmgrd-primary-child-disconnection-configuration">
    <title>Standby disconnections monitoring process configuration</title>
    <para>
      The following parameters, set in <filename>repmgr.conf</filename>,
      control how child node disconnection monitoring operates.
    </para>
    <variablelist>

      <varlistentry>
        <term><varname>child_nodes_check_interval</varname></term>
        <listitem>
          <indexterm>
            <primary>child_nodes_check_interval</primary>
            <secondary>child node disconnection monitoring</secondary>
          </indexterm>

          <para>
            Interval (in seconds) after which &repmgrd; queries the
            <literal>pg_stat_replication</literal> system view and compares the nodes present
            there against the list of nodes registered with repmgr which should be attached to the primary.
          </para>
          <para>
            Default is <literal>5</literal> seconds, a value of <literal>0</literal> disables this check
            altogether.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>child_nodes_disconnect_command</varname></term>

        <listitem>
          <indexterm>
            <primary>child_nodes_disconnect_command</primary>
            <secondary>child node disconnection monitoring</secondary>
          </indexterm>

          <para>
            User-definable script to be executed when &repmgrd;
            determines that an insufficient number of child nodes are connected. By default
            the script is executed when no child nodes are connected, but the execution
            threshold can be modified by setting one of <varname>child_nodes_connected_min_count</varname>
            or <varname>child_nodes_disconnect_min_count</varname> (see below).
          </para>
          <para>
            The <varname>child_nodes_disconnect_command</varname> script can be
            any user-defined script or program. It <emphasis>must</emphasis> be able
            to be executed by the system user under which the PostgreSQL server itself
            runs (usually <literal>postgres</literal>).
          </para>
          <note>
            <para>
              If <varname>child_nodes_disconnect_command</varname> is not set, no action
              will be taken.
            </para>
          </note>
          <para>
            If specified, the following format placeholder will be substituted when
            executing <varname>child_nodes_disconnect_command</varname>:
          </para>

          <variablelist>
            <varlistentry>
              <term><option>%p</option></term>
              <listitem>
                <para>
                  ID of the node executing the <varname>child_nodes_disconnect_command</varname> script.
                </para>
              </listitem>
            </varlistentry>
          </variablelist>

          <para>
            The <varname>child_nodes_disconnect_command</varname> script will only be executed once
            while the criteria for its execution are met. If the criteria for its execution are no longer
            met (i.e.
            some child nodes have reconnected), it will be executed again if
            the criteria for its execution are met again.
          </para>
          <para>
            The <varname>child_nodes_disconnect_command</varname> script will not be executed if
            &repmgrd; is <link linkend="repmgrd-pausing">paused</link>.
          </para>

        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>child_nodes_disconnect_timeout</varname></term>

        <listitem>
          <indexterm>
            <primary>child_nodes_disconnect_timeout</primary>
            <secondary>child node disconnection monitoring</secondary>
          </indexterm>

          <para>
            If &repmgrd; determines that an insufficient number of
            child nodes are connected, it will wait for the specified number of seconds
            before executing the <varname>child_nodes_disconnect_command</varname>.
          </para>
          <para>
            Default: <literal>30</literal> seconds.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>child_nodes_connected_min_count</varname></term>
        <listitem>
          <indexterm>
            <primary>child_nodes_connected_min_count</primary>
            <secondary>child node disconnection monitoring</secondary>
          </indexterm>

          <para>
            If the number of child nodes connected falls below the number specified in
            this parameter, the <varname>child_nodes_disconnect_command</varname> script
            will be executed.
          </para>
          <para>
            For example, if <varname>child_nodes_connected_min_count</varname> is set
            to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
            script will be executed if one or no child nodes are connected.
          </para>
          <para>
            Note that <varname>child_nodes_connected_min_count</varname> overrides any value
            set in <varname>child_nodes_disconnect_min_count</varname>.
      </para>
      <para>
       If neither <varname>child_nodes_connected_min_count</varname> nor
       <varname>child_nodes_disconnect_min_count</varname> is set,
       the <varname>child_nodes_disconnect_command</varname> script
       will be executed when no child nodes are connected.
      </para>
      <para>
       A witness node, if in use, will not be counted as a child node unless
       <varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
      </para>
     </listitem>
    </varlistentry>


    <varlistentry>
     <term><varname>child_nodes_disconnect_min_count</varname></term>
     <listitem>
      <indexterm>
       <primary>child_nodes_disconnect_min_count</primary>
       <secondary>child node disconnection monitoring</secondary>
      </indexterm>

      <para>
       If the number of disconnected child nodes exceeds the number specified in
       this parameter, the <varname>child_nodes_disconnect_command</varname> script
       will be executed.
      </para>

      <para>
       For example, if <varname>child_nodes_disconnect_min_count</varname> is set
       to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
       script will be executed if more than two child nodes are disconnected.
      </para>

      <para>
       Note that any value set in <varname>child_nodes_disconnect_min_count</varname>
       will be overridden by <varname>child_nodes_connected_min_count</varname>.
      </para>
      <para>
       If neither <varname>child_nodes_connected_min_count</varname> nor
       <varname>child_nodes_disconnect_min_count</varname> is set,
       the <varname>child_nodes_disconnect_command</varname> script
       will be executed when no child nodes are connected.
      </para>

      <para>
       A witness node, if in use, will not be counted as a child node unless
       <varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
      </para>

     </listitem>
    </varlistentry>


    <varlistentry>
     <term><varname>child_nodes_connected_include_witness</varname></term>
     <listitem>
      <indexterm>
       <primary>child_nodes_connected_include_witness</primary>
       <secondary>child node disconnection monitoring</secondary>
      </indexterm>

      <para>
       Whether to count the witness node (if in use) as a child node when
       determining whether to execute <varname>child_nodes_disconnect_command</varname>.
      </para>
      <para>
       Defaults to <literal>false</literal>.
      </para>
     </listitem>
    </varlistentry>

   </variablelist>

  </sect2>

  <sect2 id="repmgrd-primary-child-disconnection-events">
   <title>Standby disconnections monitoring process event notifications</title>
   <para>
    The following <link linkend="event-notifications">event notifications</link> may be generated:
   </para>
   <variablelist>

    <varlistentry>
     <term><varname>child_node_disconnect</varname></term>
     <listitem>
      <indexterm>
       <primary>child_node_disconnect</primary>
       <secondary>event notification</secondary>
      </indexterm>

      <para>
       This event is generated after &repmgrd;
       detects that a child node is no longer streaming from the primary node.
      </para>
      <para>
       Example:
       <programlisting>
$ repmgr cluster event --event=child_node_disconnect
 Node ID | Name  | Event                 | OK | Timestamp           | Details
---------+-------+-----------------------+----+---------------------+--------------------------------------------
       1 | node1 | child_node_disconnect | t  | 2019-04-24 12:41:36 | node "node3" (ID: 3) has disconnected</programlisting>
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><varname>child_node_reconnect</varname></term>
     <listitem>
      <indexterm>
       <primary>child_node_reconnect</primary>
       <secondary>event notification</secondary>
      </indexterm>

      <para>
       This event is generated after &repmgrd;
       detects that a child node has resumed streaming from the primary node.
      </para>
      <para>
       Example:
       <programlisting>
$ repmgr cluster event --event=child_node_reconnect
 Node ID | Name  | Event                | OK | Timestamp           | Details
---------+-------+----------------------+----+---------------------+------------------------------------------------------------
       1 | node1 | child_node_reconnect | t  | 2019-04-24 12:42:19 | node "node3" (ID: 3) has reconnected after 42 seconds</programlisting>
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><varname>child_node_new_connect</varname></term>
     <listitem>
      <indexterm>
       <primary>child_node_new_connect</primary>
       <secondary>event notification</secondary>
      </indexterm>

      <para>
       This event is generated after &repmgrd;
       detects that a new child node has been registered with &repmgr; and has
       connected to the primary.
      </para>
      <para>
       Example:
       <programlisting>
$ repmgr cluster event --event=child_node_new_connect
 Node ID | Name  | Event                  | OK | Timestamp           | Details
---------+-------+------------------------+----+---------------------+---------------------------------------------
       1 | node1 | child_node_new_connect | t  | 2019-04-24 12:41:30 | new node "node3" (ID: 3) has connected</programlisting>
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><varname>child_nodes_disconnect_command</varname></term>
     <listitem>
      <indexterm>
       <primary>child_nodes_disconnect_command</primary>
       <secondary>event notification</secondary>
      </indexterm>

      <para>
       This event is generated after &repmgrd; detects
       that sufficient child nodes have been disconnected for a sufficient amount
       of time to trigger execution of the <varname>child_nodes_disconnect_command</varname>.
      </para>
      <para>
       Example:
       <programlisting>
$ repmgr cluster event --event=child_nodes_disconnect_command
 Node ID | Name  | Event                          | OK | Timestamp           | Details
---------+-------+--------------------------------+----+---------------------+--------------------------------------------------------
       1 | node1 | child_nodes_disconnect_command | t  | 2019-04-24 13:08:17 | "child_nodes_disconnect_command" successfully executed</programlisting>
      </para>
     </listitem>
    </varlistentry>

   </variablelist>

  </sect2>


</sect1>


</chapter>