1<chapter id="repmgrd-automatic-failover" xreflabel="Automatic failover with repmgrd">
2
3 <title>Automatic failover with repmgrd</title>
4
5 <indexterm>
6   <primary>repmgrd</primary>
7   <secondary>automatic failover</secondary>
8 </indexterm>
9
10 <para>
11  &repmgrd; is a management and monitoring daemon which runs
12  on each node in a replication cluster. It can automate actions such as
13  failover and updating standbys to follow the new primary, as well as
14  providing monitoring information about the state of each standby.
15 </para>
16
17 <sect1 id="repmgrd-witness-server" xreflabel="Using a witness server with repmgrd">
18   <title>Using a witness server</title>
19
20 <indexterm>
21   <primary>repmgrd</primary>
22   <secondary>witness server</secondary>
23 </indexterm>
24
25 <indexterm>
26   <primary>witness server</primary>
27   <secondary>repmgrd</secondary>
28 </indexterm>
29
30 <para>
31   A <xref linkend="witness-server"/> is a normal PostgreSQL instance which
32   is not part of the streaming replication cluster; its purpose is, if a
33   failover situation occurs, to provide proof that it is the primary server
34   itself which is unavailable, rather than e.g. a network split between
35   different physical locations.
36 </para>
37
38 <para>
39   A typical use case for a witness server is a two-node streaming replication
40   setup, where the primary and standby are in different locations (data centres).
41   By creating a witness server in the same location (data centre) as the primary,
42   if the primary becomes unavailable it's possible for the standby to decide whether
43   it can promote itself without risking a "split brain" scenario: if it can't see either the
44   witness or the primary server, it's likely there's a network-level interruption
45   and it should not promote itself. If it can see the witness but not the primary,
46   this proves there is no network interruption and the primary itself is unavailable,
47   and it can therefore promote itself (and ideally take action to fence the
48   former primary).
49 </para>
50 <note>
51   <para>
52     <emphasis>Never</emphasis> install a witness server on the same physical host
53     as another node in the replication cluster managed by &repmgr; - it's essential
54     the witness is not affected in any way by failure of another node.
55   </para>
56 </note>
57 <para>
58   For more complex replication scenarios, e.g. with multiple datacentres, it may
59   be preferable to use location-based failover, which ensures that only nodes
60   in the same location as the primary will ever be promotion candidates;
61   see <xref linkend="repmgrd-network-split"/> for more details.
62 </para>
63
64 <note>
65   <simpara>
66     A witness server will only be useful if &repmgrd;
67     is in use.
68   </simpara>
69 </note>
70
71 <sect2 id="creating-witness-server">
72   <title>Creating a witness server</title>
73 <para>
74   To create a witness server, set up a normal PostgreSQL instance on a server
75   in the same physical location as the cluster's primary server.
76 </para>
77 <para>
78   This instance should <emphasis>not</emphasis> be on the same physical host as the primary server,
79   as otherwise if the primary server fails due to hardware issues, the witness
80   server will be lost too.
81 </para>
82 <note>
83   <simpara>
84     &repmgr; 3.3 and earlier provided a <command>repmgr create witness</command>
85     command, which would automatically create a PostgreSQL instance. However
86     this often resulted in an unsatisfactory, hard-to-customise instance.
87   </simpara>
88 </note>
89 <para>
90   The witness server should be configured in the same way as a normal
91   &repmgr; node; see section <xref linkend="configuration"/>.
92 </para>
93 <para>
94   Register the witness server with <xref linkend="repmgr-witness-register"/>.
95   This will create the &repmgr; extension on the witness server, and make
96   a copy of the &repmgr; metadata.
97 </para>
98 <note>
99   <simpara>
100    As the witness server is not part of the replication cluster, further
101    changes to the &repmgr; metadata will be synchronised by
102    &repmgrd;.
103   </simpara>
104 </note>
105 <para>
106   Once the witness server has been configured, &repmgrd;
107   should be started.
108 </para>
109
110 <para>
111  To unregister a witness server, use <xref linkend="repmgr-witness-unregister"/>.
112 </para>
113
114 </sect2>
115
116</sect1>
117
118
119<sect1 id="repmgrd-network-split" xreflabel="Handling network splits with repmgrd">
120 <title>Handling network splits with repmgrd</title>
121 <indexterm>
122   <primary>repmgrd</primary>
123   <secondary>network splits</secondary>
124 </indexterm>
125
126 <indexterm>
127   <primary>network splits</primary>
128 </indexterm>
129
130 <para>
  A common pattern for replication cluster setups is to spread servers over
  more than one data centre. This can provide benefits such as geographically
  distributed read replicas and disaster recovery (DR) capability. However,
  it also carries the risk of a network-level disconnection between
  data centre locations, which would result in a split-brain scenario if
  servers in a secondary data centre were no longer able to see the primary
  in the main data centre and promoted a standby among themselves.
138 </para>
139 <para>
140  &repmgr; enables provision of &quot;<xref linkend="witness-server"/>&quot; to
141  artificially create a quorum of servers in a particular location, ensuring
142  that nodes in another location will not elect a new primary if they
143  are unable to see the majority of nodes. However this approach does not
144  scale well, particularly with more complex replication setups, e.g.
145  where the majority of nodes are located outside of the primary datacentre.
146  It also means the <literal>witness</literal> node needs to be managed as an
147  extra PostgreSQL instance outside of the main replication cluster, which
148  adds administrative and programming complexity.
149 </para>
150 <para>
151  <literal>repmgr4</literal> introduces the concept of <literal>location</literal>:
152  each node is associated with an arbitrary location string (default is
153  <literal>default</literal>); this is set in <filename>repmgr.conf</filename>, e.g.:
154  <programlisting>
155    node_id=1
156    node_name=node1
157    conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
158    data_directory='/var/lib/postgresql/data'
159    location='dc1'</programlisting>
160 </para>
161 <para>
162  In a failover situation, &repmgrd; will check if any servers in the
163  same location as the current primary node are visible.  If not, &repmgrd;
164  will assume a network interruption and not promote any node in any
165  other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
166  mode until a primary becomes visible).
167 </para>
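 <para>
  As an illustrative sketch, a standby in a second data centre could be configured
  as shown below. The node ID, node name, host and the location value
  <literal>dc2</literal> are assumptions made for this example. With this setting,
  if neither the primary nor any other node in <literal>dc1</literal> is visible,
  &repmgrd; on this node would treat the situation as a network interruption and
  would not promote it.
  <programlisting>
    node_id=2
    node_name=node2
    conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/var/lib/postgresql/data'
    # hypothetical location string for the second data centre
    location='dc2'</programlisting>
 </para>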
168
169</sect1>
170
171
172<sect1 id="repmgrd-primary-visibility-consensus" xreflabel="Primary visibility consensus">
173  <title>Primary visibility consensus</title>
174
175  <indexterm>
176   <primary>repmgrd</primary>
177   <secondary>primary visibility consensus</secondary>
178  </indexterm>
179
180  <indexterm>
181   <primary>primary_visibility_consensus</primary>
182  </indexterm>
183
184  <para>
185    In more complex replication setups, particularly where replication occurs between
186    multiple datacentres, it's possible that some but not all standbys get cut off from the
187    primary (but not from the other standbys).
188  </para>
189  <para>
    In this situation, normally it's not desirable for any of the standbys which have been
    cut off to initiate a failover, as the primary is still functioning and other standbys are
    still connected to it. Beginning with <link linkend="release-4.4">&repmgr; 4.4</link>
    it is possible for the affected standbys to build a consensus about whether
194    the primary is still available to some standbys (&quot;primary visibility consensus&quot;).
195    This is done by polling each standby (and the witness, if present) for the time it last saw the
196    primary; if any have seen the primary very recently, it's reasonable
197    to infer that the primary is still available and a failover should not be started.
198  </para>
199
200  <para>
201    The time the primary was last seen by each node can be checked by executing
202    <link linkend="repmgr-service-status"><command>repmgr service status</command></link>
203    (&repmgr; 4.2 - 4.4: <link linkend="repmgr-service-status"><command>repmgr daemon status</command></link>)
204    which includes this in its output, e.g.:
205    <programlisting>$ repmgr -f /etc/repmgr.conf service status
206 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
207----+-------+---------+-----------+----------+---------+-------+---------+--------------------
208 1  | node1 | primary | * running |          | running | 27259 | no      | n/a
209 2  | node2 | standby |   running | node1    | running | 27272 | no      | 1 second(s) ago
210 3  | node3 | standby |   running | node1    | running | 27282 | no      | 0 second(s) ago
211 4  | node4 | witness | * running | node1    | running | 27298 | no      | 1 second(s) ago</programlisting>
212
213  </para>
214
215  <para>
216    To enable this functionality, in <filename>repmgr.conf</filename> set:
217    <programlisting>
218      primary_visibility_consensus=true</programlisting>
219  </para>
220  <note>
221    <para>
222      <option>primary_visibility_consensus</option> <emphasis>must</emphasis> be set to
223      <literal>true</literal> on all nodes for it to be effective.
224    </para>
225  </note>
226
227  <para>
228    The following sample &repmgrd; log output demonstrates the behaviour in a situation
229    where one of three standbys is no longer able to connect to the primary, but <emphasis>can</emphasis>
230    connect to the two other standbys (&quot;sibling nodes&quot;):
231    <programlisting>
232    [2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
233    [2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
234    [2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
235    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
236    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
237    [2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
238    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
239    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
240    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
241    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
242    [2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
243    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
244    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
245    [2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
246    [2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
247     - node "node3" (ID: 3): 1 second(s) ago
248     - node "node4" (ID: 4): 0 second(s) ago
249    [2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
250    [2019-05-17 05:36:12] [NOTICE] election cancelled
251    [2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state</programlisting>
    In this situation &repmgrd; will cancel the failover and enter degraded monitoring mode,
    waiting for the primary to reappear.
254  </para>
255</sect1>
256
257<sect1 id="repmgrd-standby-disconnection-on-failover" xreflabel="Standby disconnection on failover">
258  <title>Standby disconnection on failover</title>
259
260  <indexterm>
261   <primary>repmgrd</primary>
262   <secondary>standby disconnection on failover</secondary>
263 </indexterm>
264
265  <indexterm>
266    <primary>standby disconnection on failover</primary>
267  </indexterm>
268
269  <para>
270    If <option>standby_disconnect_on_failover</option> is set to <literal>true</literal> in
271    <filename>repmgr.conf</filename>, in a failover situation &repmgrd; will forcibly disconnect
272    the local node's WAL receiver, and wait for the WAL receiver on all sibling nodes to be
273	disconnected, before making a failover decision.
274  </para>
275  <note>
276    <para>
277      <option>standby_disconnect_on_failover</option> is available with PostgreSQL 9.5 and later.
278      Additionally this requires that the <literal>repmgr</literal> database user is a superuser.
279    </para>
280  </note>
281  <para>
282    By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
283    are receiving data from the primary and their LSN location will be static.
284  </para>
285  <important>
286    <para>
287      <option>standby_disconnect_on_failover</option> <emphasis>must</emphasis> be set to the same value on
288      all nodes.
289    </para>
290  </important>
291  <para>
292    Note that when using <option>standby_disconnect_on_failover</option> there will be a delay of 5 seconds
293    plus however many seconds it takes to confirm the WAL receiver is disconnected before
294    &repmgrd; proceeds with the failover decision.
295  </para>
296  <para>
    &repmgrd; will wait up to <option>sibling_nodes_disconnect_timeout</option> seconds (default:
    <literal>30</literal>) to confirm that the WAL receiver on all sibling nodes has been
    disconnected before proceeding with the failover operation. If the timeout is reached, the
    failover operation will go ahead anyway.
301  </para>
302  <para>
303    Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
304  </para>
305  <para>
306    If using <option>standby_disconnect_on_failover</option>, we recommend that the
307    <option>primary_visibility_consensus</option> option is also used.
308  </para>
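  <para>
    Taken together, the options described in this section could be combined as in the
    following <filename>repmgr.conf</filename> sketch. The timeout is shown at its documented
    default and is included only for illustration; the same values would need to be set on
    all nodes.
    <programlisting>
    standby_disconnect_on_failover=true
    # seconds to wait for sibling nodes' WAL receivers to disconnect (default: 30)
    sibling_nodes_disconnect_timeout=30
    # recommended alongside standby_disconnect_on_failover
    primary_visibility_consensus=true</programlisting>
  </para>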
309
310</sect1>
311
312<sect1 id="repmgrd-failover-validation" xreflabel="Failover validation">
313  <title>Failover validation</title>
314
315  <indexterm>
316   <primary>repmgrd</primary>
317   <secondary>failover validation</secondary>
318 </indexterm>
319
320  <indexterm>
321    <primary>failover validation</primary>
322  </indexterm>
323
324  <para>
325    From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
326    to &repmgrd; which, in a failover situation,
327    will be executed by the promotion candidate (the node which has been selected
328    to be the new primary) to confirm whether the node should actually be promoted.
329  </para>
330  <para>
    To use this, set <option>failover_validation_command</option> in <filename>repmgr.conf</filename>
    to a script executable by the <literal>postgres</literal> system user, e.g.:
333    <programlisting>
334      failover_validation_command=/path/to/script.sh %n</programlisting>
335  </para>
336  <para>
337    The <literal>%n</literal> parameter will be replaced with the node ID when the script is
338    executed. A number of other parameters are also available, see section
339    &quot;<xref linkend="repmgrd-automatic-failover-configuration-optional"/>&quot; for details.
340  </para>
341  <para>
342    This script must return an exit code of <literal>0</literal> to indicate the node should promote itself.
343    Any other value will result in the promotion being aborted and the election rerun.
344    There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
345  </para>
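  <para>
    As a sketch, the following <filename>repmgr.conf</filename> fragment pairs the validation
    command with an explicit rerun interval. The script path and the interval of 15 seconds
    mirror the sample log output in this section and are illustrative rather than recommended values.
    <programlisting>
      failover_validation_command='/usr/local/bin/failover-validation.sh %n'
      # pause (in seconds) before the election is rerun after a rejected promotion
      election_rerun_interval=15</programlisting>
  </para>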
346  <para>
347    Sample &repmgrd; log file output during which the failover validation
348    script rejects the proposed promotion candidate:
349    <programlisting>
350[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
351[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
352[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
353[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
354[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
355Node ID: 2
356
357[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
358[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
359[2019-03-13 21:01:30] [INFO] 1 followers to notify
360[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
361INFO:  node 3 received notification to rerun promotion candidate election
362[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")</programlisting>
363  </para>
364
365
366</sect1>
367
368 <sect1 id="cascading-replication" xreflabel="Cascading replication">
369  <title>repmgrd and cascading replication</title>
370
371  <indexterm>
372   <primary>repmgrd</primary>
373   <secondary>cascading replication</secondary>
374  </indexterm>
375
376 <indexterm>
377   <primary>cascading replication</primary>
378   <secondary>repmgrd</secondary>
379 </indexterm>
380
381 <para>
382  Cascading replication - where a standby can connect to an upstream node and not
383  the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
384  &repmgrd; support cascading replication by keeping track of the relationship
385  between standby servers - each node record is stored with the node id of its
386  upstream ("parent") server (except of course the primary server).
387 </para>
388 <para>
389  In a failover situation where the primary node fails and a top-level standby
  is promoted, a standby connected to another standby will not be affected
  and will continue working as normal (even if the upstream standby it's connected
392  to becomes the primary node). If however the node's direct upstream fails,
393  the &quot;cascaded standby&quot; will attempt to reconnect to that node's parent
394  (unless <varname>failover</varname> is set to <literal>manual</literal> in
395  <filename>repmgr.conf</filename>).
396 </para>
397
398  </sect1>
399
400<sect1 id="repmgrd-primary-child-disconnection" xreflabel="Monitoring standby disconnections on the primary">
401  <title>Monitoring standby disconnections on the primary node</title>
402
403  <indexterm>
404   <primary>repmgrd</primary>
405   <secondary>standby disconnection</secondary>
406  </indexterm>
407
408  <indexterm>
409   <primary>repmgrd</primary>
410   <secondary>child node disconnection</secondary>
411  </indexterm>
412
413  <note>
414    <para>
415      This functionality is available in <link linkend="release-4.4">&repmgr; 4.4</link> and later.
416    </para>
417  </note>
418  <para>
419    When running on the primary node, &repmgrd; can
420    monitor connections and in particular disconnections by its attached
421    child nodes (standbys, and if in use, the witness server), and optionally
422    execute a custom command if certain criteria are met (such as the number of
423    attached nodes falling to zero following a failover to a new primary); this
424    command can be used for example to &quot;fence&quot; the node and ensure it
425    is isolated from any applications attempting to access the replication cluster.
426  </para>
427
428  <note>
429	<para>
430	  Currently &repmgrd; can only detect disconnections
431	  of streaming replication standbys and cannot determine whether a standby
432	  has disconnected and fallen back to archive recovery.
433	</para>
434	<para>
435	  See section <link linkend="repmgrd-primary-child-disconnection-caveats">caveats</link> below.
436	</para>
437  </note>
438
439  <sect2 id="repmgrd-primary-child-disconnection-monitoring-process">
440	<title>Standby disconnections monitoring process and criteria</title>
441	<para>
442	  &repmgrd; monitors attached child nodes and decides
443	  whether to invoke the user-defined command based on the following process
444	  and criteria:
445    <itemizedlist>
446
447      <listitem>
448        <para>
          Every few seconds (defined by the configuration parameter <varname>child_nodes_check_interval</varname>;
          default: <literal>5</literal> seconds; a value of <literal>0</literal> disables this check altogether), &repmgrd; queries
451          the <literal>pg_stat_replication</literal> system view and compares
452          the nodes present there against the list of nodes registered with &repmgr; which
453          should be attached to the primary.
454        </para>
455        <para>
456          If a witness server is in use, &repmgrd; connects to it and checks which upstream node
457          it is following.
458        </para>
459      </listitem>
460
461      <listitem>
462        <para>
463          If a child node (standby) is no longer present in <literal>pg_stat_replication</literal>,
464          &repmgrd; notes the time it detected the node's absence, and additionally generates a
465          <literal>child_node_disconnect</literal> event.
466        </para>
467        <para>
468          If a witness server is in use, and it is no longer following the primary, or not
469          reachable at all, &repmgrd; notes the time it detected the node's absence, and additionally generates a
470          <literal>child_node_disconnect</literal> event.
471        </para>
472      </listitem>
473
474      <listitem>
475        <para>
476          If a child node (standby) which was absent from <literal>pg_stat_replication</literal> reappears,
477          &repmgrd; clears the time it detected the node's absence, and additionally generates a
478          <literal>child_node_reconnect</literal> event.
479        </para>
480        <para>
          If a witness server is in use, and it was previously unreachable or not following the
          primary node but has now become reachable and is following the primary node, &repmgrd; clears the
          time it detected the node's absence, and additionally generates a
          <literal>child_node_reconnect</literal> event.
485        </para>
486      </listitem>
487
488      <listitem>
489        <para>
490          If an entirely new child node (standby or witness) is detected, &repmgrd; adds it to its internal list
491          and additionally generates a <literal>child_node_new_connect</literal> event.
492        </para>
493      </listitem>
494
495      <listitem>
496        <para>
497          If the <varname>child_nodes_disconnect_command</varname> parameter is set in
498          <filename>repmgr.conf</filename>, &repmgrd; will then loop through all child nodes.
499          If it determines that insufficient child nodes are connected, and a
500          minimum of <varname>child_nodes_disconnect_timeout</varname> seconds (default: <literal>30</literal>)
501          has elapsed since  the last node became disconnected, &repmgrd; will then execute the
502          <varname>child_nodes_disconnect_command</varname> script.
503        </para>
504        <para>
505          By default, the <varname>child_nodes_disconnect_command</varname> will only be executed
506          if all child nodes are disconnected. If <varname>child_nodes_connected_min_count</varname>
507          is set, the <varname>child_nodes_disconnect_command</varname> script will be triggered
508          if the number of connected child nodes falls below the specified value (e.g.
509          if set to <literal>2</literal>, the script will be triggered if only one child node
          is connected). Alternatively, if <varname>child_nodes_disconnect_min_count</varname>
          is set and more than that number of child nodes disconnect, the script will be triggered.
512        </para>
513        <note>
514          <para>
515            By default, a witness node, if in use, will <emphasis>not</emphasis> be counted as a
516            child node for the purposes of determining whether to execute
517            <varname>child_nodes_disconnect_command</varname>.
518          </para>
519          <para>
520            To enable the witness node to be counted as a child node, set
521            <varname>child_nodes_connected_include_witness</varname> in <filename>repmgr.conf</filename>
522            to <literal>true</literal>
523            (and <link linkend="repmgrd-reloading-configuration">reload the configuration</link> if &repmgrd;
524            is running).
525          </para>
526        </note>
527      </listitem>
528
529      <listitem>
530        <para>
531          Note that child nodes which are not attached when &repmgrd;
532          starts will <emphasis>not</emphasis> be considered as missing, as &repmgrd;
533          cannot know why they are not attached.
534        </para>
535      </listitem>
536
537    </itemizedlist>
538	</para>
539  </sect2>
540
541  <sect2 id="repmgrd-primary-child-disconnection-example">
542	<title>Standby disconnections monitoring process example</title>
543	<para>
544	  This example shows typical &repmgrd; log output from a three-node cluster
545	  (primary and two child nodes), with <varname>child_nodes_connected_min_count</varname>
546	  set to <literal>2</literal>.
547	</para>
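    <para>
      A <filename>repmgr.conf</filename> fragment along the following lines would be consistent
      with the log output in this example. This is a sketch rather than the exact configuration
      used: the thresholds and the script path are inferred from the log lines, and
      <varname>child_nodes_check_interval</varname> is shown at its documented default.
      <programlisting>
    # query pg_stat_replication every 5 seconds (documented default)
    child_nodes_check_interval=5
    # require at least two connected child nodes
    child_nodes_connected_min_count=2
    # wait 30 seconds after the last disconnection before acting (documented default)
    child_nodes_disconnect_timeout=30
    # script path as it appears in the sample log output
    child_nodes_disconnect_command='/usr/bin/fence-all-the-things.sh'</programlisting>
    </para>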
548	<para>
549	  &repmgrd; on the primary has started up, while two child
550	  nodes are being provisioned:
551	  <programlisting>
552[2019-04-24 15:25:33] [INFO] monitoring primary node "node1" (ID: 1) in normal state
553[2019-04-24 15:25:35] [NOTICE] new node "node2" (ID: 2) has connected
554[2019-04-24 15:25:35] [NOTICE] 1 (of 1) child nodes are connected, but at least 2 child nodes required
555[2019-04-24 15:25:35] [INFO] no child nodes have detached since repmgrd startup
556(...)
557[2019-04-24 15:25:44] [NOTICE] new node "node3" (ID: 3) has connected
558[2019-04-24 15:25:46] [INFO] monitoring primary node "node1" (ID: 1) in normal state
559(...)</programlisting>
560	</para>
561	<para>
562	  One of the child nodes has disconnected; &repmgrd;
563	  is now waiting <varname>child_nodes_disconnect_timeout</varname> seconds
564	  before executing <varname>child_nodes_disconnect_command</varname>:
565	  <programlisting>
566[2019-04-24 15:28:11] [INFO] monitoring primary node "node1" (ID: 1) in normal state
567[2019-04-24 15:28:17] [INFO] monitoring primary node "node1" (ID: 1) in normal state
568[2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
569[2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
570[2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
571[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set To 30 seconds
572(...)</programlisting>
573	</para>
574	<para>
575	  <varname>child_nodes_disconnect_command</varname> is executed once:
576	  <programlisting>
577[2019-04-24 15:28:49] [INFO] most recently detached child node was 3 (ca. 30 seconds ago), triggering "child_nodes_disconnect_command"
578[2019-04-24 15:28:49] [INFO] "child_nodes_disconnect_command" is:
579	"/usr/bin/fence-all-the-things.sh"
580[2019-04-24 15:28:51] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
581[2019-04-24 15:28:51] [INFO] "child_nodes_disconnect_command" was previously executed, taking no action</programlisting>
582	</para>
583
584  </sect2>
585
586  <sect2 id="repmgrd-primary-child-disconnection-caveats">
587	<title>Standby disconnections monitoring caveats</title>
588	<para>
      The following caveats should be considered if you intend to use this functionality.
590	</para>
591	<para>
592	  <itemizedlist mark="bullet">
593		<listitem>
594          <para>
595			If a child node is configured to use archive recovery, it's possible that
596			the child node will disconnect from the primary node and fall back to
597			archive recovery. In this case &repmgrd;
598			will nevertheless register a node disconnection.
599		  </para>
600		</listitem>
601
602        <listitem>
603          <para>
            &repmgr; requires the <varname>application_name</varname> in the child node's
605			<varname>primary_conninfo</varname> string to be the same as the node name
606			defined in the node's <filename>repmgr.conf</filename> file. Furthermore,
607			this <varname>application_name</varname> must be unique across the replication
608			cluster.
609          </para>
610		  <para>
611			If a custom <varname>application_name</varname> is used, or the
612			<varname>application_name</varname> is not unique across the replication
613			cluster, &repmgr; will not be able to reliably monitor child node connections.
614		  </para>
615        </listitem>
616
617	  </itemizedlist>
618	</para>
619  </sect2>
620
621
622  <sect2 id="repmgrd-primary-child-disconnection-configuration">
623	<title>Standby disconnections monitoring process configuration</title>
624	<para>
625	  The following parameters, set in <filename>repmgr.conf</filename>,
626	  control how child node disconnection monitoring operates.
627	</para>
628	<variablelist>
629
630      <varlistentry>
631        <term><varname>child_nodes_check_interval</varname></term>
632        <listitem>
633          <indexterm>
634		    <primary>child_nodes_check_interval</primary>
635		    <secondary>child node disconnection monitoring</secondary>
636		  </indexterm>
637
638		  <para>
            Interval (in seconds) after which &repmgrd; queries the
            <literal>pg_stat_replication</literal> system view and compares the nodes present
            there against the list of nodes registered with &repmgr; which should be attached to the primary.
642		  </para>
643		  <para>
644			Default is <literal>5</literal> seconds, a value of <literal>0</literal> disables this check
645			altogether.
646		  </para>
647		</listitem>
648	  </varlistentry>
649
650	  <varlistentry>
651        <term><varname>child_nodes_disconnect_command</varname></term>
652
653        <listitem>
654          <indexterm>
655		    <primary>child_nodes_disconnect_command</primary>
656		    <secondary>child node disconnection monitoring</secondary>
657		  </indexterm>
658
659		  <para>
            User-definable script to be executed when &repmgrd;
            determines that an insufficient number of child nodes are connected. By default
            the script is executed when no child nodes are connected, but the execution
            threshold can be modified by setting one of <varname>child_nodes_connected_min_count</varname>
            or <varname>child_nodes_disconnect_min_count</varname> (see below).
665		  </para>
666		  <para>
667			The <varname>child_nodes_disconnect_command</varname> script can be
668			any user-defined script or program. It <emphasis>must</emphasis> be able
669			to be executed by the system user under which the PostgreSQL server itself
670			runs (usually <literal>postgres</literal>).
671		  </para>
672		  <note>
673			<para>
674			  If <varname>child_nodes_disconnect_command</varname> is not set, no action
675			  will be taken.
676			</para>
677		  </note>
678		  <para>
679			If specified, the following format placeholder will be substituted when
680			executing <varname>child_nodes_disconnect_command</varname>:
681		  </para>
682
683		   <variablelist>
684			 <varlistentry>
685			   <term><option>%p</option></term>
686			   <listitem>
687				 <para>
688				   ID of the node executing the <varname>child_nodes_disconnect_command</varname> script.
689				 </para>
690			   </listitem>
691			 </varlistentry>
692		   </variablelist>
693
694		  <para>
			The <varname>child_nodes_disconnect_command</varname> script is executed only once
			for as long as the criteria for its execution continue to be met. Once the criteria
			are no longer met (i.e. some child nodes have reconnected), the script can be
			executed again if the criteria are subsequently met again.
699          </para>
700          <para>
701			The <varname>child_nodes_disconnect_command</varname> script will not be executed if
702			&repmgrd; is <link linkend="repmgrd-pausing">paused</link>.
703          </para>
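          <para>
            As a minimal sketch, the following <filename>repmgr.conf</filename> entry
            passes the ID of the executing node to a handler script. The script path is
            a hypothetical placeholder and should be replaced with a script suitable
            for the local environment:
            <programlisting>
child_nodes_disconnect_command='/usr/local/bin/child-nodes-disconnected.sh %p'</programlisting>
          </para>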
704
705		</listitem>
706	  </varlistentry>
707
708	  <varlistentry>
709        <term><varname>child_nodes_disconnect_timeout</varname></term>
710
711        <listitem>
712          <indexterm>
713		    <primary>child_nodes_disconnect_timeout</primary>
714		    <secondary>child node disconnection monitoring</secondary>
715		  </indexterm>
716
717		  <para>
			If &repmgrd; determines that an insufficient number of
			child nodes are connected, it will wait for the specified number of seconds
			before executing the <varname>child_nodes_disconnect_command</varname>.
721		  </para>
722		  <para>
723			Default: <literal>30</literal> seconds.
724		  </para>
725		</listitem>
726	  </varlistentry>
727
728      <varlistentry>
729        <term><varname>child_nodes_connected_min_count</varname></term>
730        <listitem>
731          <indexterm>
732		    <primary>child_nodes_connected_min_count</primary>
733		    <secondary>child node disconnection monitoring</secondary>
734		  </indexterm>
735
736		  <para>
737			If the number of child nodes connected falls below the number specified in
738			this parameter, the <varname>child_nodes_disconnect_command</varname> script
739			will be executed.
740		  </para>
741		  <para>
742			For example, if <varname>child_nodes_connected_min_count</varname> is set
743			to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
744			script will be executed if one or no child nodes are connected.
745		  </para>
746		  <para>
747			Note that <varname>child_nodes_connected_min_count</varname> overrides any value
748			set in <varname>child_nodes_disconnect_min_count</varname>.
749		  </para>
750		  <para>
			If neither <varname>child_nodes_connected_min_count</varname> nor
			<varname>child_nodes_disconnect_min_count</varname> is set,
			the <varname>child_nodes_disconnect_command</varname> script
			will be executed when no child nodes are connected.
755		  </para>
756          <para>
757            A witness node, if in use, will not be counted as a child node unless
758            <varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
759          </para>
760		</listitem>
761	  </varlistentry>
762
763
764	  <varlistentry>
765        <term><varname>child_nodes_disconnect_min_count</varname></term>
766        <listitem>
767          <indexterm>
768		    <primary>child_nodes_disconnect_min_count</primary>
769		    <secondary>child node disconnection monitoring</secondary>
770		  </indexterm>
771
772		  <para>
773			If the number of disconnected child nodes exceeds the number specified in
774			this parameter, the <varname>child_nodes_disconnect_command</varname> script
775			will be executed.
776		  </para>
777
778		  <para>
779			For example, if <varname>child_nodes_disconnect_min_count</varname> is set
780			to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
781			script will be executed if more than two child nodes are disconnected.
782		  </para>
783
784		  <para>
			Note that any value set in <varname>child_nodes_disconnect_min_count</varname>
			will be overridden by <varname>child_nodes_connected_min_count</varname>.
787		  </para>
788		  <para>
			If neither <varname>child_nodes_connected_min_count</varname> nor
			<varname>child_nodes_disconnect_min_count</varname> is set,
			the <varname>child_nodes_disconnect_command</varname> script
			will be executed when no child nodes are connected.
793		  </para>
794
795          <para>
796            A witness node, if in use, will not be counted as a child node unless
797            <varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
798          </para>
799
800		</listitem>
801	  </varlistentry>
802
803
804	  <varlistentry>
805        <term><varname>child_nodes_connected_include_witness</varname></term>
806        <listitem>
807          <indexterm>
808		    <primary>child_nodes_connected_include_witness</primary>
809		    <secondary>child node disconnection monitoring</secondary>
810		  </indexterm>
811
812		  <para>
813            Whether to count the witness node (if in use) as a child node when
814            determining whether to execute <varname>child_nodes_disconnect_command</varname>.
815          </para>
816          <para>
            Default: <literal>false</literal>.
818          </para>
819        </listitem>
820      </varlistentry>
821
822	</variablelist>
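
    <para>
      Taken together, these settings might be combined in <filename>repmgr.conf</filename>
      as in the following sketch. The values and the script path are illustrative
      assumptions rather than recommendations:
      <programlisting>
# hypothetical handler script; %p is replaced with the ID of the executing node
child_nodes_disconnect_command='/usr/local/bin/child-nodes-disconnected.sh %p'
# wait 30 seconds (the default) before executing the command
child_nodes_disconnect_timeout=30
# execute the command if fewer than two child nodes are connected
child_nodes_connected_min_count=2
# count the witness node, if in use, as a child node
child_nodes_connected_include_witness=true</programlisting>
    </para>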
823
824  </sect2>
825
826  <sect2 id="repmgrd-primary-child-disconnection-events">
827	<title>Standby disconnections monitoring process event notifications</title>
828	<para>
829	  The following <link linkend="event-notifications">event notifications</link> may be generated:
830	</para>
831	<variablelist>
832
833      <varlistentry>
834        <term><varname>child_node_disconnect</varname></term>
835        <listitem>
836          <indexterm>
837		    <primary>child_node_disconnect</primary>
838		    <secondary>event notification</secondary>
839		  </indexterm>
840
841          <para>
842			This event is generated after &repmgrd;
843			detects that a child node is no longer streaming from the primary node.
844          </para>
845		  <para>
846			Example:
			<programlisting>
$ repmgr cluster event --event=child_node_disconnect
 Node ID | Name  | Event                 | OK | Timestamp           | Details
---------+-------+-----------------------+----+---------------------+--------------------------------------------
 1       | node1 | child_node_disconnect | t  | 2019-04-24 12:41:36 | node "node3" (ID: 3) has disconnected</programlisting>
852		  </para>
853        </listitem>
854      </varlistentry>
855
856	  <varlistentry>
857        <term><varname>child_node_reconnect</varname></term>
858        <listitem>
859          <indexterm>
860		    <primary>child_node_reconnect</primary>
861		    <secondary>event notification</secondary>
862		  </indexterm>
863
864          <para>
865			This event is generated after &repmgrd;
866			detects that a child node has resumed streaming from the primary node.
867          </para>
868		  <para>
869			Example:
			<programlisting>
$ repmgr cluster event --event=child_node_reconnect
 Node ID | Name  | Event                | OK | Timestamp           | Details
---------+-------+----------------------+----+---------------------+------------------------------------------------------------
 1       | node1 | child_node_reconnect | t  | 2019-04-24 12:42:19 | node "node3" (ID: 3) has reconnected after 42 seconds</programlisting>
875		  </para>
876        </listitem>
877      </varlistentry>
878
879	  <varlistentry>
880        <term><varname>child_node_new_connect</varname></term>
881        <listitem>
882          <indexterm>
883		    <primary>child_node_new_connect</primary>
884		    <secondary>event notification</secondary>
885		  </indexterm>
886
887          <para>
888			This event is generated after &repmgrd;
889			detects that a new child node has been registered with &repmgr; and has
890			connected to the primary.
891          </para>
892		  <para>
893			Example:
			<programlisting>
$ repmgr cluster event --event=child_node_new_connect
 Node ID | Name  | Event                  | OK | Timestamp           | Details
---------+-------+------------------------+----+---------------------+---------------------------------------------
 1       | node1 | child_node_new_connect | t  | 2019-04-24 12:41:30 | new node "node3" (ID: 3) has connected</programlisting>
899		  </para>
900        </listitem>
901      </varlistentry>
902
903      <varlistentry>
904        <term><varname>child_nodes_disconnect_command</varname></term>
905        <listitem>
906          <indexterm>
907		    <primary>child_nodes_disconnect_command</primary>
908		    <secondary>event notification</secondary>
909		  </indexterm>
910
911          <para>
			This event is generated after &repmgrd; detects
			that a sufficient number of child nodes have been disconnected for long enough
			to trigger execution of the <varname>child_nodes_disconnect_command</varname>.
915          </para>
916		  <para>
917			Example:
			<programlisting>
$ repmgr cluster event --event=child_nodes_disconnect_command
 Node ID | Name  | Event                          | OK | Timestamp           | Details
---------+-------+--------------------------------+----+---------------------+--------------------------------------------------------
 1       | node1 | child_nodes_disconnect_command | t  | 2019-04-24 13:08:17 | "child_nodes_disconnect_command" successfully executed</programlisting>
923		  </para>
924        </listitem>
925      </varlistentry>
926
927	</variablelist>
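
    <para>
      As a sketch of how these events might be forwarded to an external script, the
      following <filename>repmgr.conf</filename> entries use the
      <varname>event_notification_command</varname> and <varname>event_notifications</varname>
      settings to restrict script execution to the events listed above. The script path
      is a hypothetical placeholder, and the placeholders shown are assumed to follow
      the general <link linkend="event-notifications">event notifications</link> format:
      <programlisting>
event_notification_command='/usr/local/bin/repmgr-event-handler.sh %n %e %s "%t" "%d"'
event_notifications='child_node_disconnect,child_node_reconnect,child_node_new_connect,child_nodes_disconnect_command'</programlisting>
    </para>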
928
929  </sect2>
930
931
932</sect1>
933
934
935</chapter>
936