<chapter id="repmgrd-configuration">

  <title>repmgrd setup and configuration</title>

  <indexterm>
    <primary>repmgrd</primary>
    <secondary>configuration</secondary>
  </indexterm>

  <para>
    &repmgrd; is a daemon process which runs on each PostgreSQL node,
    monitoring the local node and (unless it is the primary node) the upstream server
    (the primary server or, with cascading replication, another standby) to which it is
    connected.
  </para>
  <para>
    &repmgrd; can be configured to provide failover
    capability in case the primary or upstream node becomes unreachable, and/or
    provide monitoring data to the &repmgr; metadatabase.
  </para>
  <para>
    From &repmgr; 4.4, when running on the primary node, &repmgrd; can also monitor
    standby disconnections/reconnections (see <xref linkend="repmgrd-primary-child-disconnection"/>).
  </para>

  <sect1 id="repmgrd-basic-configuration">
    <title>repmgrd configuration</title>

    <para>
      To use &repmgrd;, its associated function library <emphasis>must</emphasis> be
      included via <filename>postgresql.conf</filename> with:

      <programlisting>
shared_preload_libraries = 'repmgr'</programlisting>
    </para>
    <para>
      Changing this setting requires a restart of PostgreSQL; for more details see
      the <ulink url="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">PostgreSQL documentation</ulink>.
    </para>

    <para>
      The following configuration options apply to &repmgrd; in all circumstances:
    </para>
    <variablelist>

      <varlistentry>
        <term><option>monitor_interval_secs</option></term>
        <listitem>
          <indexterm>
            <primary>monitor_interval_secs</primary>
          </indexterm>

          <para>
            The interval (in seconds, default: <literal>2</literal>) at which to check the availability of the upstream node.
          </para>
        </listitem>

      </varlistentry>

      <varlistentry id="connection-check-type">

        <term><option>connection_check_type</option></term>
        <listitem>
          <indexterm>
            <primary>connection_check_type</primary>
          </indexterm>

          <para>
            The option <option>connection_check_type</option> is used to select the method
            &repmgrd; uses to determine whether the upstream node is available.
          </para>
          <para>
            Possible values are:
            <itemizedlist spacing="compact" mark="bullet">
              <listitem>
                <simpara>
                  <literal>ping</literal> (default) - uses <command>PQping()</command> to
                  determine server availability
                </simpara>
              </listitem>
              <listitem>
                <simpara>
                  <literal>connection</literal> - determines server availability
                  by attempting to make a new connection to the upstream node
                </simpara>
              </listitem>
              <listitem>
                <simpara>
                  <literal>query</literal> - determines server availability
                  by executing an SQL statement on the node via the existing connection
                </simpara>
              </listitem>
            </itemizedlist>
          </para>
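          <para>
            As a minimal sketch, the following <filename>repmgr.conf</filename> entry selects the
            non-default <literal>connection</literal> method. The choice of method here is purely
            illustrative:
            <programlisting>
# Check upstream availability by attempting a new connection
# (possible values: ping (default), connection, query)
connection_check_type=connection</programlisting>
          </para>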
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>reconnect_attempts</option></term>
        <listitem>
          <indexterm>
            <primary>reconnect_attempts</primary>
          </indexterm>
          <para>
            The number of attempts (default: <literal>6</literal>) which will be made to reconnect to an unreachable
            upstream node before initiating a failover.
          </para>
          <para>
            There will be an interval of <option>reconnect_interval</option> seconds between each reconnection
            attempt.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>reconnect_interval</option></term>

        <listitem>
          <indexterm>
            <primary>reconnect_interval</primary>
          </indexterm>

          <para>
            Interval (in seconds, default: <literal>10</literal>) between attempts to reconnect to an unreachable
            upstream node.
          </para>
          <para>
            The number of reconnection attempts is defined by the parameter <option>reconnect_attempts</option>.
          </para>
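          <para>
            Taken together, <option>reconnect_attempts</option> and <option>reconnect_interval</option>
            determine approximately how long &repmgrd; keeps trying to reach an unreachable upstream
            node before initiating a failover. As a rough worked example, assuming each failed attempt
            is followed by a full <option>reconnect_interval</option> pause (and ignoring the time each
            attempt itself takes), the default values amount to about 6 x 10 = 60 seconds:
            <programlisting>
# Defaults: roughly 6 x 10 = 60 seconds of reconnection attempts
# before a failover is initiated
reconnect_attempts=6
reconnect_interval=10</programlisting>
          </para>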
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>degraded_monitoring_timeout</option></term>
        <listitem>
          <indexterm>
            <primary>degraded_monitoring_timeout</primary>
          </indexterm>

          <para>
            Interval (in seconds) after which &repmgrd; will terminate if
            either of the servers being monitored (the local node and/or the upstream node) is no longer available
            (<link linkend="repmgrd-degraded-monitoring">degraded monitoring mode</link>).
          </para>
          <para>
            <literal>-1</literal> (default) disables this timeout completely.
          </para>
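          <para>
            For example, the following entry would cause &repmgrd; to terminate after five minutes in
            degraded monitoring mode rather than waiting indefinitely. The value <literal>300</literal>
            is an arbitrary illustration, not a recommended setting:
            <programlisting>
# Terminate repmgrd after 300 seconds (5 minutes) in degraded monitoring mode
degraded_monitoring_timeout=300</programlisting>
          </para>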
        </listitem>
      </varlistentry>

    </variablelist>

    <para>
      See also <filename><ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink></filename> for an annotated sample configuration file.
    </para>

    <sect2 id="repmgrd-automatic-failover-configuration">
      <title>Required configuration for automatic failover</title>

      <para>
        The following &repmgrd; options <emphasis>must</emphasis> be set in
        <filename>repmgr.conf</filename>:

        <itemizedlist spacing="compact" mark="bullet">
          <listitem>
            <simpara><option>failover</option></simpara>
          </listitem>
          <listitem>
            <simpara><option>promote_command</option></simpara>
          </listitem>
          <listitem>
            <simpara><option>follow_command</option></simpara>
          </listitem>
        </itemizedlist>
      </para>

      <para>
        Example:
        <programlisting>
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
      </para>
      <para>
        Details of each option are as follows:
      </para>
      <variablelist>
        <varlistentry>

          <term><option>failover</option></term>
          <listitem>
            <indexterm>
              <primary>failover</primary>
            </indexterm>

            <para>
              <option>failover</option> can be one of <literal>automatic</literal> or <literal>manual</literal>.
            </para>
            <note>
              <para>
                If <option>failover</option> is set to <literal>manual</literal>, &repmgrd;
                will not take any action if a failover situation is detected, and the node may need to
                be modified manually (e.g. by executing <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>).
              </para>
            </note>

          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>promote_command</option></term>

          <listitem>
            <indexterm>
              <primary>promote_command</primary>
            </indexterm>

            <para>
              The program or script defined in <option>promote_command</option> will be executed
              in a failover situation when &repmgrd; determines that
              the current node is to become the new primary node.
            </para>
            <para>
              Normally <option>promote_command</option> is set to &repmgr;'s
              <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command> command.
            </para>

            <note>
              <para>
                When invoking <command>repmgr standby promote</command> (either directly via
                <option>promote_command</option> or in a script called
                via <option>promote_command</option>), <option>--siblings-follow</option>
                <emphasis>must not</emphasis> be included as a
                command line option for <command>repmgr standby promote</command>.
              </para>
            </note>

            <para>
              It is also possible to provide a shell script to e.g. perform user-defined tasks
              before promoting the current node. In this case the script <emphasis>must</emphasis>
              at some point execute <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>
              to promote the node; if this is not done, &repmgr; metadata will not be updated and
              &repmgr; will no longer function reliably.
            </para>
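            <para>
              A minimal sketch of this approach follows. The wrapper script path
              <filename>/usr/local/bin/repmgr-promote-wrapper.sh</filename> is hypothetical; the
              script itself would be expected to run any user-defined tasks and then execute
              <command>repmgr standby promote</command> (without <option>--siblings-follow</option>).
              As with any <option>promote_command</option>, the script is specified with its full path:
              <programlisting>
# Hypothetical wrapper; it must at some point execute e.g.
#   /usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file
promote_command='/usr/local/bin/repmgr-promote-wrapper.sh'</programlisting>
            </para>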
            <para>
              Example:
              <programlisting>
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'</programlisting>
            </para>

            <para>
              Note that the <literal>--log-to-file</literal> option will cause
              output generated by the &repmgr; command, when executed by &repmgrd;,
              to be logged to the same destination configured to receive log output for &repmgrd;.
            </para>
            <note>
              <para>
                &repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
                or <option>follow_command</option>; as these can be user-defined scripts, they must always be
                specified with the full path.
              </para>
            </note>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>follow_command</option></term>
          <listitem>
            <indexterm>
              <primary>follow_command</primary>
            </indexterm>

            <para>
              The program or script defined in <option>follow_command</option> will be executed
              in a failover situation when &repmgrd; determines that
              the current node is to follow the new primary node.
            </para>
            <para>
              Normally <option>follow_command</option> is set to &repmgr;'s
              <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command> command.
            </para>
            <para>
              The <option>follow_command</option> parameter
              should provide the <literal>--upstream-node-id=%n</literal>
              option to <command>repmgr standby follow</command>; &repmgrd; will replace
              <literal>%n</literal> with the ID of the new primary node. If this is not provided,
              <command>repmgr standby follow</command> will attempt to determine the new primary by itself, but if the
              original primary comes back online after the new primary is promoted, there is a risk that
              <command>repmgr standby follow</command> will result in the node continuing to follow
              the original primary.
            </para>
            <para>
              It is also possible to provide a shell script to e.g. perform user-defined tasks
              before the current node follows the new primary. In this case the script <emphasis>must</emphasis>
              at some point execute <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
              to make the node follow the new primary; if this is not done, &repmgr; metadata will not be updated and
              &repmgr; will no longer function reliably.
            </para>
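            <para>
              A wrapper script will generally need the new primary's node ID so it can pass
              <literal>--upstream-node-id</literal> on to <command>repmgr standby follow</command>.
              A minimal sketch, assuming a hypothetical script
              <filename>/usr/local/bin/repmgr-follow-wrapper.sh</filename> which receives the node ID
              as its first argument:
              <programlisting>
# %n is replaced by repmgrd with the ID of the new primary node.
# The hypothetical wrapper must at some point execute e.g.
#   /usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=$1
follow_command='/usr/local/bin/repmgr-follow-wrapper.sh %n'</programlisting>
            </para>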
            <para>
              Example:
              <programlisting>
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
            </para>

            <para>
              Note that the <literal>--log-to-file</literal> option will cause
              output generated by the &repmgr; command, when executed by &repmgrd;,
              to be logged to the same destination configured to receive log output for &repmgrd;.
            </para>
            <note>
              <para>
                &repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
                or <option>follow_command</option>; as these can be user-defined scripts, they must always be
                specified with the full path.
              </para>
            </note>
          </listitem>

        </varlistentry>

      </variablelist>

    </sect2>

    <sect2 id="repmgrd-automatic-failover-configuration-optional" xreflabel="Optional configuration for automatic failover">
      <title>Optional configuration for automatic failover</title>

      <para>
        The following configuration options can be used to fine-tune automatic failover:
      </para>
      <variablelist>

        <varlistentry>
          <term><option>priority</option></term>
          <listitem>
            <indexterm>
              <primary>priority</primary>
            </indexterm>

            <para>
              Indicates a preferred priority (default: <literal>100</literal>) for promoting nodes;
              a value of zero prevents the node from being promoted to primary.
            </para>
            <para>
              Note that the priority setting is only applied if two or more nodes are
              determined as promotion candidates; in that case the node with the
              higher priority is selected.
            </para>
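            <para>
              For example, in a hypothetical cluster where one standby should be preferred as the
              promotion candidate and another (such as a standby used only for reporting) should never
              be promoted, the <filename>repmgr.conf</filename> files on those nodes might contain the
              following (the values are illustrative). Because <option>priority</option> is stored in
              the <literal>repmgr.nodes</literal> table, changing it on an already registered node
              requires re-registration, as described in <xref linkend="repmgrd-reloading-configuration"/>.
              <programlisting>
# repmgr.conf on the preferred standby
priority=150

# repmgr.conf on a standby which must never be promoted
priority=0</programlisting>
            </para>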
          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>failover_validation_command</option></term>
          <listitem>
            <indexterm>
              <primary>failover_validation_command</primary>
            </indexterm>

            <para>
              User-defined script to execute for an external mechanism to validate the failover
              decision made by &repmgrd;.
            </para>
            <note>
              <para>
                This option <emphasis>must</emphasis> be identically configured
                on all nodes.
              </para>
            </note>
            <para>
              One or more of the following parameter placeholders
              may be provided, which will be replaced by &repmgrd; with the appropriate
              value:
              <itemizedlist spacing="compact" mark="bullet">
                <listitem>
                  <simpara><literal>%n</literal>: node ID</simpara>
                </listitem>
                <listitem>
                  <simpara><literal>%a</literal>: node name</simpara>
                </listitem>
                <listitem>
                  <simpara><literal>%v</literal>: number of visible nodes</simpara>
                </listitem>
                <listitem>
                  <simpara><literal>%u</literal>: number of shared upstream nodes</simpara>
                </listitem>
                <listitem>
                  <simpara><literal>%t</literal>: total number of nodes</simpara>
                </listitem>
              </itemizedlist>
            </para>
            <para>
              See also: <link linkend="repmgrd-failover-validation">Failover validation</link>.
            </para>
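            <para>
              A minimal sketch, assuming a hypothetical validation script
              <filename>/usr/local/bin/validate-failover.sh</filename> which receives the placeholder
              values as positional arguments. As noted above, the same setting must be present on all
              nodes. If the command returns an error (assumed here to mean a non-zero exit status),
              &repmgrd; pauses for <option>election_rerun_interval</option> seconds before rerunning
              the election:
              <programlisting>
# Hypothetical script; arguments: node ID, node name, number of visible nodes,
# number of shared upstream nodes, total number of nodes
failover_validation_command='/usr/local/bin/validate-failover.sh %n %a %v %u %t'</programlisting>
            </para>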
          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>primary_visibility_consensus</option></term>

          <listitem>
            <indexterm>
              <primary>primary_visibility_consensus</primary>
            </indexterm>

            <para>
              If <literal>true</literal>, only continue with failover if no standbys
              (or the witness server, if present) have seen the primary node recently.
            </para>
            <note>
              <para>
                This option <emphasis>must</emphasis> be identically configured
                on all nodes.
              </para>
            </note>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>always_promote</option></term>

          <listitem>
            <indexterm>
              <primary>always_promote</primary>
            </indexterm>

            <para>
              Default: <literal>false</literal>.
            </para>
            <para>
              If <literal>true</literal>, promote the local node even if its
              &repmgr; metadata is not up-to-date.
            </para>
            <para>
              Normally &repmgr; expects its metadata (stored in the <varname>repmgr.nodes</varname>
              table) to be up-to-date so &repmgrd; can take the correct action during a failover.
              However, it is possible that updates made on the primary may not
              have propagated to the standby (promotion candidate). In this case &repmgrd; will
              default to not promoting the standby. This behaviour can be overridden by setting
              <option>always_promote</option> to <literal>true</literal>.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>

          <term><option>standby_disconnect_on_failover</option></term>
          <listitem>
            <indexterm>
              <primary>standby_disconnect_on_failover</primary>
            </indexterm>

            <para>
              In a failover situation, disconnect the local node's WAL receiver.
            </para>
            <para>
              This option is available in PostgreSQL 9.5 and later.
            </para>
            <note>
              <para>
                This option <emphasis>must</emphasis> be identically configured
                on all nodes.
              </para>
              <para>
                Additionally, the &repmgr; user <emphasis>must</emphasis> be a superuser
                for this option to be used.
              </para>
              <para>
                &repmgrd; will refuse to start if this option is set
                but either of these prerequisites is not met.
              </para>
            </note>

            <para>
              See also: <link linkend="repmgrd-standby-disconnection-on-failover">Standby disconnection on failover</link>.
            </para>
          </listitem>
        </varlistentry>

      </variablelist>

      <para>
        The following options can be used to further fine-tune failover behaviour.
        In practice it is unlikely these will need to be changed from their default
        values, but they are available as configuration options should the need arise.
      </para>
      <variablelist>

        <varlistentry>
          <term><option>election_rerun_interval</option></term>
          <listitem>
            <indexterm>
              <primary>election_rerun_interval</primary>
            </indexterm>

            <para>
              If <option>failover_validation_command</option> is set, and the command returns
              an error, pause for the specified number of seconds (default: <literal>15</literal>) before rerunning the election.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term><option>sibling_nodes_disconnect_timeout</option></term>
          <listitem>
            <indexterm>
              <primary>sibling_nodes_disconnect_timeout</primary>
            </indexterm>

            <para>
              If <option>standby_disconnect_on_failover</option> is <literal>true</literal>, the
              maximum length of time (in seconds, default: <literal>30</literal>)
              to wait for other standbys to confirm they have disconnected their
              WAL receivers.
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
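      <para>
        For example, a configuration which enables standby disconnection on failover might
        contain the following. The two timing values are simply the defaults described above,
        written out explicitly; <option>standby_disconnect_on_failover</option> itself must be
        set identically on all nodes:
        <programlisting>
# Requires PostgreSQL 9.5 or later, and the repmgr user must be a superuser
standby_disconnect_on_failover=true
# Maximum seconds to wait for sibling standbys to disconnect their WAL receivers
sibling_nodes_disconnect_timeout=30
# Seconds to pause before rerunning the election if failover_validation_command returns an error
election_rerun_interval=15</programlisting>
      </para>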

    </sect2>

    <sect2 id="repmgrd-automatic-failover-configuration-pgbouncer-fencing">
      <title>Configuring &repmgrd; and pgbouncer to fence a failed primary node</title>
      <indexterm>
        <primary>fencing</primary>
        <secondary>using repmgrd and pgbouncer to fence a failed primary node</secondary>
      </indexterm>
      <indexterm>
        <primary>PgBouncer</primary>
        <secondary>using repmgrd and pgbouncer to fence a failed primary node</secondary>
      </indexterm>
      <para>
        For further details and a reference implementation, see the separate document
        <ulink url="https://github.com/2ndQuadrant/repmgr/blob/master/doc/repmgrd-node-fencing.md">Fencing a failed master node with repmgrd and PgBouncer</ulink>.
      </para>
    </sect2>

    <sect2 id="postgresql-service-configuration">
      <title>PostgreSQL service configuration</title>

      <indexterm>
        <primary>repmgrd</primary>
        <secondary>PostgreSQL service configuration</secondary>
      </indexterm>
      <para>
        If using automatic failover, &repmgrd; currently needs to execute
        <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
        to restart PostgreSQL on standbys so that they follow a new primary.
      </para>
      <para>
        To ensure this happens smoothly, it is essential to provide the system/service restart
        command appropriate to your operating system via <varname>service_restart_command</varname>
        in <filename>repmgr.conf</filename>. If you don't do this, &repmgrd;
        will default to using <command>pg_ctl</command>, which can result in unexpected problems,
        particularly on <application>systemd</application>-based systems.
      </para>
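      <para>
        As an illustration, on a <application>systemd</application>-based system the setting
        might look like the following. The unit name <literal>postgresql-12</literal> is an
        assumption (it matches the PGDG packages for PostgreSQL 12 on CentOS 7) and must be
        adjusted to the local installation. As &repmgrd; executes the command non-interactively,
        the operating system user it runs as must be able to run it via <command>sudo</command>
        without a password prompt:
        <programlisting>
# Assumes the PostgreSQL service unit is named "postgresql-12"
service_restart_command='sudo systemctl restart postgresql-12'</programlisting>
      </para>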
      <para>
        For more details, see <xref linkend="configuration-file-service-commands"/>.
      </para>
    </sect2>

    <sect2 id="repmgrd-service-configuration">
      <title>repmgrd service configuration</title>

      <indexterm>
        <primary>repmgrd</primary>
        <secondary>repmgrd service configuration</secondary>
      </indexterm>
      <para>
        If you intend to use the <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link>
        and <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link>
        commands, the following
        parameters <emphasis>must</emphasis> be set in <filename>repmgr.conf</filename>:
        <itemizedlist spacing="compact" mark="bullet">

          <listitem>
            <simpara><varname>repmgrd_service_start_command</varname></simpara>
          </listitem>

          <listitem>
            <simpara><varname>repmgrd_service_stop_command</varname></simpara>
          </listitem>

        </itemizedlist>

      </para>
      <para>
        Example (for &repmgr; with PostgreSQL 12 on CentOS 7):
        <programlisting>
repmgrd_service_start_command='sudo systemctl start repmgr12'
repmgrd_service_stop_command='sudo systemctl stop repmgr12'</programlisting>
      </para>
      <para>
        For more details see the reference page for each command.
      </para>
    </sect2>

    <sect2 id="repmgrd-monitoring-configuration" xreflabel="repmgrd monitoring configuration">
      <title>Monitoring configuration</title>

      <indexterm>
        <primary>repmgrd</primary>
        <secondary>monitoring configuration</secondary>
      </indexterm>
      <para>
        To enable monitoring, set:
        <programlisting>
monitoring_history=yes</programlisting>
        in <filename>repmgr.conf</filename>.
      </para>
      <para>
        Monitoring data is written at the interval defined by
        the option <option>monitor_interval_secs</option> (see above).
      </para>
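      <para>
        As a rough indication of the resulting data volume, assuming each standby writes one
        row per interval, the default <option>monitor_interval_secs</option> of
        <literal>2</literal> results in 86400 / 2 = 43200 rows per standby per day:
        <programlisting>
monitoring_history=yes
# Default interval; roughly 86400 / 2 = 43200 monitoring rows per standby per day
monitor_interval_secs=2</programlisting>
      </para>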
625      <para>
626        For more details on monitoring, see <xref linkend="repmgrd-monitoring"/>. For information on
627        monitoring standby disconnections, see <xref linkend="repmgrd-primary-child-disconnection"/>.
628      </para>
629    </sect2>
630
631    <sect2 id="repmgrd-reloading-configuration" xreflabel="reloading repmgrd configuration">
632      <title>Applying configuration changes to repmgrd</title>
633
634      <indexterm>
635        <primary>repmgrd</primary>
636        <secondary>applying configuration changes</secondary>
637      </indexterm>
638      <para>
639        To apply configuration file changes to a running &repmgrd;
640        daemon, execute the operating system's &repmgrd; service reload command
641        (see <xref linkend="appendix-packages"/> for examples),
642          or for instances  which were manually started, execute <command>kill -HUP</command>, e.g.
643          <command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
644      </para>
645      <tip>
646        <para>
647          Check the &repmgrd; log to see what changes were
648          applied, or if any issues were encountered when reloading the configuration.
649        </para>
650      </tip>
651      <para>
652        Note that only the following subset of configuration file parameters can be changed on a
653        running &repmgrd; daemon:
654      </para>
655      <itemizedlist spacing="compact" mark="bullet">
656
657        <listitem>
658          <simpara>
659            <varname>async_query_timeout</varname>
660          </simpara>
661        </listitem>
662
663         <listitem>
664          <simpara>
665            <varname>child_nodes_check_interval</varname>
666          </simpara>
667        </listitem>
668
669        <listitem>
670          <simpara>
671            <varname>child_nodes_connected_include_witness</varname>
672          </simpara>
673        </listitem>
674
675        <listitem>
676          <simpara>
677            <varname>child_nodes_connected_min_count</varname>
678          </simpara>
679        </listitem>
680
681        <listitem>
682          <simpara>
683            <varname>child_nodes_disconnect_command</varname>
684          </simpara>
685        </listitem>
686
687        <listitem>
688          <simpara>
689            <varname>child_nodes_disconnect_min_count</varname>
690          </simpara>
691        </listitem>
692
693        <listitem>
694          <simpara>
695            <varname>child_nodes_disconnect_timeout</varname>
696          </simpara>
697        </listitem>
698
699        <listitem>
700          <simpara>
701            <varname>connection_check_type</varname>
702          </simpara>
703        </listitem>
704
705        <listitem>
706          <simpara>
707            <varname>conninfo</varname>
708          </simpara>
709        </listitem>
710
711        <listitem>
712          <simpara>
713            <varname>degraded_monitoring_timeout</varname>
714          </simpara>
715        </listitem>
716
717        <listitem>
718          <simpara>
719            <varname>event_notification_command</varname>
720          </simpara>
721        </listitem>
722
723        <listitem>
724          <simpara>
725            <varname>event_notifications</varname>
726          </simpara>
727        </listitem>
728
729        <listitem>
730          <simpara>
731            <varname>failover_validation_command</varname>
732          </simpara>
733        </listitem>
734
735        <listitem>
736          <simpara>
737            <varname>failover</varname>
738          </simpara>
739        </listitem>
740
741        <listitem>
742          <simpara>
743            <varname>follow_command</varname>
744          </simpara>
745        </listitem>
746
747        <listitem>
748          <simpara>
749            <varname>log_facility</varname>
750          </simpara>
751        </listitem>
752
753        <listitem>
754          <simpara>
755            <varname>log_file</varname>
756          </simpara>
757        </listitem>
758
759        <listitem>
760          <simpara>
761            <varname>log_level</varname>
762          </simpara>
763        </listitem>
764
765        <listitem>
766          <simpara>
767            <varname>log_status_interval</varname>
768          </simpara>
769        </listitem>
770
771        <listitem>
772          <simpara>
773            <varname>monitor_interval_secs</varname>
774          </simpara>
775        </listitem>
776
777        <listitem>
778          <simpara>
779            <varname>monitoring_history</varname>
780          </simpara>
781        </listitem>
782
783        <listitem>
784          <simpara>
785            <varname>primary_notification_timeout</varname>
786          </simpara>
787        </listitem>
788
789        <listitem>
790          <simpara>
791            <varname>primary_visibility_consensus</varname>
792          </simpara>
793        </listitem>
794
795        <listitem>
796          <simpara>
797            <varname>always_promote</varname>
798          </simpara>
799        </listitem>
800
801        <listitem>
802          <simpara>
803            <varname>promote_command</varname>
804          </simpara>
805        </listitem>
806
807        <listitem>
808          <simpara>
809            <varname>reconnect_attempts</varname>
810          </simpara>
811        </listitem>
812
813        <listitem>
814          <simpara>
815            <varname>reconnect_interval</varname>
816          </simpara>
817        </listitem>
818
819        <listitem>
820          <simpara>
821            <varname>retry_promote_interval_secs</varname>
822          </simpara>
823        </listitem>
824
825        <listitem>
826          <simpara>
827            <varname>repmgrd_standby_startup_timeout</varname>
828          </simpara>
829        </listitem>
830
831        <listitem>
832          <simpara>
833            <varname>sibling_nodes_disconnect_timeout</varname>
834          </simpara>
835        </listitem>
836
837        <listitem>
838          <simpara>
839            <varname>standby_disconnect_on_failover</varname>
840          </simpara>
841        </listitem>
842
843      </itemizedlist>
844
845      <para>
        Changes to the following configuration file parameters must be registered with
        <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
        as they require changes to the <literal>repmgr.nodes</literal> table, which must be visible to
        all nodes in the replication cluster:
850      </para>
851      <itemizedlist spacing="compact" mark="bullet">
852
853        <listitem>
854          <simpara>
855            <varname>node_id</varname>
856          </simpara>
857        </listitem>
858
859        <listitem>
860          <simpara>
861            <varname>node_name</varname>
862          </simpara>
863        </listitem>
864
865        <listitem>
866          <simpara>
867            <varname>data_directory</varname>
868          </simpara>
869        </listitem>
870
871        <listitem>
872          <simpara>
873            <varname>location</varname>
874          </simpara>
875        </listitem>
876
877
878        <listitem>
879          <simpara>
880            <varname>priority</varname>
881          </simpara>
882        </listitem>
883
884      </itemizedlist>
885
886      <note>
887        <para>
888          After executing <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
889          &repmgrd; <emphasis>must</emphasis> be restarted for the changes to take effect.
890        </para>
891      </note>
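      <para>
        As a sketch, assuming the configuration file is located at <filename>/etc/repmgr.conf</filename>
        (adjust as appropriate) and that <varname>repmgrd_service_start_command</varname> and
        <varname>repmgrd_service_stop_command</varname> are configured, updating one of these parameters
        on a standby might look like this:
        <programlisting>
        # after editing e.g. "priority" in /etc/repmgr.conf:
        repmgr -f /etc/repmgr.conf standby register --force
        repmgr -f /etc/repmgr.conf daemon stop
        repmgr -f /etc/repmgr.conf daemon start</programlisting>
        If the <command>repmgr daemon</command> commands are not configured, restart &repmgrd; via the
        operating system's service command instead.
      </para>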
892
893    </sect2>
894
895  </sect1>
896
897  <sect1 id="repmgrd-daemon" xreflabel="repmgrd daemon">
898    <title>repmgrd daemon</title>
899
900    <indexterm>
901      <primary>repmgrd</primary>
902      <secondary>starting and stopping</secondary>
903    </indexterm>
904    <para>
      If installed from a package, &repmgrd; can be started
      via the operating system's service command, e.g. with <command>systemctl</command>
      on <application>systemd</application>-based systems.
908    </para>
909    <para>
910      See appendix <xref linkend="appendix-packages"/> for details of service commands
911      for different distributions.
912    </para>
913    <para>
914      The commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
915      <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> can be used
916      as convenience wrappers to start and stop &repmgrd; on the local node.
917    </para>
918    <important>
919      <para>
920        <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
921        <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> require
922        that the appropriate start/stop commands are configured as
923        <varname>repmgrd_service_start_command</varname> and <varname>repmgrd_service_stop_command</varname>
924        in <filename>repmgr.conf</filename>.
925      </para>
926    </important>
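    <para>
      A minimal sketch of the corresponding <filename>repmgr.conf</filename> entries, assuming
      a <application>systemd</application> service named <literal>repmgrd</literal> and that the
      user running &repmgr; may execute these commands via <command>sudo</command> without a
      password (service names vary between distributions and packages):
      <programlisting>
        repmgrd_service_start_command='sudo systemctl start repmgrd'
        repmgrd_service_stop_command='sudo systemctl stop repmgrd'</programlisting>
      With these settings in place, &repmgrd; can be started on the local node with
      <command>repmgr -f /etc/repmgr.conf daemon start</command>.
    </para>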
927    <para>
928      &repmgrd; can be started manually like this:
929      <programlisting>
930        repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
931      and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
932    </para>
933
934    <sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
935      <title>repmgrd's PID file</title>
936
937      <indexterm>
938        <primary>repmgrd</primary>
939        <secondary>PID file</secondary>
940      </indexterm>
941      <indexterm>
942        <primary>PID file</primary>
943        <secondary>repmgrd</secondary>
944      </indexterm>
945      <para>
946        &repmgrd; will generate a PID file by default.
947      </para>
948      <note>
949        <simpara>
          This is a change in behaviour from &repmgr; versions earlier than 4.1, where
          the PID file had to be explicitly specified with the command line
          parameter <option>--pid-file</option>.
953        </simpara>
954      </note>
955      <para>
956        The PID file can be specified in <filename>repmgr.conf</filename> with the configuration
957        parameter <varname>repmgrd_pid_file</varname>.
958      </para>
959      <para>
        It can also be specified on the command line (as in previous versions) with
        the parameter <option>--pid-file</option>. Note that this overrides
        any value set in <filename>repmgr.conf</filename> with <varname>repmgrd_pid_file</varname>.
        <option>--pid-file</option> may be deprecated in future releases.
964      </para>
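      <para>
        For example, to have &repmgrd; always write its PID file to a fixed location, add a line like
        the following to <filename>repmgr.conf</filename> (the path is only an illustration; its directory
        is assumed to exist and be writable by the user running &repmgrd;):
        <programlisting>
        repmgrd_pid_file='/var/run/repmgr/repmgrd.pid'</programlisting>
        Starting &repmgrd; with <option>--pid-file /tmp/repmgrd.pid</option> would then still use
        <filename>/tmp/repmgrd.pid</filename>, as the command line value takes precedence.
      </para>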
965      <para>
966        If a PID file location was specified by the package maintainer, &repmgrd;
967        will use that. This only applies if &repmgr; was installed from a package and the package
968        maintainer has specified the PID file location.
969      </para>
970      <para>
        If none of the above apply, &repmgrd; will create a PID file
        in the operating system's temporary directory (as determined by the environment variable
        <varname>TMPDIR</varname>, or <filename>/tmp</filename> if that is not set).
974      </para>
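      <para>
        The lookup order described above can be summarised as a shell function. This is an
        illustrative sketch only, not &repmgrd;'s actual implementation: the function name
        <literal>resolve_pid_file</literal> and the default file name <filename>repmgrd.pid</filename>
        are assumptions, and <option>--no-pid-file</option> is not modelled.
        <programlisting>
# Illustrative sketch only, NOT repmgrd's actual code: mirrors the
# documented PID file lookup order. The function name and the default
# file name "repmgrd.pid" are assumptions.
#
# Arguments (each may be an empty string):
#   $1  value passed with --pid-file
#   $2  value of repmgrd_pid_file in repmgr.conf
#   $3  PID file location specified by the package maintainer
resolve_pid_file() {
    if [ -n "$1" ]; then
        # 1. the command line parameter overrides everything else
        echo "$1"
    elif [ -n "$2" ]; then
        # 2. repmgrd_pid_file from repmgr.conf
        echo "$2"
    elif [ -n "$3" ]; then
        # 3. location specified by the package maintainer
        echo "$3"
    else
        # 4. TMPDIR if set and non-empty, otherwise /tmp
        echo "${TMPDIR:-/tmp}/repmgrd.pid"
    fi
}</programlisting>
        For example, <command>resolve_pid_file "" /var/run/repmgr/repmgrd.pid ""</command> prints
        <filename>/var/run/repmgr/repmgrd.pid</filename>, as no <option>--pid-file</option> value was given.
      </para>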
975      <para>
976        To prevent a PID file being generated at all, provide the command line option
977        <option>--no-pid-file</option>.
978      </para>
979      <para>
980        To see which PID file &repmgrd; would use, execute &repmgrd;
981        with the option <option>--show-pid-file</option>. &repmgrd;
        will not start if this option is provided. Note that the value shown is the
        file &repmgrd; would use the next time it starts, and is
        not necessarily the PID file currently in use.
985      </para>
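      <para>
        A sketch of both options, assuming the configuration file is located at
        <filename>/etc/repmgr.conf</filename> (adjust as appropriate):
        <programlisting>
        # print the PID file repmgrd would use, without starting repmgrd
        repmgrd -f /etc/repmgr.conf --show-pid-file

        # start repmgrd without creating a PID file
        repmgrd -f /etc/repmgr.conf --no-pid-file</programlisting>
      </para>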
986    </sect2>
987
988    <sect2 id="repmgrd-configuration-debian-ubuntu">
989      <title>repmgrd daemon configuration on Debian/Ubuntu</title>
990
991      <indexterm>
992        <primary>repmgrd</primary>
993        <secondary>Debian/Ubuntu and daemon configuration</secondary>
994      </indexterm>
995      <indexterm>
996        <primary>Debian/Ubuntu</primary>
997        <secondary>repmgrd daemon configuration</secondary>
998      </indexterm>
999
1000      <para>
1001        If &repmgr; was installed from Debian/Ubuntu packages, additional configuration
1002        is required before &repmgrd; is started as a daemon.
1003      </para>
1004      <para>
1005        This is done via the file <filename>/etc/default/repmgrd</filename>, which by default
1006        looks like this:
1007        <programlisting>
1008# default settings for repmgrd. This file is source by /bin/sh from
1009# /etc/init.d/repmgrd
1010
1011# disable repmgrd by default so it won't get started upon installation
1012# valid values: yes/no
1013REPMGRD_ENABLED=no
1014
1015# configuration file (required)
1016#REPMGRD_CONF="/path/to/repmgr.conf"
1017
1018# additional options
1019REPMGRD_OPTS="--daemonize=false"
1020
1021# user to run repmgrd as
1022#REPMGRD_USER=postgres
1023
1024# repmgrd binary
1025#REPMGRD_BIN=/usr/bin/repmgrd
1026
1027# pid file
1028#REPMGRD_PIDFILE=/var/run/repmgrd.pid</programlisting>
1029      </para>
1030      <para>
        Set <varname>REPMGRD_ENABLED</varname> to <literal>yes</literal>, and uncomment
        <varname>REPMGRD_CONF</varname> and set it to the path of the <filename>repmgr.conf</filename> file you are using.
1033      </para>
1034      <tip>
1035        <para>
1036          See <xref linkend="packages-debian-ubuntu"/> for details of the Debian/Ubuntu packages and
1037          typical file locations (including <filename>repmgr.conf</filename>).
1038        </para>
1039      </tip>
1040      <para>
        From &repmgr; 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
        <option>--daemonize=false</option>, as daemonization is handled by the service command.
1043      </para>
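      <para>
        Taken together, the relevant lines of <filename>/etc/default/repmgrd</filename> might look
        like this; the <filename>repmgr.conf</filename> path is only an example and should be replaced
        with the location used on your system:
        <programlisting>
REPMGRD_ENABLED=yes
REPMGRD_CONF="/etc/repmgr.conf"
REPMGRD_OPTS="--daemonize=false"</programlisting>
      </para>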
1044      <para>
        If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>.
        Also, if you attempted to start &repmgrd; using <command>systemctl start repmgrd</command>
        before completing this configuration, you'll need to execute <command>systemctl stop repmgrd</command>
        before starting it again, as <application>systemd</application> may otherwise still regard the service as running.
1049      </para>
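      <para>
        A typical sequence after editing <filename>/etc/default/repmgrd</filename> on a
        <application>systemd</application>-based system might look like this, assuming the
        unit is named <literal>repmgrd</literal> and the commands are run with root privileges:
        <programlisting>
        systemctl daemon-reload
        systemctl stop repmgrd
        systemctl start repmgrd
        systemctl status repmgrd</programlisting>
      </para>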
1050
1051    </sect2>
1052  </sect1>
1053
1054  <sect1 id="repmgrd-connection-settings">
1055    <title>repmgrd connection settings</title>
 <para>
  In addition to the &repmgr; configuration settings, parameters in the
  <varname>conninfo</varname> string influence how &repmgr; makes a network connection to
  PostgreSQL. In particular, if another server in the replication cluster
  is unreachable at the network level, system network settings will influence
  the length of time it takes to determine that the connection is not possible.
 </para>
 <para>
  Explicitly setting a value for <literal>connect_timeout</literal> should
  therefore be considered. Its effective minimum value of <literal>2</literal>
  (seconds) ensures that a connection failure at the network level is reported
  as soon as possible; otherwise, depending on the system settings (e.g.
  <varname>tcp_syn_retries</varname> on Linux), a delay of a minute or more
  is possible.
 </para>
1071 <para>
1072  For further details on <varname>conninfo</varname> network connection
1073  parameters, see the
1074  <ulink url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS">PostgreSQL documentation</ulink>.
1075 </para>
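 <para>
  For example, a <varname>conninfo</varname> string in <filename>repmgr.conf</filename>
  which sets <literal>connect_timeout</literal> might look like this (the host, user and
  database names are placeholders):
  <programlisting>
   conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'</programlisting>
 </para>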
1076 </sect1>
1077
1078
1079
1080  <sect1 id="repmgrd-log-rotation">
1081     <title>repmgrd log rotation</title>
1082
1083   <indexterm>
1084     <primary>log rotation</primary>
1085     <secondary>repmgrd</secondary>
1086   </indexterm>
1087
1088   <indexterm>
1089     <primary>repmgrd</primary>
1090     <secondary>log rotation</secondary>
1091   </indexterm>
1092
1093  <para>
1094   To ensure the current &repmgrd; logfile
1095   (specified in <filename>repmgr.conf</filename> with the parameter
1096   <option>log_file</option>) does not grow indefinitely, configure your
1097   system's <command>logrotate</command> to regularly rotate it.
1098  </para>
1099  <para>
   Sample configuration to rotate log files weekly, retaining up to 52 weeks
   of logs and forcing rotation if a file grows beyond 100 MB:
1102   <programlisting>
1103    /var/log/repmgr/repmgrd.log {
1104        missingok
1105        compress
1106        rotate 52
1107        maxsize 100M
1108        weekly
1109        create 0600 postgres postgres
1110        postrotate
1111            /usr/bin/killall -HUP repmgrd
1112        endscript
1113    }</programlisting>
1114  </para>
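  <para>
   Assuming the sample above is saved as <filename>/etc/logrotate.d/repmgrd</filename>
   (the file name is only an example), it can be checked without rotating any files by running
   <command>logrotate</command> in debug mode:
   <programlisting>
    logrotate -d /etc/logrotate.d/repmgrd</programlisting>
  </para>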
1115
1116 </sect1>
1117</chapter>
1118