<chapter id="repmgrd-configuration">

 <title>repmgrd setup and configuration</title>

 <indexterm>
  <primary>repmgrd</primary>
  <secondary>configuration</secondary>
 </indexterm>

 <para>
  &repmgrd; is a daemon process which runs on each PostgreSQL node,
  monitoring the local node, and (unless it's the primary node) the upstream server
  (the primary server or, with cascading replication, another standby) which it's
  connected to.
 </para>
 <para>
  &repmgrd; can be configured to provide failover
  capability in case the primary or upstream node becomes unreachable, and/or
  provide monitoring data to the &repmgr; metadatabase.
 </para>
 <para>
  From &repmgr; 4.4, when running on the primary node, &repmgrd; can also monitor
  standby disconnections/reconnections (see <xref linkend="repmgrd-primary-child-disconnection"/>).
 </para>

 <sect1 id="repmgrd-basic-configuration">
  <title>repmgrd configuration</title>

  <para>
   To use &repmgrd;, its associated function library <emphasis>must</emphasis> be
   included via <filename>postgresql.conf</filename> with:

   <programlisting>
 shared_preload_libraries = 'repmgr'</programlisting>
  </para>
  <para>
   Changing this setting requires a restart of PostgreSQL; for more details see
   the <ulink url="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">PostgreSQL documentation</ulink>.
  </para>

  <para>
   The following configuration options apply to &repmgrd; in all circumstances:
  </para>
  <variablelist>

   <varlistentry>
    <term><option>monitor_interval_secs</option></term>
    <listitem>
     <indexterm>
      <primary>monitor_interval_secs</primary>
     </indexterm>

     <para>
      The interval (in seconds, default: <literal>2</literal>) to check the availability of the upstream node.
     </para>
    </listitem>

   </varlistentry>

   <varlistentry id="connection-check-type">

    <term><option>connection_check_type</option></term>
    <listitem>
     <indexterm>
      <primary>connection_check_type</primary>
     </indexterm>

     <para>
      The option <option>connection_check_type</option> is used to select the method
      &repmgrd; uses to determine whether the upstream node is available.
     </para>
     <para>
      Possible values are:
      <itemizedlist spacing="compact" mark="bullet">
       <listitem>
        <simpara>
         <literal>ping</literal> (default) - uses <command>PQping()</command> to
         determine server availability
        </simpara>
       </listitem>
       <listitem>
        <simpara>
         <literal>connection</literal> - determines server availability
         by attempting to make a new connection to the upstream node
        </simpara>
       </listitem>
       <listitem>
        <simpara>
         <literal>query</literal> - determines server availability
         by executing an SQL statement on the node via the existing connection
        </simpara>
       </listitem>

      </itemizedlist>
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><option>reconnect_attempts</option></term>
    <listitem>
     <indexterm>
      <primary>reconnect_attempts</primary>
     </indexterm>
     <para>
      The number of attempts (default: <literal>6</literal>) which will be made to reconnect to an unreachable
      upstream node before initiating a failover.
     </para>
     <para>
      There will be an interval of <option>reconnect_interval</option> seconds between each reconnection
      attempt.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><option>reconnect_interval</option></term>

    <listitem>
     <indexterm>
      <primary>reconnect_interval</primary>
     </indexterm>

     <para>
      Interval (in seconds, default: <literal>10</literal>) between attempts to reconnect to an unreachable
      upstream node.
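      As a rough worked example (a sketch only, assuming the default values are set explicitly as
      shown here), &repmgrd; would make up to 6 reconnection attempts spaced 10 seconds apart, so
      in the order of 60 seconds, plus the time taken by the connection attempts themselves, could
      elapse between the upstream node first being found unreachable and a failover being initiated:
      <programlisting>
 reconnect_attempts=6
 reconnect_interval=10</programlisting>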
     </para>
     <para>
      The number of reconnection attempts is defined by the parameter <option>reconnect_attempts</option>.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><option>degraded_monitoring_timeout</option></term>
    <listitem>
     <indexterm>
      <primary>degraded_monitoring_timeout</primary>
     </indexterm>

     <para>
      Interval (in seconds) after which &repmgrd; will terminate if
      either of the servers (local node and/or upstream node) being monitored is no longer available
      (<link linkend="repmgrd-degraded-monitoring">degraded monitoring mode</link>).
     </para>
     <para>
      <literal>-1</literal> (default) disables this timeout completely.
     </para>
    </listitem>
   </varlistentry>

  </variablelist>

  <para>
   See also <filename><ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink></filename> for an annotated sample configuration file.
  </para>

  <sect2 id="repmgrd-automatic-failover-configuration">
   <title>Required configuration for automatic failover</title>

   <para>
    The following &repmgrd; options <emphasis>must</emphasis> be set in
    <filename>repmgr.conf</filename>:

    <itemizedlist spacing="compact" mark="bullet">
     <listitem>
      <simpara><option>failover</option></simpara>
     </listitem>
     <listitem>
      <simpara><option>promote_command</option></simpara>
     </listitem>
     <listitem>
      <simpara><option>follow_command</option></simpara>
     </listitem>
    </itemizedlist>
   </para>


   <para>
    Example:
    <programlisting>
 failover=automatic
 promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
 follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
   </para>
   <para>
    Details of each option are as follows:
   </para>
   <variablelist>

    <varlistentry>

     <term><option>failover</option></term>
     <listitem>
      <indexterm>
       <primary>failover</primary>
      </indexterm>

      <para>
       <option>failover</option> can be one of <literal>automatic</literal> or <literal>manual</literal>.
      </para>
      <note>
       <para>
        If <option>failover</option> is set to <literal>manual</literal>, &repmgrd;
        will not take any action if a failover situation is detected, and the node may need to
        be modified manually (e.g. by executing <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>).
       </para>
      </note>

     </listitem>
    </varlistentry>

    <varlistentry>
     <term><option>promote_command</option></term>

     <listitem>
      <indexterm>
       <primary>promote_command</primary>
      </indexterm>

      <para>
       The program or script defined in <option>promote_command</option> will be executed
       in a failover situation when &repmgrd; determines that
       the current node is to become the new primary node.
      </para>
      <para>
       Normally <option>promote_command</option> is set as &repmgr;'s
       <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command> command.
      </para>

      <note>
       <para>
        When invoking <command>repmgr standby promote</command> (either directly via
        the <option>promote_command</option>, or in a script called
        via <option>promote_command</option>), <option>--siblings-follow</option>
        <emphasis>must not</emphasis> be included as a
        command line option for <command>repmgr standby promote</command>.
       </para>
      </note>

      <para>
       It is also possible to provide a shell script to e.g. perform user-defined tasks
       before promoting the current node.
       In this case the script <emphasis>must</emphasis>
       at some point execute <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>
       to promote the node; if this is not done, &repmgr; metadata will not be updated and
       &repmgr; will no longer function reliably.
      </para>
      <para>
       Example:
       <programlisting>
 promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'</programlisting>
      </para>

      <para>
       Note that the <literal>--log-to-file</literal> option will cause
       output generated by the &repmgr; command, when executed by &repmgrd;,
       to be logged to the same destination configured to receive log output for &repmgrd;.
      </para>
      <note>
       <para>
        &repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
        or <option>follow_command</option>; these can be user-defined scripts so must always be
        specified with the full path.
       </para>
      </note>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><option>follow_command</option></term>
     <listitem>
      <indexterm>
       <primary>follow_command</primary>
      </indexterm>

      <para>
       The program or script defined in <option>follow_command</option> will be executed
       in a failover situation when &repmgrd; determines that
       the current node is to follow the new primary node.
      </para>
      <para>
       Normally <option>follow_command</option> is set as &repmgr;'s
       <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command> command.
      </para>
      <para>
       The <option>follow_command</option> parameter
       should provide the <literal>--upstream-node-id=%n</literal>
       option to <command>repmgr standby follow</command>; the <literal>%n</literal> will be replaced by
       &repmgrd; with the ID of the new primary node.
       If this is not provided,
       <command>repmgr standby follow</command> will attempt to determine the new primary by itself, but if the
       original primary comes back online after the new primary is promoted, there is a risk that
       <command>repmgr standby follow</command> will result in the node continuing to follow
       the original primary.
      </para>
      <para>
       It is also possible to provide a shell script to e.g. perform user-defined tasks
       before following the new primary. In this case the script <emphasis>must</emphasis>
       at some point execute <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
       to follow the new primary; if this is not done, &repmgr; metadata will not be updated and
       &repmgr; will no longer function reliably.
      </para>
      <para>
       Example:
       <programlisting>
 follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
      </para>

      <para>
       Note that the <literal>--log-to-file</literal> option will cause
       output generated by the &repmgr; command, when executed by &repmgrd;,
       to be logged to the same destination configured to receive log output for &repmgrd;.
      </para>
      <note>
       <para>
        &repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
        or <option>follow_command</option>; these can be user-defined scripts so must always be
        specified with the full path.
       </para>
      </note>
     </listitem>

    </varlistentry>

   </variablelist>


  </sect2>

  <sect2 id="repmgrd-automatic-failover-configuration-optional" xreflabel="Optional configuration for automatic failover">
   <title>Optional configuration for automatic failover</title>

   <para>
    The following configuration options can be used to fine-tune automatic failover:
   </para>
   <variablelist>

    <varlistentry>
     <term><option>priority</option></term>
     <listitem>
      <indexterm>
       <primary>priority</primary>
      </indexterm>

      <para>
       Indicates a preferred priority (default: <literal>100</literal>) for promoting nodes;
       a value of zero prevents the node being promoted to primary.
      </para>
      <para>
       Note that the priority setting is only applied if two or more nodes are
       determined as promotion candidates; in that case the node with the
       higher priority is selected.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><option>failover_validation_command</option></term>
     <listitem>
      <indexterm>
       <primary>failover_validation_command</primary>
      </indexterm>

      <para>
       User-defined script to execute for an external mechanism to validate the failover
       decision made by &repmgrd;.
      </para>
      <note>
       <para>
        This option <emphasis>must</emphasis> be identically configured
        on all nodes.
       </para>
      </note>
      <para>
       One or more of the following parameter placeholders
       may be provided, which will be replaced by repmgrd with the appropriate
       value:
       <itemizedlist spacing="compact" mark="bullet">
        <listitem>
         <simpara><literal>%n</literal>: node ID</simpara>
        </listitem>
        <listitem>
         <simpara><literal>%a</literal>: node name</simpara>
        </listitem>
        <listitem>
         <simpara><literal>%v</literal>: number of visible nodes</simpara>
        </listitem>
        <listitem>
         <simpara><literal>%u</literal>: number of shared upstream nodes</simpara>
        </listitem>
        <listitem>
         <simpara><literal>%t</literal>: total number of nodes</simpara>
        </listitem>
       </itemizedlist>
      </para>
      <para>
       See also: <link linkend="repmgrd-failover-validation">Failover validation</link>.
      </para>
     </listitem>
    </varlistentry>


    <varlistentry>
     <term><option>primary_visibility_consensus</option></term>

     <listitem>
      <indexterm>
       <primary>primary_visibility_consensus</primary>
      </indexterm>

      <para>
       If <literal>true</literal>, only continue with failover if no standbys
       (or the witness server, if present) have seen the primary node recently.
      </para>
      <note>
       <para>
        This option <emphasis>must</emphasis> be identically configured
        on all nodes.
       </para>
      </note>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><option>always_promote</option></term>

     <listitem>
      <indexterm>
       <primary>always_promote</primary>
      </indexterm>

      <para>
       Default: <literal>false</literal>.
      </para>
      <para>
       If <literal>true</literal>, promote the local node even if its
       &repmgr; metadata is not up-to-date.
      </para>
      <para>
       Normally &repmgr; expects its metadata (stored in the <varname>repmgr.nodes</varname>
       table) to be up-to-date so &repmgrd; can take the correct action during a failover.
       However it's possible that updates made on the primary may not
       have propagated to the standby (promotion candidate). In this case &repmgrd; will
       default to not promoting the standby. This behaviour can be overridden by setting
       <option>always_promote</option> to <literal>true</literal>.
      </para>
     </listitem>
    </varlistentry>


    <varlistentry>

     <term><option>standby_disconnect_on_failover</option></term>
     <listitem>
      <indexterm>
       <primary>standby_disconnect_on_failover</primary>
      </indexterm>

      <para>
       In a failover situation, disconnect the local node's WAL receiver.
      </para>
      <para>
       This option is available for PostgreSQL 9.5 and later.
      </para>
      <note>
       <para>
        This option <emphasis>must</emphasis> be identically configured
        on all nodes.
       </para>
       <para>
        Additionally the &repmgr; user <emphasis>must</emphasis> be a superuser
        for this option.
       </para>
       <para>
        &repmgrd; will refuse to start if this option is set
        but either of these prerequisites is not met.
       </para>
      </note>

      <para>
       See also: <link linkend="repmgrd-standby-disconnection-on-failover">Standby disconnection on failover</link>.
      </para>
     </listitem>
    </varlistentry>

   </variablelist>

   <para>
    The following options can be used to further fine-tune failover behaviour.
    In practice it's unlikely these will need to be changed from their default
    values, but they are available as configuration options should the need arise.
   </para>
   <variablelist>

    <varlistentry>
     <term><option>election_rerun_interval</option></term>
     <listitem>
      <indexterm>
       <primary>election_rerun_interval</primary>
      </indexterm>

      <para>
       If <option>failover_validation_command</option> is set, and the command returns
       an error, pause the specified number of seconds (default: 15) before rerunning the election.
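       For illustration only, the two options might be combined in <filename>repmgr.conf</filename>
       along the following lines; the script path is a hypothetical placeholder, and
       <literal>%n</literal> and <literal>%a</literal> are the node ID and node name placeholders
       listed for <option>failover_validation_command</option>:
       <programlisting>
 failover_validation_command='/path/to/failover-validation.sh %n %a'
 election_rerun_interval=15</programlisting>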
      </para>
     </listitem>
    </varlistentry>


    <varlistentry>
     <term><option>sibling_nodes_disconnect_timeout</option></term>
     <listitem>
      <indexterm>
       <primary>sibling_nodes_disconnect_timeout</primary>
      </indexterm>

      <para>
       If <option>standby_disconnect_on_failover</option> is <literal>true</literal>, the
       maximum length of time (in seconds, default: <literal>30</literal>)
       to wait for other standbys to confirm they have disconnected their
       WAL receivers.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>



  </sect2>


  <sect2 id="repmgrd-automatic-failover-configuration-pgbouncer-fencing">
   <title>Configuring &repmgrd; and pgbouncer to fence a failed primary node</title>
   <indexterm>
    <primary>fencing</primary>
    <secondary>using repmgrd and pgbouncer to fence a failed primary node</secondary>
   </indexterm>
   <indexterm>
    <primary>PgBouncer</primary>
    <secondary>using repmgrd and pgbouncer to fence a failed primary node</secondary>
   </indexterm>
   <para>
    For further details and a reference implementation, see the separate document
    <ulink url="https://github.com/2ndQuadrant/repmgr/blob/master/doc/repmgrd-node-fencing.md">Fencing a failed master node with repmgrd and PgBouncer</ulink>.
   </para>
  </sect2>

  <sect2 id="postgresql-service-configuration">
   <title>PostgreSQL service configuration</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>PostgreSQL service configuration</secondary>
   </indexterm>
   <para>
    If using automatic failover, currently &repmgrd; will need to execute
    <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
    to restart PostgreSQL on standbys to have them follow a new primary.
   </para>
   <para>
    To ensure this happens smoothly, it's essential to provide the system/service restart
    command appropriate to your operating system via <varname>service_restart_command</varname>
    in <filename>repmgr.conf</filename>. If you don't do this, &repmgrd;
    will default to using <command>pg_ctl</command>, which can result in unexpected problems,
    particularly on <application>systemd</application>-based systems.
   </para>
   <para>
    For more details, see <xref linkend="configuration-file-service-commands"/>.
   </para>
  </sect2>

  <sect2 id="repmgrd-service-configuration">
   <title>repmgrd service configuration</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>repmgrd service configuration</secondary>
   </indexterm>
   <para>
    If you are intending to use the <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link>
    and <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link>
    commands, the following
    parameters <emphasis>must</emphasis> be set in <filename>repmgr.conf</filename>:
    <itemizedlist spacing="compact" mark="bullet">

     <listitem>
      <simpara><varname>repmgrd_service_start_command</varname></simpara>
     </listitem>

     <listitem>
      <simpara><varname>repmgrd_service_stop_command</varname></simpara>
     </listitem>

    </itemizedlist>

   </para>
   <para>
    Example (for &repmgr; with PostgreSQL 12 on CentOS 7):
    <programlisting>
repmgrd_service_start_command='sudo systemctl start repmgr12'
repmgrd_service_stop_command='sudo systemctl stop repmgr12'
</programlisting>
   </para>
   <para>
    For more details see the reference page for each command.
   </para>
  </sect2>


  <sect2 id="repmgrd-monitoring-configuration" xreflabel="repmgrd monitoring configuration">
   <title>Monitoring configuration</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>monitoring configuration</secondary>
   </indexterm>
   <para>
    To enable monitoring, set:
    <programlisting>
 monitoring_history=yes</programlisting>
    in <filename>repmgr.conf</filename>.
   </para>
   <para>
    Monitoring data is written at the interval defined by
    the option <option>monitor_interval_secs</option> (see above).
   </para>
   <para>
    For more details on monitoring, see <xref linkend="repmgrd-monitoring"/>. For information on
    monitoring standby disconnections, see <xref linkend="repmgrd-primary-child-disconnection"/>.
   </para>
  </sect2>

  <sect2 id="repmgrd-reloading-configuration" xreflabel="reloading repmgrd configuration">
   <title>Applying configuration changes to repmgrd</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>applying configuration changes</secondary>
   </indexterm>
   <para>
    To apply configuration file changes to a running &repmgrd;
    daemon, execute the operating system's &repmgrd; service reload command
    (see <xref linkend="appendix-packages"/> for examples),
    or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
    <command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
   </para>
   <tip>
    <para>
     Check the &repmgrd; log to see what changes were
     applied, or if any issues were encountered when reloading the configuration.
    </para>
   </tip>
   <para>
    Note that only the following subset of configuration file parameters can be changed on a
    running &repmgrd; daemon:
   </para>
   <itemizedlist spacing="compact" mark="bullet">

    <listitem>
     <simpara>
      <varname>async_query_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_check_interval</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_connected_include_witness</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_connected_min_count</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_disconnect_command</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_disconnect_min_count</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>child_nodes_disconnect_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>connection_check_type</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>conninfo</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>degraded_monitoring_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>event_notification_command</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>event_notifications</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>failover_validation_command</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>failover</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>follow_command</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>log_facility</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>log_file</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>log_level</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>log_status_interval</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>monitor_interval_secs</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>monitoring_history</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>primary_notification_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>primary_visibility_consensus</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>always_promote</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>promote_command</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>reconnect_attempts</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>reconnect_interval</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>retry_promote_interval_secs</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>repmgrd_standby_startup_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>sibling_nodes_disconnect_timeout</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>standby_disconnect_on_failover</varname>
     </simpara>
    </listitem>

   </itemizedlist>

   <para>
    The following set of configuration file parameters must be updated via
    <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
    as they require changes to the <literal>repmgr.nodes</literal> table so they are visible to
    all nodes in the replication cluster:
   </para>
   <itemizedlist spacing="compact" mark="bullet">

    <listitem>
     <simpara>
      <varname>node_id</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>node_name</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>data_directory</varname>
     </simpara>
    </listitem>

    <listitem>
     <simpara>
      <varname>location</varname>
     </simpara>
    </listitem>


    <listitem>
     <simpara>
      <varname>priority</varname>
     </simpara>
    </listitem>

   </itemizedlist>

   <note>
    <para>
     After executing <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
     &repmgrd; <emphasis>must</emphasis> be restarted for the changes to take effect.
    </para>
   </note>

  </sect2>

 </sect1>

 <sect1 id="repmgrd-daemon" xreflabel="repmgrd daemon">
  <title>repmgrd daemon</title>

  <indexterm>
   <primary>repmgrd</primary>
   <secondary>starting and stopping</secondary>
  </indexterm>
  <para>
   If installed from a package, &repmgrd; can be started
   via the operating system's service command, e.g. in <application>systemd</application>
   using <command>systemctl</command>.
  </para>
  <para>
   See appendix <xref linkend="appendix-packages"/> for details of service commands
   for different distributions.
  </para>
  <para>
   The commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
   <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> can be used
   as convenience wrappers to start and stop &repmgrd; on the local node.
  </para>
  <important>
   <para>
    <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
    <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> require
    that the appropriate start/stop commands are configured as
    <varname>repmgrd_service_start_command</varname> and <varname>repmgrd_service_stop_command</varname>
    in <filename>repmgr.conf</filename>.
   </para>
  </important>
  <para>
   &repmgrd; can be started manually like this:
   <programlisting>
 repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
   and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
  </para>

  <sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
   <title>repmgrd's PID file</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>PID file</secondary>
   </indexterm>
   <indexterm>
    <primary>PID file</primary>
    <secondary>repmgrd</secondary>
   </indexterm>
   <para>
    &repmgrd; will generate a PID file by default.
   </para>
   <note>
    <simpara>
     This is a behaviour change from previous versions (earlier than 4.1), where
     the PID file had to be explicitly specified with the command line
     parameter <option>--pid-file</option>.
    </simpara>
   </note>
   <para>
    The PID file can be specified in <filename>repmgr.conf</filename> with the configuration
    parameter <varname>repmgrd_pid_file</varname>.
   </para>
   <para>
    It can also be specified on the command line (as in previous versions) with
    the command line parameter <option>--pid-file</option>. Note this will override
    any value set in <filename>repmgr.conf</filename> with <varname>repmgrd_pid_file</varname>.
    <option>--pid-file</option> may be deprecated in future releases.
   </para>
   <para>
    If a PID file location was specified by the package maintainer, &repmgrd;
    will use that.
    This only applies if &repmgr; was installed from a package and the package
    maintainer has specified the PID file location.
   </para>
   <para>
    If none of the above apply, &repmgrd; will create a PID file
    in the operating system's temporary directory (as determined by the environment variable
    <varname>TMPDIR</varname>, or if that is not set, will use <filename>/tmp</filename>).
   </para>
   <para>
    To prevent a PID file being generated at all, provide the command line option
    <option>--no-pid-file</option>.
   </para>
   <para>
    To see which PID file &repmgrd; would use, execute &repmgrd;
    with the option <option>--show-pid-file</option>. &repmgrd;
    will not start if this option is provided. Note that the value shown is the
    file &repmgrd; would use next time it starts, and is
    not necessarily the PID file currently in use.
   </para>
  </sect2>

  <sect2 id="repmgrd-configuration-debian-ubuntu">
   <title>repmgrd daemon configuration on Debian/Ubuntu</title>

   <indexterm>
    <primary>repmgrd</primary>
    <secondary>Debian/Ubuntu and daemon configuration</secondary>
   </indexterm>
   <indexterm>
    <primary>Debian/Ubuntu</primary>
    <secondary>repmgrd daemon configuration</secondary>
   </indexterm>

   <para>
    If &repmgr; was installed from Debian/Ubuntu packages, additional configuration
    is required before &repmgrd; is started as a daemon.
   </para>
   <para>
    This is done via the file <filename>/etc/default/repmgrd</filename>, which by default
    looks like this:
    <programlisting>
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no

# configuration file (required)
#REPMGRD_CONF="/path/to/repmgr.conf"

# additional options
REPMGRD_OPTS="--daemonize=false"

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid</programlisting>
   </para>
   <para>
    Set <varname>REPMGRD_ENABLED</varname> to <literal>yes</literal>, and <varname>REPMGRD_CONF</varname>
    to the <filename>repmgr.conf</filename> file you are using.
   </para>
   <tip>
    <para>
     See <xref linkend="packages-debian-ubuntu"/> for details of the Debian/Ubuntu packages and
     typical file locations (including <filename>repmgr.conf</filename>).
    </para>
   </tip>
   <para>
    From &repmgrd; 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
    <option>--daemonize=false</option>, as daemonization is handled by the service command.
   </para>
   <para>
    If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>.
    Also, if you attempted to start &repmgrd; using <command>systemctl start repmgrd</command>,
    you'll need to execute <command>systemctl stop repmgrd</command>. Because that's how <application>systemd</application>
    rolls.
   </para>

  </sect2>
 </sect1>

 <sect1 id="repmgrd-connection-settings">
  <title>repmgrd connection settings</title>
  <para>
   In addition to the &repmgr; configuration settings, parameters in the
   <varname>conninfo</varname> string influence how &repmgr; makes a network connection to
   PostgreSQL.
   In particular, if another server in the replication cluster
   is unreachable at network level, system network settings will influence
   the length of time it takes to determine that the connection is not possible.
  </para>
  <para>
   Explicitly setting the <literal>connect_timeout</literal> parameter
   should therefore be considered; the effective minimum value of <literal>2</literal>
   (seconds) will ensure that a connection failure at network level is reported
   as soon as possible. Otherwise, depending on the system settings (e.g.
   <varname>tcp_syn_retries</varname> in Linux), a delay of a minute or more
   is possible.
  </para>
  <para>
   For further details on <varname>conninfo</varname> network connection
   parameters, see the
   <ulink url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS">PostgreSQL documentation</ulink>.
  </para>
 </sect1>



 <sect1 id="repmgrd-log-rotation">
  <title>repmgrd log rotation</title>

  <indexterm>
   <primary>log rotation</primary>
   <secondary>repmgrd</secondary>
  </indexterm>

  <indexterm>
   <primary>repmgrd</primary>
   <secondary>log rotation</secondary>
  </indexterm>

  <para>
   To ensure the current &repmgrd; logfile
   (specified in <filename>repmgr.conf</filename> with the parameter
   <option>log_file</option>) does not grow indefinitely, configure your
   system's <command>logrotate</command> to regularly rotate it.
  </para>
  <para>
   Sample configuration to rotate logfiles weekly with retention for
   up to 52 weeks and rotation forced if a file grows beyond 100MB:
   <programlisting>
    /var/log/repmgr/repmgrd.log {
        missingok
        compress
        rotate 52
        maxsize 100M
        weekly
        create 0600 postgres postgres
        postrotate
            /usr/bin/killall -HUP repmgrd
        endscript
    }</programlisting>
  </para>

 </sect1>
</chapter>