1Fencing a failed master node with repmgrd and pgbouncer
2=======================================================
3
4With automatic failover, it's essential to ensure that a failed primary
5remains inaccessible to your application, even if it comes back online
6again, to avoid a split-brain situation.
7
8By using `pgbouncer` together with `repmgrd`, it's possible to combine
9automatic failover with a process to isolate the failed primary from
10your application and ensure that all connections which should go to
11the primary are directed there smoothly without having to reconfigure
12your application. (Note that as a connection pooler, `pgbouncer` can
13benefit your application in other ways, but those are beyond the scope
14of this document).
15
16* * *
17
18> *WARNING*: automatic failover is tricky to get right. This document
19> demonstrates one possible implementation method, however you should
20> carefully configure and test any setup to suit the needs of your own
21> replication cluster/application.
22
23* * *
24
25In a failover situation, `repmgrd` promotes a standby to primary by executing
26the command defined in `promote_command`. Normally this would be something like:
27
28    repmgr standby promote -f /etc/repmgr.conf
29
30By wrapping this in a custom script which adjusts the `pgbouncer` configuration
31on all nodes, it's possible to fence the failed primary and redirect write
32connections to the new primary.
33
34The script consists of two sections:
35
36* the promotion command itself
37* commands to reconfigure `pgbouncer` on all nodes
38
39Note that it requires password-less SSH access from the `repmgr` nodes
40to all the `pgbouncer` nodes to be able to update the `pgbouncer`
41configuration files.
42
43For the purposes of this demonstration, we'll assume there are 3 nodes (primary
44and two standbys), with `pgbouncer` listening on port 6432 handling connections
45to a database called `appdb`.  The `postgres` system user must have write
46access to the `pgbouncer` configuration files on all nodes. We'll assume
47there's a main `pgbouncer` configuration file, `/etc/pgbouncer.ini`, which uses
48the `%include` directive (available from PgBouncer 1.6) to include a separate
49configuration file, `/etc/pgbouncer.database.ini`, which will be modified by
50`repmgr`.
51
52* * *
53
54> *NOTE*: in this self-contained demonstration, `pgbouncer` is running on the
55> database servers, however in a production environment it will make more
56> sense to run `pgbouncer` on either separate nodes or the application server.
57
58* * *
59
60`/etc/pgbouncer.ini` should look something like this:
61
62    [pgbouncer]
63
64    logfile = /var/log/pgbouncer/pgbouncer.log
65    pidfile = /var/run/pgbouncer/pgbouncer.pid
66
67    listen_addr = *
68    listen_port = 6532
69    unix_socket_dir = /tmp
70
71    auth_type = trust
72    auth_file = /etc/pgbouncer.auth
73
74    admin_users = postgres
75    stats_users = postgres
76
77    pool_mode = transaction
78
79    max_client_conn = 100
80    default_pool_size = 20
81    min_pool_size = 5
82    reserve_pool_size = 5
83    reserve_pool_timeout = 3
84
85    log_connections = 1
86    log_disconnections = 1
87    log_pooler_errors = 1
88
89    %include /etc/pgbouncer.database.ini
90
91The actual script is as follows; adjust the configurable items as appropriate:
92
93`/var/lib/postgres/repmgr/promote.sh`
94
95
96    #!/usr/bin/env bash
97    set -u
98    set -e
99
100    # Configurable items
101    PGBOUNCER_HOSTS="node1 node2 node3"
102    PGBOUNCER_DATABASE_INI="/etc/pgbouncer.database.ini"
103    PGBOUNCER_DATABASE="appdb"
104    PGBOUNCER_PORT=6432
105
106    REPMGR_DB="repmgr"
107    REPMGR_USER="repmgr"
108
109    # 1. Promote this node from standby to primary
110
111    repmgr standby promote -f /etc/repmgr.conf --log-to-file
112
113    # 2. Reconfigure pgbouncer instances
114
115    PGBOUNCER_DATABASE_INI_NEW="/tmp/pgbouncer.database.ini"
116
117    for HOST in $PGBOUNCER_HOSTS
118    do
119        # Recreate the pgbouncer config file
120        echo -e "[databases]\n" > $PGBOUNCER_DATABASE_INI_NEW
121
122        psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
123          -c "SELECT '${PGBOUNCER_DATABASE}-rw= ' || conninfo || ' application_name=pgbouncer_${HOST}' \
124              FROM repmgr.nodes \
125              WHERE active = TRUE AND type='primary'" >> $PGBOUNCER_DATABASE_INI_NEW
126
127        psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
128          -c "SELECT '${PGBOUNCER_DATABASE}-ro= ' || conninfo || ' application_name=pgbouncer_${HOST}' \
129              FROM repmgr.nodes \
130              WHERE node_name='${HOST}'" >> $PGBOUNCER_DATABASE_INI_NEW
131
132
133        rsync $PGBOUNCER_DATABASE_INI_NEW $HOST:$PGBOUNCER_DATABASE_INI
134
135        psql -tc "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
136
137    done
138
139    # Clean up generated file
140    rm $PGBOUNCER_DATABASE_INI_NEW
141
142    echo "Reconfiguration of pgbouncer complete"
143
144Script and template file should be installed on each node where `repmgrd` is running.
145
146Finally, set `promote_command` in `repmgr.conf` on each node to
147point to the custom promote script:
148
149    promote_command='/var/lib/postgres/repmgr/promote.sh'
150
151and reload/restart any running `repmgrd` instances for the changes to take
152effect.
153