1Fencing a failed master node with repmgrd and pgbouncer 2======================================================= 3 4With automatic failover, it's essential to ensure that a failed primary 5remains inaccessible to your application, even if it comes back online 6again, to avoid a split-brain situation. 7 8By using `pgbouncer` together with `repmgrd`, it's possible to combine 9automatic failover with a process to isolate the failed primary from 10your application and ensure that all connections which should go to 11the primary are directed there smoothly without having to reconfigure 12your application. (Note that as a connection pooler, `pgbouncer` can 13benefit your application in other ways, but those are beyond the scope 14of this document). 15 16* * * 17 18> *WARNING*: automatic failover is tricky to get right. This document 19> demonstrates one possible implementation method, however you should 20> carefully configure and test any setup to suit the needs of your own 21> replication cluster/application. 22 23* * * 24 25In a failover situation, `repmgrd` promotes a standby to primary by executing 26the command defined in `promote_command`. Normally this would be something like: 27 28 repmgr standby promote -f /etc/repmgr.conf 29 30By wrapping this in a custom script which adjusts the `pgbouncer` configuration 31on all nodes, it's possible to fence the failed primary and redirect write 32connections to the new primary. 33 34The script consists of two sections: 35 36* the promotion command itself 37* commands to reconfigure `pgbouncer` on all nodes 38 39Note that it requires password-less SSH access from the `repmgr` nodes 40to all the `pgbouncer` nodes to be able to update the `pgbouncer` 41configuration files. 42 43For the purposes of this demonstration, we'll assume there are 3 nodes (primary 44and two standbys), with `pgbouncer` listening on port 6432 handling connections 45to a database called `appdb`. The `postgres` system user must have write 46access to the `pgbouncer` configuration files on all nodes. We'll assume 47there's a main `pgbouncer` configuration file, `/etc/pgbouncer.ini`, which uses 48the `%include` directive (available from PgBouncer 1.6) to include a separate 49configuration file, `/etc/pgbouncer.database.ini`, which will be modified by 50`repmgr`. 51 52* * * 53 54> *NOTE*: in this self-contained demonstration, `pgbouncer` is running on the 55> database servers, however in a production environment it will make more 56> sense to run `pgbouncer` on either separate nodes or the application server. 57 58* * * 59 60`/etc/pgbouncer.ini` should look something like this: 61 62 [pgbouncer] 63 64 logfile = /var/log/pgbouncer/pgbouncer.log 65 pidfile = /var/run/pgbouncer/pgbouncer.pid 66 67 listen_addr = * 68 listen_port = 6532 69 unix_socket_dir = /tmp 70 71 auth_type = trust 72 auth_file = /etc/pgbouncer.auth 73 74 admin_users = postgres 75 stats_users = postgres 76 77 pool_mode = transaction 78 79 max_client_conn = 100 80 default_pool_size = 20 81 min_pool_size = 5 82 reserve_pool_size = 5 83 reserve_pool_timeout = 3 84 85 log_connections = 1 86 log_disconnections = 1 87 log_pooler_errors = 1 88 89 %include /etc/pgbouncer.database.ini 90 91The actual script is as follows; adjust the configurable items as appropriate: 92 93`/var/lib/postgres/repmgr/promote.sh` 94 95 96 #!/usr/bin/env bash 97 set -u 98 set -e 99 100 # Configurable items 101 PGBOUNCER_HOSTS="node1 node2 node3" 102 PGBOUNCER_DATABASE_INI="/etc/pgbouncer.database.ini" 103 PGBOUNCER_DATABASE="appdb" 104 PGBOUNCER_PORT=6432 105 106 REPMGR_DB="repmgr" 107 REPMGR_USER="repmgr" 108 109 # 1. Promote this node from standby to primary 110 111 repmgr standby promote -f /etc/repmgr.conf --log-to-file 112 113 # 2. Reconfigure pgbouncer instances 114 115 PGBOUNCER_DATABASE_INI_NEW="/tmp/pgbouncer.database.ini" 116 117 for HOST in $PGBOUNCER_HOSTS 118 do 119 # Recreate the pgbouncer config file 120 echo -e "[databases]\n" > $PGBOUNCER_DATABASE_INI_NEW 121 122 psql -d $REPMGR_DB -U $REPMGR_USER -t -A \ 123 -c "SELECT '${PGBOUNCER_DATABASE}-rw= ' || conninfo || ' application_name=pgbouncer_${HOST}' \ 124 FROM repmgr.nodes \ 125 WHERE active = TRUE AND type='primary'" >> $PGBOUNCER_DATABASE_INI_NEW 126 127 psql -d $REPMGR_DB -U $REPMGR_USER -t -A \ 128 -c "SELECT '${PGBOUNCER_DATABASE}-ro= ' || conninfo || ' application_name=pgbouncer_${HOST}' \ 129 FROM repmgr.nodes \ 130 WHERE node_name='${HOST}'" >> $PGBOUNCER_DATABASE_INI_NEW 131 132 133 rsync $PGBOUNCER_DATABASE_INI_NEW $HOST:$PGBOUNCER_DATABASE_INI 134 135 psql -tc "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer 136 137 done 138 139 # Clean up generated file 140 rm $PGBOUNCER_DATABASE_INI_NEW 141 142 echo "Reconfiguration of pgbouncer complete" 143 144Script and template file should be installed on each node where `repmgrd` is running. 145 146Finally, set `promote_command` in `repmgr.conf` on each node to 147point to the custom promote script: 148 149 promote_command='/var/lib/postgres/repmgr/promote.sh' 150 151and reload/restart any running `repmgrd` instances for the changes to take 152effect. 153