1---
2stage: Enablement
3group: Database
4info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
5---
6
7# PostgreSQL replication and failover with Omnibus GitLab **(PREMIUM SELF)**
8
9If you're a Free user of GitLab self-managed, consider using a cloud-hosted solution.
10This document doesn't cover installations from source.
11
12If a setup with replication and failover isn't what you were looking for, see
13the [database configuration document](https://docs.gitlab.com/omnibus/settings/database.html)
14for the Omnibus GitLab packages.
15
16It's recommended to read this document fully before attempting to configure PostgreSQL with
17replication and failover for GitLab.
18
19## Architecture
20
The Omnibus GitLab recommended configuration for a PostgreSQL cluster with
replication and failover requires:
23
24- A minimum of three PostgreSQL nodes.
25- A minimum of three Consul server nodes.
26- A minimum of three PgBouncer nodes that track and handle primary database reads and writes.
27  - An internal load balancer (TCP) to balance requests between the PgBouncer nodes.
28- [Database Load Balancing](database_load_balancing.md) enabled.
29  - A local PgBouncer service configured on each PostgreSQL node. Note that this is separate from the main PgBouncer cluster that tracks the primary.
30
31```plantuml
32@startuml
33card "**Internal Load Balancer**" as ilb #9370DB
34skinparam linetype ortho
35
36together {
37  collections "**GitLab Rails** x3" as gitlab #32CD32
38  collections "**Sidekiq** x4" as sidekiq #ff8dd1
39}
40
41collections "**Consul** x3" as consul #e76a9b
42
43card "Database" as database {
  collections "**PgBouncer x3**\n//Consul//" as pgbouncer #4EA7FF
45
46  card "**PostgreSQL** //Primary//\n//Patroni//\n//PgBouncer//\n//Consul//" as postgres_primary #4EA7FF
47  collections "**PostgreSQL** //Secondary// **x2**\n//Patroni//\n//PgBouncer//\n//Consul//" as postgres_secondary #4EA7FF
48
49  pgbouncer -[#4EA7FF]-> postgres_primary
50  postgres_primary .[#4EA7FF]r-> postgres_secondary
51}
52
53gitlab -[#32CD32]-> ilb
54gitlab -[hidden]-> pgbouncer
55gitlab .[#32CD32,norank]-> postgres_primary
56gitlab .[#32CD32,norank]-> postgres_secondary
57
58sidekiq -[#ff8dd1]-> ilb
59sidekiq -[hidden]-> pgbouncer
60sidekiq .[#ff8dd1,norank]-> postgres_primary
61sidekiq .[#ff8dd1,norank]-> postgres_secondary
62
63ilb -[#9370DB]-> pgbouncer
64
65consul -[#e76a9b]r-> pgbouncer
66consul .[#e76a9b,norank]r-> postgres_primary
67consul .[#e76a9b,norank]r-> postgres_secondary
68@enduml
69```
70
71You also need to take into consideration the underlying network topology, making
72sure you have redundant connectivity between all Database and GitLab instances
73to avoid the network becoming a single point of failure.
74
75NOTE:
76As of GitLab 13.3, PostgreSQL 12 is shipped with Omnibus GitLab. Clustering for PostgreSQL 12 is supported only with
77Patroni. See the [Patroni](#patroni) section for further details. Starting with GitLab 14.0, only PostgreSQL 12 is
78shipped with Omnibus GitLab, and thus Patroni becomes mandatory for replication and failover.
79
80### Database node
81
82Each database node runs four services:
83
84- `PostgreSQL`: The database itself.
- `Patroni`: Communicates with other Patroni services in the cluster and handles failover when issues with the leader server occur. The failover procedure consists of:
86  - Selecting a new leader for the cluster.
87  - Promoting the new node to leader.
88  - Instructing remaining servers to follow the new leader node.
89- `PgBouncer`: A local pooler for the node. Used for _read_ queries as part of [Database Load Balancing](database_load_balancing.md).
- `Consul` agent: Communicates with the Consul cluster, which stores the current Patroni state. The agent monitors the status of each node in the database cluster and tracks its health in a service definition on the Consul cluster.
91
92### Consul server node
93
Each Consul server node runs the Consul server service. These nodes must have reached quorum and elected a leader _before_ the Patroni cluster is bootstrapped; otherwise, the database nodes wait until a Consul leader is elected.
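
As a rough illustration of why three Consul servers is the minimum, the following sketch computes the quorum size and the number of server failures a cluster can tolerate. It assumes Consul's majority-based quorum; the helper names are hypothetical and are not part of Omnibus GitLab.

```ruby
# Majority quorum for a cluster of `servers` Consul server nodes.
def quorum_size(servers)
  (servers / 2) + 1
end

# Number of server nodes that can fail while a leader can still be elected.
def failure_tolerance(servers)
  servers - quorum_size(servers)
end

[1, 2, 3, 5].each do |count|
  puts "#{count} servers: quorum #{quorum_size(count)}, " \
       "tolerates #{failure_tolerance(count)} failure(s)"
end
```

With three servers, the quorum is two, so one server can be lost without blocking leader election. Two servers tolerate no failures at all.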
95
96### PgBouncer node
97
98Each PgBouncer node runs two services:
99
100- `PgBouncer`: The database connection pooler itself.
101- `Consul` agent: Watches the status of the PostgreSQL service definition on the Consul cluster. If that status changes, Consul runs a script which updates the PgBouncer configuration to point to the new PostgreSQL leader node and reloads the PgBouncer service.
102
103### Connection flow
104
Each service in the package comes with a set of [default ports](../package_information/defaults.md#ports). You may need to create specific firewall rules for the connections listed below.
106
107There are several connection flows in this setup:
108
109- [Primary](#primary)
110- [Database Load Balancing](#database-load-balancing)
111- [Replication](#replication)
112
113#### Primary
114
- Application servers connect either to PgBouncer directly through its [default port](../package_information/defaults.md) or through a configured internal load balancer (TCP) that serves multiple PgBouncer nodes.
116- PgBouncer connects to the primary database server's [PostgreSQL default port](../package_information/defaults.md).
117
118#### Database Load Balancing
119
120For read queries against data that haven't been recently changed and are up to date on all database nodes:
121
122- Application servers connect to the local PgBouncer service via its [default port](../package_information/defaults.md) on each database node in a round-robin approach.
123- Local PgBouncer connects to the local database server's [PostgreSQL default port](../package_information/defaults.md).
124
125#### Replication
126
127- Patroni actively manages the running PostgreSQL processes and configuration.
- PostgreSQL secondaries connect to the primary database server's [PostgreSQL default port](../package_information/defaults.md).
- Consul servers and agents connect to each other's [Consul default ports](../package_information/defaults.md).
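
To confirm that firewall rules allow these flows, you could probe the relevant ports from the connecting node. The following is a minimal sketch using Ruby's standard `socket` library. The helper names and the host placeholders are hypothetical, and ports `6432` (PgBouncer) and `5432` (PostgreSQL) are taken from the examples in this document, so check them against the default ports page for your installation.

```ruby
require 'socket'

# Returns true when a TCP connection to host:port can be opened
# within `timeout` seconds, and false when it is refused or times out.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Takes a list of [description, host, port] entries and returns
# the entries that could not be reached.
def unreachable_flows(flows)
  flows.reject { |_description, host, port| port_open?(host, port) }
end

# Example call (replace the placeholder hosts with real addresses):
#   unreachable_flows([
#     ['application -> PgBouncer', 'PGBOUNCER_NODE', 6432],
#     ['PgBouncer -> PostgreSQL', 'POSTGRESQL_NODE', 5432]
#   ])
```

A successful TCP connection only shows that the port is reachable. It does not verify authentication or that the expected service is listening.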
130
131## Setting it up
132
133### Required information
134
135Before proceeding with configuration, you need to collect all the necessary
136information.
137
138#### Network information
139
140PostgreSQL doesn't listen on any network interface by default. It needs to know
141which IP address to listen on to be accessible to other services. Similarly,
142PostgreSQL access is controlled based on the network source.
143
144This is why you need:
145
146- The IP address of each node's network interface. This can be set to `0.0.0.0` to
147  listen on all interfaces. It cannot be set to the loopback address `127.0.0.1`.
- The network address. This can be in subnet mask form (for example, `192.168.0.0/255.255.255.0`)
  or Classless Inter-Domain Routing (CIDR) form (`192.168.0.0/24`).
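
The two notations describe the same network. As an illustration only, the following sketch uses Ruby's standard `ipaddr` library to convert the subnet mask form to CIDR form and to check whether a node address falls inside the network. The `to_cidr` helper and the node address `192.168.0.15` are hypothetical.

```ruby
require 'ipaddr'

# Convert a network given in either notation to CIDR form.
def to_cidr(network)
  address = IPAddr.new(network)
  "#{address}/#{address.prefix}"
end

puts to_cidr('192.168.0.0/255.255.255.0')
puts to_cidr('192.168.0.0/24')

# Check whether a node's address belongs to the permitted network.
network = IPAddr.new('192.168.0.0/24')
puts network.include?(IPAddr.new('192.168.0.15'))
```

The configuration examples in this document use the CIDR form.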
150
151#### Consul information
152
When using the default setup, the minimum configuration requires:

- `CONSUL_USERNAME`. The default user for Omnibus GitLab is `gitlab-consul`.
- `CONSUL_DATABASE_PASSWORD`. The password for the database user.
- `CONSUL_PASSWORD_HASH`. A hash generated from the Consul username/password pair. It can be generated with:
158
159   ```shell
160   sudo gitlab-ctl pg-password-md5 CONSUL_USERNAME
161   ```
162
163- `CONSUL_SERVER_NODES`. The IP addresses or DNS records of the Consul server nodes.
164
A few notes on the service itself:
166
167- The service runs under a system account, by default `gitlab-consul`.
168- If you are using a different username, you have to specify it through the `CONSUL_USERNAME` variable.
169- Passwords are stored in the following locations:
170  - `/etc/gitlab/gitlab.rb`: hashed
171  - `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
172  - `/var/opt/gitlab/consul/.pgpass`: plaintext
173
174#### PostgreSQL information
175
176When configuring PostgreSQL, we do the following:
177
178- Set `max_replication_slots` to double the number of database nodes. Patroni uses one extra slot per node when initiating the replication.
179- Set `max_wal_senders` to one more than the allocated number of replication slots in the cluster. This prevents replication from using up all of the available database connections.
180
This document assumes three database nodes, which results in this configuration:
182
183```ruby
184patroni['postgresql']['max_replication_slots'] = 6
185patroni['postgresql']['max_wal_senders'] = 7
186```
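
For a different cluster size, the same two rules apply. The following sketch derives both values from the number of database nodes. The `replication_settings` helper is hypothetical and only illustrates the arithmetic.

```ruby
# Derive the Patroni replication settings from the number of database nodes:
# - max_replication_slots is double the number of nodes.
# - max_wal_senders is one more than the number of replication slots.
def replication_settings(database_nodes)
  max_replication_slots = database_nodes * 2
  {
    max_replication_slots: max_replication_slots,
    max_wal_senders: max_replication_slots + 1
  }
end

settings = replication_settings(3)
puts "patroni['postgresql']['max_replication_slots'] = #{settings[:max_replication_slots]}"
puts "patroni['postgresql']['max_wal_senders'] = #{settings[:max_wal_senders]}"
```

For three nodes, this prints the same values as the configuration above.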
187
188As previously mentioned, prepare the network subnets that need permission
189to authenticate with the database.
190You also need to have the IP addresses or DNS records of Consul
191server nodes on hand.
192
193You need the following password information for the application's database user:
194
- `POSTGRESQL_USERNAME`. The default user for Omnibus GitLab is `gitlab`.
- `POSTGRESQL_USER_PASSWORD`. The password for the database user.
- `POSTGRESQL_PASSWORD_HASH`. A hash generated from the username/password pair.
198  It can be generated with:
199
200  ```shell
201  sudo gitlab-ctl pg-password-md5 POSTGRESQL_USERNAME
202  ```
203
204#### Patroni information
205
206You need the following password information for the Patroni API:
207
- `PATRONI_API_USERNAME`. A username for basic auth to the API.
- `PATRONI_API_PASSWORD`. A password for basic auth to the API.
210
211#### PgBouncer information
212
213When using a default setup, the minimum configuration requires:
214
- `PGBOUNCER_USERNAME`. The default user for Omnibus GitLab is `pgbouncer`.
- `PGBOUNCER_PASSWORD`. The password for the PgBouncer service.
- `PGBOUNCER_PASSWORD_HASH`. A hash generated from the PgBouncer username/password pair. It can be generated with:
218
219  ```shell
220  sudo gitlab-ctl pg-password-md5 PGBOUNCER_USERNAME
221  ```
222
- `PGBOUNCER_NODE`. The IP address or FQDN of the node running PgBouncer.
224
A few things to remember about the service itself:

- The service runs as the same system account as the database. In the package, this is by default `gitlab-psql`.
- If you use a non-default user account for the PgBouncer service (by default `pgbouncer`), you need to specify this username.
229- Passwords are stored in the following locations:
230  - `/etc/gitlab/gitlab.rb`: hashed, and in plain text
231  - `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
232
233### Installing Omnibus GitLab
234
235First, make sure to [download/install](https://about.gitlab.com/install/)
236Omnibus GitLab **on each node**.
237
Make sure you install the necessary dependencies from step 1,
and add the GitLab package repository from step 2.
When installing the GitLab package, do not supply the `EXTERNAL_URL` value.
241
242### Configuring the Database nodes
243
2441. Make sure to [configure the Consul nodes](../consul.md).
1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information), [`POSTGRESQL_PASSWORD_HASH`](#postgresql-information), the [number of database nodes](#postgresql-information), and the [network address](#network-information) before executing the next step.
246
247#### Configuring Patroni cluster
248
249You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`).
250
Patroni strictly controls any PostgreSQL configuration item that affects replication, for example `wal_level` and `max_wal_senders`.
These settings override the values that you set with the `postgresql[...]` configuration key, so they are kept separately
under `patroni['postgresql'][...]`. This behavior is limited to replication settings:
Patroni honors any other PostgreSQL configuration made with the `postgresql[...]` configuration key. For example,
`max_wal_senders` is set to `5` by default. To change it, you must use the `patroni['postgresql']['max_wal_senders']`
configuration key.
257
258NOTE:
The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, you can first ignore
any replication setting of PostgreSQL (because it is overwritten). Then, you can remove any `repmgr[...]` or
repmgr-specific configuration as well. In particular, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`.
262
263Here is an example:
264
265```ruby
266# Disable all components except Patroni, PgBouncer and Consul
267roles(['patroni_role', 'pgbouncer_role'])
268
269# PostgreSQL configuration
270postgresql['listen_address'] = '0.0.0.0'
271
272# Disable automatic database migrations
273gitlab_rails['auto_migrate'] = false
274
275# Configure the Consul agent
276consul['services'] = %w(postgresql)
277
278# START user configuration
279# Please set the real values as explained in Required Information section
280#
281# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
282postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
283# Replace POSTGRESQL_REPLICATION_PASSWORD_HASH with a generated md5 value
284postgresql['sql_replication_password'] = 'POSTGRESQL_REPLICATION_PASSWORD_HASH'
285# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
286postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
287
288# Replace PATRONI_API_USERNAME with a username for Patroni Rest API calls (use the same username in all nodes)
289patroni['username'] = 'PATRONI_API_USERNAME'
290# Replace PATRONI_API_PASSWORD with a password for Patroni Rest API calls (use the same password in all nodes)
291patroni['password'] = 'PATRONI_API_PASSWORD'
292
293# Sets `max_replication_slots` to double the number of database nodes.
294# Patroni uses one extra slot per node when initiating the replication.
295patroni['postgresql']['max_replication_slots'] = X
296
297# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
298# This is used to prevent replication from using up all of the
299# available database connections.
300patroni['postgresql']['max_wal_senders'] = X+1
301
# Replace XXX.XXX.XXX.XXX/YY with Network Addresses for your other Patroni nodes
303patroni['allowlist'] = %w(XXX.XXX.XXX.XXX/YY 127.0.0.1/32)
304
305# Replace XXX.XXX.XXX.XXX/YY with Network Address
306postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY 127.0.0.1/32)
307
308# Local PgBouncer service for Database Load Balancing
309pgbouncer['databases'] = {
310  gitlabhq_production: {
311    host: "127.0.0.1",
312    user: "PGBOUNCER_USERNAME",
313    password: 'PGBOUNCER_PASSWORD_HASH'
314  }
315}
316
317# Replace placeholders:
318#
319# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
320# with the addresses gathered for CONSUL_SERVER_NODES
321consul['configuration'] = {
322  retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
323}
324#
325# END user configuration
326```
327
328All database nodes use the same configuration. The leader node is not determined in configuration,
329and there is no additional or different configuration for either leader or replica nodes.
330
331After the configuration of a node is complete, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure)
332on each node for the changes to take effect.
333
Generally, when the Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure)
becomes the leader. You do not need to sequence the reconfiguration of the nodes. You can run them in parallel or in any order.
If you choose an arbitrary order, there is no predetermined leader.
337
338#### Enable Monitoring
339
340> [Introduced](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/3786) in GitLab 12.0.
341
342If you enable Monitoring, it must be enabled on **all** database servers.
343
3441. Create/edit `/etc/gitlab/gitlab.rb` and add the following configuration:
345
346   ```ruby
347   # Enable service discovery for Prometheus
348   consul['monitoring_service_discovery'] = true
349
350   # Set the network addresses that the exporters must listen on
351   node_exporter['listen_address'] = '0.0.0.0:9100'
352   postgres_exporter['listen_address'] = '0.0.0.0:9187'
353   ```
354
3551. Run `sudo gitlab-ctl reconfigure` to compile the configuration.
356
357#### Enable TLS support for the Patroni API
358
359By default, Patroni's [REST API](https://patroni.readthedocs.io/en/latest/rest_api.html#rest-api) is served over HTTP.
360You have the option to enable TLS and use HTTPS over the same [port](../package_information/defaults.md).
361
362To enable TLS, you need PEM-formatted certificate and private key files. Both files must be readable by the PostgreSQL user (`gitlab-psql` by default, or the one set by `postgresql['username']`):
363
364```ruby
365patroni['tls_certificate_file'] = '/path/to/server/certificate.pem'
366patroni['tls_key_file'] = '/path/to/server/key.pem'
367```
368
369If the server's private key is encrypted, specify the password to decrypt it:
370
371```ruby
372patroni['tls_key_password'] = 'private-key-password' # This is the plain-text password.
373```
374
375If you are using a self-signed certificate or an internal CA, you need to either disable the TLS verification or pass the certificate of the
376internal CA, otherwise you may run into an unexpected error when using the `gitlab-ctl patroni ....` commands. Omnibus ensures that Patroni API
377clients honor this configuration.
378
379TLS certificate verification is enabled by default. To disable it:
380
381```ruby
382patroni['tls_verify'] = false
383```
384
385Alternatively, you can pass a PEM-formatted certificate of the internal CA. Again, the file must be readable by the PostgreSQL user:
386
387```ruby
388patroni['tls_ca_file'] = '/path/to/ca.pem'
389```
390
391When TLS is enabled, mutual authentication of the API server and client is possible for all endpoints, the extent of which depends on
392the `patroni['tls_client_mode']` attribute:
393
394- `none` (default): The API does not check for any client certificates.
395- `optional`: Client certificates are required for all [unsafe](https://patroni.readthedocs.io/en/latest/security.html#protecting-the-rest-api) API calls.
396- `required`: Client certificates are required for all API calls.
397
398The client certificates are verified against the CA certificate that is specified with the `patroni['tls_ca_file']` attribute. Therefore,
399this attribute is required for mutual TLS authentication. You also need to specify PEM-formatted client certificate and private key files.
400Both files must be readable by the PostgreSQL user:
401
402```ruby
403patroni['tls_client_mode'] = 'required'
404patroni['tls_ca_file'] = '/path/to/ca.pem'
405
406patroni['tls_client_certificate_file'] = '/path/to/client/certificate.pem'
407patroni['tls_client_key_file'] = '/path/to/client/key.pem'
408```
409
410You can use different certificates and keys for both API server and client on different Patroni nodes as long as they can be verified.
411However, the CA certificate (`patroni['tls_ca_file']`), TLS certificate verification (`patroni['tls_verify']`), and client TLS
412authentication mode (`patroni['tls_client_mode']`), must each have the same value on all nodes.
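
Because these three attributes must match across the cluster, it can help to compare them before reconfiguring. The following sketch assumes you have collected each node's `patroni[...]` settings into a plain hash. The node names, the `inconsistent_tls_keys` helper, and the way the settings are gathered are all hypothetical.

```ruby
# Attributes that must carry the same value on every Patroni node.
SHARED_TLS_KEYS = %w[tls_ca_file tls_verify tls_client_mode].freeze

# `nodes` maps a node name to a hash of its patroni[...] settings.
# Returns the shared attributes whose values differ between nodes.
def inconsistent_tls_keys(nodes)
  SHARED_TLS_KEYS.select do |key|
    nodes.values.map { |settings| settings[key] }.uniq.size > 1
  end
end

nodes = {
  'patroni-1' => { 'tls_ca_file' => '/path/to/ca.pem', 'tls_verify' => true, 'tls_client_mode' => 'required' },
  'patroni-2' => { 'tls_ca_file' => '/path/to/ca.pem', 'tls_verify' => true, 'tls_client_mode' => 'optional' }
}

puts inconsistent_tls_keys(nodes).inspect
```

In this sample, only `tls_client_mode` is reported, because the two nodes disagree on it. Server and client certificate paths are deliberately not compared, since they may differ per node.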
413
414### Configure PgBouncer nodes
415
4161. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`CONSUL_PASSWORD_HASH`](#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information) before executing the next step.
417
1. On each node, edit the `/etc/gitlab/gitlab.rb` configuration file and replace values noted in the `# START user configuration` section as below:
419
420   ```ruby
421   # Disable all components except PgBouncer and Consul agent
422   roles(['pgbouncer_role'])
423
424   # Configure PgBouncer
425   pgbouncer['admin_users'] = %w(pgbouncer gitlab-consul)
426
427   # Configure Consul agent
428   consul['watchers'] = %w(postgresql)
429
430   # START user configuration
431   # Please set the real values as explained in Required Information section
   # Replace CONSUL_PASSWORD_HASH with a generated md5 value
   # Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
434   pgbouncer['users'] = {
435     'gitlab-consul': {
436       password: 'CONSUL_PASSWORD_HASH'
437     },
438     'pgbouncer': {
439       password: 'PGBOUNCER_PASSWORD_HASH'
440     }
441   }
442   # Replace placeholders:
443   #
444   # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
445   # with the addresses gathered for CONSUL_SERVER_NODES
446   consul['configuration'] = {
447     retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
448   }
449   #
450   # END user configuration
451   ```
452
453   NOTE:
454   `pgbouncer_role` was introduced with GitLab 10.3.
455
4561. Run `gitlab-ctl reconfigure`
457
4581. Create a `.pgpass` file so Consul is able to
459   reload PgBouncer. Enter the `PGBOUNCER_PASSWORD` twice when asked:
460
461   ```shell
462   gitlab-ctl write-pgpass --host 127.0.0.1 --database pgbouncer --user pgbouncer --hostuser gitlab-consul
463   ```
464
4651. [Enable monitoring](../postgresql/pgbouncer.md#enable-monitoring)
466
467#### PgBouncer Checkpoint
468
1. Ensure each node is talking to the current leader node:
470
471   ```shell
472   gitlab-ctl pgb-console # Supply PGBOUNCER_PASSWORD when prompted
473   ```
474
475   If there is an error `psql: ERROR:  Auth failed` after typing in the
476   password, ensure you have previously generated the MD5 password hashes with the correct
477   format. The correct format is to concatenate the password and the username:
478   `PASSWORDUSERNAME`. For example, `Sup3rS3cr3tpgbouncer` would be the text
479   needed to generate an MD5 password hash for the `pgbouncer` user.
480
4811. After the console prompt has become available, run the following queries:
482
483   ```shell
484   show databases ; show clients ;
485   ```
486
487   The output should be similar to the following:
488
489   ```plaintext
490           name         |  host       | port |      database       | force_user | pool_size | reserve_pool | pool_mode | max_connections | current_connections
491   ---------------------+-------------+------+---------------------+------------+-----------+--------------+-----------+-----------------+---------------------
492    gitlabhq_production | MASTER_HOST | 5432 | gitlabhq_production |            |        20 |            0 |           |               0 |                   0
493    pgbouncer           |             | 6432 | pgbouncer           | pgbouncer  |         2 |            0 | statement |               0 |                   0
494   (2 rows)
495
496    type |   user    |      database       |  state  |   addr         | port  | local_addr | local_port |    connect_time     |    request_time     |    ptr    | link | remote_pid | tls
497   ------+-----------+---------------------+---------+----------------+-------+------------+------------+---------------------+---------------------+-----------+------+------------+-----
498    C    | pgbouncer | pgbouncer           | active  | 127.0.0.1      | 56846 | 127.0.0.1  |       6432 | 2017-08-21 18:09:59 | 2017-08-21 18:10:48 | 0x22b3880 |      |          0 |
499   (2 rows)
500   ```
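
The hash format described in the first step can be sketched as follows. This assumes that the value produced by `gitlab-ctl pg-password-md5` is the plain MD5 hex digest of the password immediately followed by the username, as that step describes. The helper name is hypothetical. For real configuration, generate the hashes with `gitlab-ctl pg-password-md5`.

```ruby
require 'digest'

# MD5 hex digest of the password followed by the username (PASSWORDUSERNAME).
def pg_password_md5(username, password)
  Digest::MD5.hexdigest("#{password}#{username}")
end

# Hashes the text `Sup3rS3cr3tpgbouncer`, as in the example above.
puts pg_password_md5('pgbouncer', 'Sup3rS3cr3t')
```

Because the username is part of the hashed text, the same password produces a different hash for each user. A hash generated for the wrong username therefore fails authentication.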
501
502#### Configure the internal load balancer
503
504If you're running more than one PgBouncer node as recommended, you must set up a TCP internal load balancer to serve each correctly. This can be accomplished with any reputable TCP load balancer.
505
506As an example, here's how you could do it with [HAProxy](https://www.haproxy.org/):
507
508```plaintext
509global
510    log /dev/log local0
511    log localhost local1 notice
512    log stdout format raw local0
513
514defaults
515    log global
516    default-server inter 10s fall 3 rise 2
517    balance leastconn
518
519frontend internal-pgbouncer-tcp-in
520    bind *:6432
521    mode tcp
522    option tcplog
523
524    default_backend pgbouncer
525
526backend pgbouncer
527    mode tcp
528    option tcp-check
529
530    server pgbouncer1 <ip>:6432 check
531    server pgbouncer2 <ip>:6432 check
532    server pgbouncer3 <ip>:6432 check
533```
534
535Refer to your preferred Load Balancer's documentation for further guidance.
536
537### Configuring the Application nodes
538
539Application nodes run the `gitlab-rails` service. You may have other
540attributes set, but the following need to be set.
541
5421. Edit `/etc/gitlab/gitlab.rb`:
543
544   ```ruby
545   # Disable PostgreSQL on the application node
546   postgresql['enable'] = false
547
   gitlab_rails['db_host'] = 'PGBOUNCER_NODE' # or 'INTERNAL_LOAD_BALANCER'
549   gitlab_rails['db_port'] = 6432
550   gitlab_rails['db_password'] = 'POSTGRESQL_USER_PASSWORD'
551   gitlab_rails['auto_migrate'] = false
552   gitlab_rails['db_load_balancing'] = { 'hosts' => ['POSTGRESQL_NODE_1', 'POSTGRESQL_NODE_2', 'POSTGRESQL_NODE_3'] }
553   ```
554
5551. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
556
557#### Application node post-configuration
558
559Ensure that all migrations ran:
560
561```shell
562gitlab-rake gitlab:db:configure
563```
564
NOTE:
If you encounter a `rake aborted!` error stating that PgBouncer is failing to connect to PostgreSQL, your PgBouncer node's IP address may be missing from
PostgreSQL's `trust_auth_cidr_addresses` in `gitlab.rb` on your database nodes. See
[PgBouncer error `ERROR:  pgbouncer cannot connect to server`](#pgbouncer-error-error-pgbouncer-cannot-connect-to-server)
in the Troubleshooting section before proceeding.
569
570### Backups
571
Do not back up or restore GitLab through a PgBouncer connection: this causes a GitLab outage.
573
574[Read more about this and how to reconfigure backups](../../raketasks/backup_restore.md#back-up-and-restore-for-installations-using-pgbouncer).
575
576### Ensure GitLab is running
577
578At this point, your GitLab instance should be up and running. Verify you're able
579to sign in, and create issues and merge requests. If you encounter issues, see
580the [Troubleshooting section](#troubleshooting).
581
582## Example configuration
583
584This section describes several fully expanded example configurations.
585
586### Example recommended setup
587
588This example uses three Consul servers, three PgBouncer servers (with an
589associated internal load balancer), three PostgreSQL servers, and one
590application node.
591
We start with all servers on the same 10.6.0.0/16 private network range. They
can connect to each other freely on those addresses.
594
595Here is a list and description of each machine and the assigned IP:
596
597- `10.6.0.11`: Consul 1
598- `10.6.0.12`: Consul 2
599- `10.6.0.13`: Consul 3
600- `10.6.0.20`: Internal Load Balancer
601- `10.6.0.21`: PgBouncer 1
602- `10.6.0.22`: PgBouncer 2
603- `10.6.0.23`: PgBouncer 3
604- `10.6.0.31`: PostgreSQL 1
605- `10.6.0.32`: PostgreSQL 2
606- `10.6.0.33`: PostgreSQL 3
607- `10.6.0.41`: GitLab application
608
All passwords are set to `toomanysecrets`. Do not use this password or the hashes derived from it. The `external_url` for GitLab is `http://gitlab.example.com`.

After the initial configuration, if a failover occurs, the PostgreSQL leader node changes to one of the available secondaries until it is failed back.
612
613#### Example recommended setup for Consul servers
614
615On each server edit `/etc/gitlab/gitlab.rb`:
616
617```ruby
618# Disable all components except Consul
619roles(['consul_role'])
620
621consul['configuration'] = {
622  server: true,
623  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
624}
625consul['monitoring_service_discovery'] =  true
626```
627
628[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
629
630#### Example recommended setup for PgBouncer servers
631
632On each server edit `/etc/gitlab/gitlab.rb`:
633
634```ruby
# Disable all components except PgBouncer and Consul agent
636roles(['pgbouncer_role'])
637
638# Configure PgBouncer
639pgbouncer['admin_users'] = %w(pgbouncer gitlab-consul)
640
641pgbouncer['users'] = {
642  'gitlab-consul': {
643    password: '5e0e3263571e3704ad655076301d6ebe'
644  },
645  'pgbouncer': {
646    password: '771a8625958a529132abe6f1a4acb19c'
647  }
648}
649
650consul['watchers'] = %w(postgresql)
651consul['configuration'] = {
652  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
653}
654consul['monitoring_service_discovery'] =  true
655```
656
657[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
658
659#### Internal load balancer setup
660
An internal load balancer (TCP) must then be set up to serve each PgBouncer node (in this example on the IP of `10.6.0.20`). An example of how to do this can be found in the [PgBouncer Configure Internal Load Balancer](#configure-the-internal-load-balancer) section.
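
For this example, the `server` lines of the HAProxy `backend pgbouncer` section could be derived from the PgBouncer addresses listed above. The following is only a sketch. The `haproxy_server_lines` helper is hypothetical, and the line format follows the HAProxy example earlier in this document.

```ruby
# PgBouncer node addresses and port from this example setup.
PGBOUNCER_NODES = %w[10.6.0.21 10.6.0.22 10.6.0.23].freeze
PGBOUNCER_PORT = 6432

# Build one `server` line per PgBouncer node, with health checks enabled.
def haproxy_server_lines(addresses, port)
  addresses.each_with_index.map do |address, index|
    "    server pgbouncer#{index + 1} #{address}:#{port} check"
  end
end

puts haproxy_server_lines(PGBOUNCER_NODES, PGBOUNCER_PORT)
```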
662
663#### Example recommended setup for PostgreSQL servers
664
665On database nodes edit `/etc/gitlab/gitlab.rb`:
666
667```ruby
668# Disable all components except Patroni, PgBouncer and Consul
669roles(['patroni_role', 'pgbouncer_role'])
670
671# PostgreSQL configuration
672postgresql['listen_address'] = '0.0.0.0'
673postgresql['hot_standby'] = 'on'
674postgresql['wal_level'] = 'replica'
675
676# Disable automatic database migrations
677gitlab_rails['auto_migrate'] = false
678
679postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
680postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
681patroni['username'] = 'PATRONI_API_USERNAME'
682patroni['password'] = 'PATRONI_API_PASSWORD'
683patroni['postgresql']['max_replication_slots'] = 6
684patroni['postgresql']['max_wal_senders'] = 7
685
patroni['allowlist'] = %w(10.6.0.0/16 127.0.0.1/32)
687postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16 127.0.0.1/32)
688
689# Local PgBouncer service for Database Load Balancing
690pgbouncer['databases'] = {
691  gitlabhq_production: {
692    host: "127.0.0.1",
693    user: "pgbouncer",
694    password: '771a8625958a529132abe6f1a4acb19c'
695  }
696}
697
698# Configure the Consul agent
699consul['services'] = %w(postgresql)
700consul['configuration'] = {
701  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
702}
703consul['monitoring_service_discovery'] =  true
704```
705
706[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
707
708#### Example recommended setup manual steps
709
After deploying the configuration, follow these steps:
711
7121. Find the primary database node:
713
714   ```shell
715   gitlab-ctl get-postgresql-primary
716   ```
717
7181. On `10.6.0.41`, our application server:
719
   Set the `gitlab-consul` user's PgBouncer password to `toomanysecrets`:
721
722   ```shell
723   gitlab-ctl write-pgpass --host 127.0.0.1 --database pgbouncer --user pgbouncer --hostuser gitlab-consul
724   ```
725
726   Run database migrations:
727
728   ```shell
729   gitlab-rake gitlab:db:configure
730   ```
731
732## Patroni
733
734NOTE:
735Using Patroni instead of Repmgr is supported for PostgreSQL 11 and required for PostgreSQL 12. Starting with GitLab 14.0, only PostgreSQL 12 is available and hence Patroni is mandatory to achieve failover and replication.
736
Patroni is an opinionated solution for PostgreSQL high-availability. It takes control of PostgreSQL, overrides its configuration, and manages its lifecycle (start, stop, restart). Patroni is the only option for PostgreSQL 12 clustering and for cascading replication for Geo deployments.
738
The fundamental [architecture](#architecture) (mentioned above) does not change for Patroni.
740You do not need any special consideration for Patroni while provisioning your database nodes. Patroni heavily relies on Consul to store the state of the cluster and elect a leader. Any failure in Consul cluster and its leader election propagates to the Patroni cluster as well.
741
742Patroni monitors the cluster and handles any failover. When the primary node fails, it works with Consul to notify PgBouncer. On failure, Patroni handles the transitioning of the old primary to a replica and rejoins it to the cluster automatically.
743
With Patroni, the connection flow is slightly different. Patroni on each node connects to the Consul agent to join the cluster. Only after this point does it decide if the node is the primary or a replica. Based on this decision, it configures and starts PostgreSQL, which it communicates with directly over a Unix socket. This means that if the Consul cluster is not functional or does not have a leader, Patroni, and by extension PostgreSQL, does not start. Patroni also exposes a REST API, which can be accessed through its [default port](../package_information/defaults.md)
on each node.
746
747### Check replication status
748
749Run `gitlab-ctl patroni members` to query Patroni for a summary of the cluster status:
750
751```plaintext
752+ Cluster: postgresql-ha (6970678148837286213) ------+---------+---------+----+-----------+
753| Member                              | Host         | Role    | State   | TL | Lag in MB |
754+-------------------------------------+--------------+---------+---------+----+-----------+
755| gitlab-database-1.example.com       | 172.18.0.111 | Replica | running |  5 |         0 |
756| gitlab-database-2.example.com       | 172.18.0.112 | Replica | running |  5 |       100 |
757| gitlab-database-3.example.com       | 172.18.0.113 | Leader  | running |  5 |           |
758+-------------------------------------+--------------+---------+---------+----+-----------+
759```
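To spot lagging replicas without reading the table by eye, the output can be filtered with a small shell function. The following is a minimal sketch and not part of Omnibus GitLab: `lagging_replicas` is a hypothetical helper name, and it assumes the column layout shown above, which may differ between Patroni versions.

```shell
# Hypothetical helper: read `gitlab-ctl patroni members` output on stdin and
# print "member lag_in_mb" for every replica whose lag exceeds the threshold
# given as the first argument (default 0).
lagging_replicas() {
  awk -F'|' -v max="${1:-0}" '
    # Data rows split on "|" into: "", Member, Host, Role, State, TL, Lag, "".
    NF >= 7 && $4 ~ /Replica/ {
      member = $2
      lag = $7
      gsub(/ /, "", member)
      gsub(/ /, "", lag)
      if (lag != "" && lag + 0 > max) print member, lag
    }
  '
}
```

For example, `sudo gitlab-ctl patroni members | lagging_replicas 50` lists only the replicas that are more than 50 MB behind the leader.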
760
761To verify the status of replication:
762
763```shell
764echo -e 'select * from pg_stat_wal_receiver\x\g\x \n select * from pg_stat_replication\x\g\x' | gitlab-psql
765```
766
767The same command can be run on all three database servers. It returns any information
768about replication available depending on the role the server is performing.
769
770The leader should return one record per replica:
771
772```sql
773-[ RECORD 1 ]----+------------------------------
774pid              | 371
775usesysid         | 16384
776usename          | gitlab_replicator
777application_name | gitlab-database-1.example.com
778client_addr      | 172.18.0.111
779client_hostname  |
780client_port      | 42900
781backend_start    | 2021-06-14 08:01:59.580341+00
782backend_xmin     |
783state            | streaming
784sent_lsn         | 0/EA13220
785write_lsn        | 0/EA13220
786flush_lsn        | 0/EA13220
787replay_lsn       | 0/EA13220
788write_lag        |
789flush_lag        |
790replay_lag       |
791sync_priority    | 0
792sync_state       | async
793reply_time       | 2021-06-18 19:17:14.915419+00
794```
795
796Investigate further if:
797
798- There are missing or extra records.
799- `reply_time` is not current.
800
801The `lsn` fields relate to which write-ahead-log segments have been replicated.
802Run the following on the leader to find out the current Log Sequence Number (LSN):
803
804```shell
805echo 'SELECT pg_current_wal_lsn();' | gitlab-psql
806```
807
808If a replica is not in sync, `gitlab-ctl patroni members` indicates the volume
809of missing data, and the `lag` fields indicate the elapsed time.
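An LSN is a 64-bit position in the write-ahead log, printed as two hexadecimal numbers separated by a slash (the high and low 32 bits). To express the gap between two LSNs in bytes, the values can be converted and subtracted. The following is a minimal sketch: `lsn_to_bytes` and `lsn_diff` are hypothetical helper names, not GitLab commands. PostgreSQL's built-in `pg_wal_lsn_diff()` function performs the same calculation on the server.

```shell
# Hypothetical helper: convert an LSN such as 0/EA13220 to a byte position.
lsn_to_bytes() {
  # High 32 bits are before the slash, low 32 bits after it (both hex).
  echo $(( (0x${1%/*} << 32) + 0x${1#*/} ))
}

# Hypothetical helper: print how many bytes LSN $1 is ahead of LSN $2.
lsn_diff() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

Pass the leader's current LSN as the first argument and a replica's `replay_lsn` as the second. A result of `0` means the replica has replayed everything the leader has written.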
810
811Read more about the data returned by the leader
812[in the PostgreSQL documentation](https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-REPLICATION-VIEW),
813including other values for the `state` field.
814
815The replicas should return:
816
817```sql
818-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------
819pid                   | 391
820status                | streaming
821receive_start_lsn     | 0/D000000
822receive_start_tli     | 5
823received_lsn          | 0/EA13220
824received_tli          | 5
825last_msg_send_time    | 2021-06-18 19:16:54.807375+00
826last_msg_receipt_time | 2021-06-18 19:16:54.807512+00
827latest_end_lsn        | 0/EA13220
828latest_end_time       | 2021-06-18 19:07:23.844879+00
829slot_name             | gitlab-database-1.example.com
830sender_host           | 172.18.0.113
831sender_port           | 5432
832conninfo              | user=gitlab_replicator host=172.18.0.113 port=5432 application_name=gitlab-database-1.example.com
833```
834
835Read more about the data returned by the replica
836[in the PostgreSQL documentation](https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-WAL-RECEIVER-VIEW).
837
838### Selecting the appropriate Patroni replication method
839
840[Review the Patroni documentation carefully](https://patroni.readthedocs.io/en/latest/SETTINGS.html#postgresql)
841before making changes as **_some of the options carry a risk of potential data
842loss if not fully understood_**. The [replication mode](https://patroni.readthedocs.io/en/latest/replication_modes.html)
843configured determines the amount of tolerable data loss.
844
845WARNING:
846Replication is not a backup strategy! There is no replacement for a well-considered and tested backup solution.
847
848Omnibus GitLab defaults [`synchronous_commit`](https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT) to `on`.
849
850```ruby
851postgresql['synchronous_commit'] = 'on'
852gitlab['geo-postgresql']['synchronous_commit'] = 'on'
853```
854
855#### Customizing Patroni failover behavior
856
857Omnibus GitLab exposes several options allowing more control over the [Patroni restoration process](#recovering-the-patroni-cluster).
858
859Each option is shown below with its default value in `/etc/gitlab/gitlab.rb`.
860
861```ruby
862patroni['use_pg_rewind'] = true
863patroni['remove_data_directory_on_rewind_failure'] = false
864patroni['remove_data_directory_on_diverged_timelines'] = false
865```
866
867[The upstream documentation is always more up to date](https://patroni.readthedocs.io/en/latest/SETTINGS.html#postgresql), but the table below should provide a minimal overview of functionality.
868
869|Setting|Overview|
870|-|-|
871|`use_pg_rewind`|Try running `pg_rewind` on the former cluster leader before it rejoins the database cluster.|
872|`remove_data_directory_on_rewind_failure`|If `pg_rewind` fails, remove the local PostgreSQL data directory and re-replicate from the current cluster leader.|
873|`remove_data_directory_on_diverged_timelines`|If `pg_rewind` cannot be used and the former leader's timeline has diverged from the current one, delete the local data directory and re-replicate from the current cluster leader.|
874
875### Database authorization for Patroni
876
877Patroni uses a Unix socket to manage the PostgreSQL instance. Therefore, a connection from the `local` socket must be trusted.
878
879Also, replicas use the replication user (`gitlab_replicator` by default) to communicate with the leader. For this user,
880you can choose between `trust` and `md5` authentication. If you set `postgresql['sql_replication_password']`,
Patroni uses `md5` authentication, and otherwise falls back to `trust`. You must specify the cluster CIDR in
882`postgresql['md5_auth_cidr_addresses']` or `postgresql['trust_auth_cidr_addresses']` respectively.
883
884### Interacting with Patroni cluster
885
You can use `gitlab-ctl patroni members` to check the status of the cluster members. To check the status of each node,
`gitlab-ctl patroni` provides two additional sub-commands, `check-leader` and `check-replica`, which indicate if a node
is the primary or a replica.
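In scripts, the two sub-commands can be combined to report the role of the local node. The following is a minimal sketch under one assumption that you should verify on your version: that `check-leader` and `check-replica` exit with status `0` when the check passes and a non-zero status otherwise. `patroni_role` and the `GITLAB_CTL` override are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print the Patroni role of the local node.
# ASSUMPTION: check-leader / check-replica exit 0 when the check passes.
# GITLAB_CTL can be overridden (for example, in tests); defaults to gitlab-ctl.
patroni_role() {
  if "${GITLAB_CTL:-gitlab-ctl}" patroni check-leader >/dev/null 2>&1; then
    echo leader
  elif "${GITLAB_CTL:-gitlab-ctl}" patroni check-replica >/dev/null 2>&1; then
    echo replica
  else
    echo unknown
    return 1
  fi
}
```

Run `patroni_role` as a user that is allowed to run `gitlab-ctl` on the database node.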
889
When Patroni is enabled, it exclusively controls PostgreSQL's startup,
shutdown, and restart. This means that to shut down PostgreSQL on a certain node, you must shut down Patroni on the same node with:
892
893```shell
894sudo gitlab-ctl stop patroni
895```
896
897Stopping or restarting the Patroni service on the leader node triggers an automatic failover. If you need Patroni to reload its configuration or restart the PostgreSQL process without triggering the failover, you must use the `reload` or `restart` sub-commands of `gitlab-ctl patroni` instead. These two sub-commands are wrappers of the same `patronictl` commands.
898
899### Manual failover procedure for Patroni
900
901While Patroni supports automatic failover, you also have the ability to perform
902a manual one, where you have two slightly different options:
903
904- **Failover**: allows you to perform a manual failover when there are no healthy nodes.
905  You can perform this action in any PostgreSQL node:
906
907  ```shell
908  sudo gitlab-ctl patroni failover
909  ```
910
911- **Switchover**: only works when the cluster is healthy and allows you to schedule a switchover (it can happen immediately).
912  You can perform this action in any PostgreSQL node:
913
914  ```shell
915  sudo gitlab-ctl patroni switchover
916  ```
917
918For further details on this subject, see the
919[Patroni documentation](https://patroni.readthedocs.io/en/latest/rest_api.html#switchover-and-failover-endpoints).
920
921#### Geo secondary site considerations
922
923When a Geo secondary site is replicating from a primary site that uses `Patroni` and `PgBouncer`, [replicating through PgBouncer is not supported](https://github.com/pgbouncer/pgbouncer/issues/382#issuecomment-517911529). The secondary *must* replicate directly from the leader node in the `Patroni` cluster. When there is an automatic or manual failover in the `Patroni` cluster, you can manually re-point your secondary site to replicate from the new leader with:
924
925```shell
926sudo gitlab-ctl replicate-geo-database --host=<new_leader_ip> --replication-slot=<slot_name>
927```
928
Otherwise, replication does not happen, even if the original node gets re-added as a follower node. Running this command re-syncs your secondary site database and may take a long time depending on the amount of data to sync. You may also need to run `gitlab-ctl reconfigure` if replication is still not working after re-syncing.
930
931### Recovering the Patroni cluster
932
933To recover the old primary and rejoin it to the cluster as a replica, you can start Patroni with:
934
935```shell
936sudo gitlab-ctl start patroni
937```
938
939No further configuration or intervention is needed.
940
941### Maintenance procedure for Patroni
942
With Patroni enabled, you can run planned maintenance on your nodes. To perform maintenance on a node without Patroni changing the state of PostgreSQL, put Patroni into maintenance mode with:
944
945```shell
946sudo gitlab-ctl patroni pause
947```
948
949When Patroni runs in a paused mode, it does not change the state of PostgreSQL. After you are done, you can resume Patroni:
950
951```shell
952sudo gitlab-ctl patroni resume
953```
954
955For further details, see [Patroni documentation on this subject](https://patroni.readthedocs.io/en/latest/pause.html).
956
957### Switching from repmgr to Patroni
958
959WARNING:
Switching from repmgr to Patroni is straightforward, but the other way around is *not*. Rolling back from Patroni to repmgr can be complicated and may involve deleting the data directory. If you need to do that, contact GitLab support.

You can switch an existing database cluster to use Patroni instead of repmgr with the following steps:

1. Stop repmgr on all replica nodes and, lastly, on the primary node:
965
966   ```shell
967   sudo gitlab-ctl stop repmgrd
968   ```
969
9701. Stop PostgreSQL on all replica nodes:
971
972   ```shell
973   sudo gitlab-ctl stop postgresql
974   ```
975
976   NOTE:
977   Ensure that there is no `walsender` process running on the primary node.
978   `ps aux | grep walsender` must not show any running process.
979
9801. On the primary node, [configure Patroni](#configuring-patroni-cluster). Remove `repmgr` and any other
981   repmgr-specific configuration. Also remove any configuration that is related to PostgreSQL replication.
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on the primary node.
   This makes it the leader. You can check this with:
984
985   ```shell
986   sudo gitlab-ctl tail patroni
987   ```
988
9891. Repeat the last two steps for all replica nodes. `gitlab.rb` should look the same on all nodes.
9901. If present, remove the `gitlab_repmgr` database and role on the primary. If you don't delete the `gitlab_repmgr`
991   database, upgrading PostgreSQL 11 to 12 fails with:
992
993   ```plaintext
994   could not load library "$libdir/repmgr_funcs": ERROR:  could not access file "$libdir/repmgr_funcs": No such file or directory
995   ```
996
997### Upgrading PostgreSQL major version in a Patroni cluster
998
999As of GitLab 13.3, PostgreSQL 11.7 and 12.3 are both shipped with Omnibus GitLab by default. As of GitLab 13.7, PostgreSQL 12 is the default. If you want to upgrade to PostgreSQL 12 in versions prior to GitLab 13.7, you must ask for it explicitly.
1000
1001WARNING:
1002The procedure for upgrading PostgreSQL in a Patroni cluster is different than when upgrading using repmgr.
1003The following outlines the key differences and important considerations that need to be accounted for when
1004upgrading PostgreSQL.
1005
1006Here are a few key facts that you must consider before upgrading PostgreSQL:
1007
1008- The main point is that you have to **shut down the Patroni cluster**. This means that your
1009  GitLab deployment is down for the duration of database upgrade or, at least, as long as your leader
1010  node is upgraded. This can be **a significant downtime depending on the size of your database**.
1011
- Upgrading PostgreSQL creates a new data directory with new control data. From Patroni's perspective this is a new cluster that needs to be bootstrapped again. Therefore, as part of the upgrade procedure, the cluster state (stored in Consul) is wiped out. After the upgrade is complete, Patroni bootstraps a new cluster. **This changes your _cluster ID_**.
1013
1014- The procedures for upgrading leader and replicas are not the same. That is why it is important to use the right procedure on each node.
1015
- Upgrading a replica node **deletes the data directory and resynchronizes it** from the leader using the
  configured replication method (`pg_basebackup` is the only available option). It might take some
  time for the replica to catch up with the leader, depending on the size of your database.
1019
1020- An overview of the upgrade procedure is outlined in [Patroni's documentation](https://patroni.readthedocs.io/en/latest/existing_data.html#major-upgrade-of-postgresql-version).
1021  You can still use `gitlab-ctl pg-upgrade` which implements this procedure with a few adjustments.
1022
1023Considering these, you should carefully plan your PostgreSQL upgrade:
1024
10251. Find out which node is the leader and which node is a replica:
1026
1027   ```shell
1028   gitlab-ctl patroni members
1029   ```
1030
1031   NOTE:
1032   `gitlab-ctl pg-upgrade` tries to detect the role of the node. If for any reason the auto-detection does not work or you believe it did not detect the role correctly, you can use the `--leader` or `--replica` arguments to manually override it.
1033
10341. Stop Patroni **only on replicas**.
1035
1036   ```shell
1037   sudo gitlab-ctl stop patroni
1038   ```
1039
10401. Enable the maintenance mode on the **application node**:
1041
1042   ```shell
1043   sudo gitlab-ctl deploy-page up
1044   ```
1045
10461. Upgrade PostgreSQL on **the leader node** and make sure that the upgrade is completed successfully:
1047
1048   ```shell
1049   sudo gitlab-ctl pg-upgrade -V 12
1050   ```
1051
10521. Check the status of the leader and cluster. You can proceed only if you have a healthy leader:
1053
1054   ```shell
1055   gitlab-ctl patroni check-leader
1056
1057   # OR
1058
1059   gitlab-ctl patroni members
1060   ```
1061
10621. You can now disable the maintenance mode on the **application node**:
1063
1064   ```shell
1065   sudo gitlab-ctl deploy-page down
1066   ```
1067
10681. Upgrade PostgreSQL **on replicas** (you can do this in parallel on all of them):
1069
1070   ```shell
1071   sudo gitlab-ctl pg-upgrade -V 12
1072   ```
1073
1074NOTE:
1075Reverting the PostgreSQL upgrade with `gitlab-ctl revert-pg-upgrade` has the same considerations as
1076`gitlab-ctl pg-upgrade`. You should follow the same procedure by first stopping the replicas,
1077then reverting the leader, and finally reverting the replicas.
1078
1079## Troubleshooting
1080
1081### Consul and PostgreSQL changes not taking effect
1082
Due to the potential impacts, `gitlab-ctl reconfigure` only reloads Consul and PostgreSQL; it does not restart the services. However, not all changes can be activated by reloading.

To restart either service, run `gitlab-ctl restart SERVICE`.
1086
1087For PostgreSQL, it is usually safe to restart the leader node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done.
1088
1089On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner.
1090
1091### PgBouncer error `ERROR: pgbouncer cannot connect to server`
1092
1093You may get this error when running `gitlab-rake gitlab:db:configure` or you
1094may see the error in the PgBouncer log file.
1095
1096```plaintext
1097PG::ConnectionBad: ERROR:  pgbouncer cannot connect to server
1098```
1099
1100The problem may be that your PgBouncer node's IP address is not included in the
1101`trust_auth_cidr_addresses` setting in `/etc/gitlab/gitlab.rb` on the database nodes.
1102
1103You can confirm that this is the issue by checking the PostgreSQL log on the leader
1104database node. If you see the following error then `trust_auth_cidr_addresses`
1105is the problem.
1106
1107```plaintext
11082018-03-29_13:59:12.11776 FATAL:  no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off
1109```
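To pull the rejected addresses out of a longer log, the matching lines can be filtered with `sed`. The following is a minimal sketch: `rejected_hosts` is a hypothetical helper name, and it reads the log on standard input, so the log file location is left to you.

```shell
# Hypothetical helper: read PostgreSQL log lines on stdin and print each
# distinct host that was rejected for lacking a pg_hba.conf entry.
rejected_hosts() {
  sed -n 's/.*no pg_hba\.conf entry for .*host "\([^"]*\)".*/\1/p' | sort -u
}
```

For the log line above, the function prints `123.123.123.123`.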
1110
1111To fix the problem, add the IP address to `/etc/gitlab/gitlab.rb`.
1112
1113```ruby
1114postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
1115```
1116
1117[Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
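When the list already contains several CIDR blocks, it can help to check whether a given address is covered by one of them. The following is a minimal sketch for IPv4 only: `ip_to_int` and `ip_in_cidr` are hypothetical helper names, not GitLab commands.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1   # split the address into its four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: succeed if IPv4 address $1 is inside CIDR block $2.
# A block without a prefix length is treated as /32.
ip_in_cidr() (
  case "$2" in
    */*) net=${2%/*}; bits=${2#*/} ;;
    *)   net=$2; bits=32 ;;
  esac
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$net")
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
)
```

For example, `ip_in_cidr 123.123.123.123 123.123.123.123/32` succeeds, while an address outside every configured block indicates that the list needs another entry.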
1118
1119### Reinitialize a replica
1120
1121If replication is not occurring, it may be necessary to reinitialize a replica.
1122
11231. On any server in the cluster, determine the Cluster and Member names,
1124   and check the replication lag by running `gitlab-ctl patroni members`. Here is an example:
1125
1126   ```plaintext
1127   + Cluster: postgresql-ha (6970678148837286213) ------+---------+---------+----+-----------+
1128   | Member                              | Host         | Role    | State   | TL | Lag in MB |
1129   +-------------------------------------+--------------+---------+---------+----+-----------+
1130   | gitlab-database-1.example.com       | 172.18.0.111 | Replica | running |  5 |         0 |
1131   | gitlab-database-2.example.com       | 172.18.0.112 | Replica | running |  5 |       100 |
1132   | gitlab-database-3.example.com       | 172.18.0.113 | Leader  | running |  5 |           |
1133   +-------------------------------------+--------------+---------+---------+----+-----------+
1134   ```
1135
11361. Reinitialize the affected replica server:
1137
   ```shell
1139   gitlab-ctl patroni reinitialize-replica postgresql-ha gitlab-database-2.example.com
1140   ```
1141
1142### Reset the Patroni state in Consul
1143
1144WARNING:
1145This is a destructive process and may lead the cluster into a bad state. Make sure that you have a healthy backup before running this process.
1146
1147As a last resort, if your Patroni cluster is in an unknown or bad state and no node can start, you can
1148reset the Patroni state in Consul completely, resulting in a reinitialized Patroni cluster when
1149the first Patroni node starts.
1150
1151To reset the Patroni state in Consul:
1152
1. Take note of the Patroni node that was the leader. If the current state shows more than one leader, or none, note the node that the application thinks is the current leader. One way to do this is to look on the PgBouncer nodes in `/var/opt/gitlab/consul/databases.ini`, which contains the hostname of the current leader.
11541. Stop Patroni on all nodes:
1155
1156   ```shell
1157   sudo gitlab-ctl stop patroni
1158   ```
1159
11601. Reset the state in Consul:
1161
1162   ```shell
1163   /opt/gitlab/embedded/bin/consul kv delete -recurse /service/postgresql-ha/
1164   ```
1165
1. Start one Patroni node, which initializes the Patroni cluster and is elected as the leader.
   It's highly recommended to start the previous leader (noted in the first step),
   so as to not lose existing writes that may not have been replicated because
   of the broken cluster state:
1170
1171   ```shell
1172   sudo gitlab-ctl start patroni
1173   ```
1174
1. Start all other Patroni nodes, which join the Patroni cluster as replicas:
1176
1177   ```shell
1178   sudo gitlab-ctl start patroni
1179   ```
1180
1181If you are still seeing issues, the next step is restoring the last healthy backup.
1182
1183### Errors in the Patroni log about a `pg_hba.conf` entry for `127.0.0.1`
1184
1185The following log entry in the Patroni log indicates the replication is not working
1186and a configuration change is needed:
1187
1188```plaintext
1189FATAL:  no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator"
1190```
1191
1192To fix the problem, ensure the loopback interface is included in the CIDR addresses list:
1193
11941. Edit `/etc/gitlab/gitlab.rb`:
1195
1196   ```ruby
1197   postgresql['trust_auth_cidr_addresses'] = %w(<other_cidrs> 127.0.0.1/32)
1198   ```
1199
12001. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
1. Check that [all the replicas are synchronized](#check-replication-status).
1202
1203### Errors in Patroni logs: the requested start point is ahead of the Write Ahead Log (WAL) flush position
1204
1205This error indicates that the database is not replicating:
1206
1207```plaintext
1208FATAL:  could not receive data from WAL stream: ERROR:  requested starting point 0/5000000 is ahead of the WAL flush position of this server 0/4000388
1209```
1210
1211This example error is from a replica that was initially misconfigured, and had never replicated.
1212
1213Fix it [by reinitializing the replica](#reinitialize-a-replica).
1214
1215### Patroni fails to start with `MemoryError`
1216
1217Patroni may fail to start, logging an error and stack trace:
1218
1219```plaintext
1220MemoryError
1221Traceback (most recent call last):
1222  File "/opt/gitlab/embedded/bin/patroni", line 8, in <module>
1223    sys.exit(main())
1224[..]
1225  File "/opt/gitlab/embedded/lib/python3.7/ctypes/__init__.py", line 273, in _reset_cache
1226    CFUNCTYPE(c_int)(lambda: None)
1227```
1228
1229If the stack trace ends with `CFUNCTYPE(c_int)(lambda: None)`, this code triggers `MemoryError`
1230if the Linux server has been hardened for security.
1231
The code causes Python to write temporary executable files, and it fails with `MemoryError` if it cannot find a file system in which to do this, for example when `noexec` is set on the `/tmp` file system ([read more in the issue](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6184)).
1233
1234Workarounds:
1235
1236- Remove `noexec` from the mount options for filesystems like `/tmp` and `/var/tmp`.
1237- If set to enforcing, SELinux may also prevent these operations. Verify the issue is fixed by setting
1238  SELinux to permissive.
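To check whether a mount point carries the `noexec` option, the mount table can be inspected. The following is a minimal sketch: `has_noexec` is a hypothetical helper name, and it expects a table in the `/proc/mounts` format on standard input. If `/tmp` is not a separate mount, no line matches, and you need to check the file system that contains it instead.

```shell
# Hypothetical helper: succeed if mount point $1 has the noexec option
# in a /proc/mounts-style table read from stdin.
has_noexec() {
  awk -v mp="$1" '
    $2 == mp {
      found = 0                      # a later mount on the same path wins
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "noexec") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}
```

For example, `has_noexec /tmp < /proc/mounts && echo "/tmp is mounted noexec"`.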
1239
1240Patroni has been shipping with Omnibus GitLab since 13.1, along with a build of Python 3.7.
1241Workarounds should stop being required when GitLab 14.x starts shipping with
1242[a later version of Python](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6164) as
1243the code which causes this was removed from Python 3.8.
1244
1245### Issues with other components
1246
If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of that component's documentation page:
1248
1249- [Consul](../consul.md#troubleshooting-consul)
1250- [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)
1251