---
stage: Create
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
type: reference
---

# Troubleshooting Gitaly and Gitaly Cluster **(FREE SELF)**

Refer to the information below when troubleshooting Gitaly and Gitaly Cluster.

Before troubleshooting, see the Gitaly and Gitaly Cluster
[frequently asked questions](faq.md).

## Troubleshoot Gitaly

The following sections provide possible solutions to Gitaly errors.

See also [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.

### Check versions when using standalone Gitaly servers

When using standalone Gitaly servers, you must make sure they are the same version
as GitLab to ensure full compatibility:

1. On the top bar, select **Menu > Admin** on your GitLab instance.
1. On the left sidebar, select **Overview > Gitaly Servers**.
1. Confirm all Gitaly servers indicate that they are up to date.

### Use `gitaly-debug`

The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
performance. It is intended to help production engineers and support
engineers investigate Gitaly performance problems.

If you're using GitLab 11.6 or newer, this tool should be installed on
your GitLab or Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
If you're investigating an older GitLab version, you can compile this
tool offline and copy the executable to your server:

```shell
git clone https://gitlab.com/gitlab-org/gitaly.git
# The tool's source is inside the cloned repository.
cd gitaly/cmd/gitaly-debug
# Cross-compile for a 64-bit Linux server.
GOOS=linux GOARCH=amd64 go build -o gitaly-debug
```

To see the help page of `gitaly-debug` for a list of supported sub-commands, run:

```shell
gitaly-debug -h
```

### Commits, pushes, and clones return a 401

```plaintext
remote: GitLab: 401 Unauthorized
```

You need to sync your `gitlab-secrets.json` file with your GitLab
application nodes.

### Client side gRPC logs

Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
client has its own log file which may contain useful information when
you are seeing Gitaly errors. You can control the log level of the
gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
default level is `WARN`.

You can run a gRPC trace with:

```shell
sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
```

### Server side gRPC logs

gRPC tracing can also be enabled in Gitaly itself by setting `http2debug` in the `GODEBUG`
environment variable. To set this in an Omnibus GitLab install:

1. Add the following to your `gitlab.rb` file:

   ```ruby
   gitaly['env'] = {
     "GODEBUG" => "http2debug=2"
   }
   ```

1. [Reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure) GitLab.

### Correlating Git processes with RPCs

Sometimes you need to find out which Gitaly RPC created a particular Git process.

One method for doing this is by using `DEBUG` logging. However, this needs to be enabled
ahead of time and the logs produced are quite verbose.

A lightweight method for doing this correlation is by inspecting the environment
of the Git process (using its `PID`) and looking at the `CORRELATION_ID` variable:

```shell
PID=<Git process ID>
sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID=
```

This method isn't reliable for `git cat-file` processes, because Gitaly
internally pools and re-uses those across RPCs.

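To check several Git processes at once, the lookup above can be wrapped in a small helper.
The following is a minimal sketch and not part of GitLab: both function names are hypothetical,
and it assumes `pgrep` is installed and that you run it as a user who can read
`/proc/<PID>/environ` (usually `root`).

```shell
# Print the CORRELATION_ID stored in a process environment file.
# $1: path to a NUL-separated environ file, such as /proc/<PID>/environ.
correlation_id_from_environ() {
  tr '\0' '\n' < "$1" | sed -n 's/^CORRELATION_ID=//p'
}

# Hypothetical helper: print "<PID> <correlation ID>" for every process named git.
# Processes that exit while the loop runs produce an empty ID.
list_git_correlation_ids() {
  for pid in $(pgrep -x git); do
    printf '%s %s\n' "$pid" "$(correlation_id_from_environ "/proc/$pid/environ" 2>/dev/null)"
  done
}
```

The same caveat applies: IDs reported for pooled `git cat-file` processes may not
match the RPC that is currently using them.
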
### Observing `gitaly-ruby` traffic

[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly,
so there's limited visibility into what goes on inside
`gitaly-ruby` processes.

If you have Prometheus set up to scrape your Gitaly process, you can see
request rates and error codes for individual RPCs in `gitaly-ruby` by
querying `grpc_client_handled_total`.

- In theory, this metric does not differentiate between `gitaly-ruby` and other RPCs.
- In practice from GitLab 11.9, all gRPC calls made by Gitaly itself are internal calls from the
  main Gitaly process to one of its `gitaly-ruby` sidecars.

Assuming your `grpc_client_handled_total` counter only observes Gitaly,
the following query shows you which RPCs are (most likely) internally
implemented as calls to `gitaly-ruby`:

```prometheus
sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
```

### Repository changes fail with a `401 Unauthorized` error

If you run Gitaly on its own server and notice these conditions:

- Users can successfully clone and fetch repositories by using both SSH and HTTPS.
- Users can't push to repositories, or receive a `401 Unauthorized` message when attempting to
  make changes to them in the web UI.

Gitaly may be failing to authenticate with the Gitaly client because it has the
[wrong secrets file](configure_gitaly.md#configure-gitaly-servers).

Confirm the following are all true:

- When any user performs a `git push` to any repository on this Gitaly server, it
  fails with a `401 Unauthorized` error:

  ```shell
  remote: GitLab: 401 Unauthorized
  To <REMOTE_URL>
  ! [remote rejected] branch-name -> branch-name (pre-receive hook declined)
  error: failed to push some refs to '<REMOTE_URL>'
  ```

- When any user adds or modifies a file from the repository using the GitLab
  UI, it immediately fails with a red `401 Unauthorized` banner.
- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#create-a-blank-project)
  successfully creates the project but doesn't create the README.
- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
  on a Gitaly client and reproducing the error, you get `401` errors
  when reaching the [`/api/v4/internal/allowed`](../../development/internal_api/index.md) endpoint:

  ```shell
  # api_json.log
  {
    "time": "2019-07-18T00:30:14.967Z",
    "severity": "INFO",
    "duration": 0.57,
    "db": 0,
    "view": 0.57,
    "status": 401,
    "method": "POST",
    "path": "\/api\/v4\/internal\/allowed",
    "params": [
      {
        "key": "action",
        "value": "git-receive-pack"
      },
      {
        "key": "changes",
        "value": "REDACTED"
      },
      {
        "key": "gl_repository",
        "value": "REDACTED"
      },
      {
        "key": "project",
        "value": "\/path\/to\/project.git"
      },
      {
        "key": "protocol",
        "value": "web"
      },
      {
        "key": "env",
        "value": "{\"GIT_ALTERNATE_OBJECT_DIRECTORIES\":[],\"GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE\":[],\"GIT_OBJECT_DIRECTORY\":null,\"GIT_OBJECT_DIRECTORY_RELATIVE\":null}"
      },
      {
        "key": "user_id",
        "value": "2"
      },
      {
        "key": "secret_token",
        "value": "[FILTERED]"
      }
    ],
    "host": "gitlab.example.com",
    "ip": "REDACTED",
    "ua": "Ruby",
    "route": "\/api\/:version\/internal\/allowed",
    "queue_duration": 4.24,
    "gitaly_calls": 0,
    "gitaly_duration": 0,
    "correlation_id": "XPUZqTukaP3"
  }

  # nginx_access.log
  [IP] - - [18/Jul/2019:00:30:14 +0000] "POST /api/v4/internal/allowed HTTP/1.1" 401 30 "" "Ruby"
  ```

To fix this problem, confirm that your [`gitlab-secrets.json` file](configure_gitaly.md#configure-gitaly-servers)
on the Gitaly server matches the one on the Gitaly client. If it doesn't match,
update the secrets file on the Gitaly server to match the Gitaly client, then
[reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).

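One way to compare the files without printing their contents is to compare checksums.
The following is a minimal sketch: the function names are hypothetical, and it assumes
`sha256sum` is installed and that the file is at the Omnibus GitLab default location,
`/etc/gitlab/gitlab-secrets.json`.

```shell
# Print only the SHA-256 digest of a file, without the file name.
secrets_digest() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed only if two digests are non-empty and identical.
digests_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

For example, run `sudo sha256sum /etc/gitlab/gitlab-secrets.json` on the Gitaly client and on the
Gitaly server, then pass the two digests to `digests_match`. Different digests only show that the
files differ somewhere, not which secret differs.
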
### Repository pushes fail with a `deny updating a hidden ref` error

Due to [a change](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3426)
introduced in GitLab 13.12, Gitaly has read-only, internal GitLab references that users are not
permitted to update. If you attempt to update internal references with `git push --mirror`, Git
returns the rejection error, `deny updating a hidden ref`.

The following references are read-only:

- `refs/environments/`
- `refs/keep-around/`
- `refs/merge-requests/`
- `refs/pipelines/`

To mirror-push branches and tags only, and avoid attempting to mirror-push protected refs, run:

```shell
git push origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
```

Any other namespaces that the administrator wants to push can be included there as well via additional patterns.

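The additional patterns follow the same shape as the two shown above. As a minimal sketch,
a hypothetical helper function (not part of Git or GitLab) can generate one forced refspec per
namespace. `refs/notes/` is used here only as an example of an extra namespace.

```shell
# Print one forced refspec per namespace given as an argument.
# For example, "heads" becomes "+refs/heads/*:refs/heads/*".
mirror_refspecs() {
  for namespace in "$@"; do
    printf '+refs/%s/*:refs/%s/*\n' "$namespace" "$namespace"
  done
}

# Usage in Bash, reading the refspecs into an array so the shell
# does not expand the asterisks:
#   mapfile -t refspecs < <(mirror_refspecs heads tags notes)
#   git push origin "${refspecs[@]}"
```
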
### Command line tools cannot connect to Gitaly

gRPC cannot reach your Gitaly server if:

- You can't connect to a Gitaly server with command-line tools.
- Certain actions result in a `14: Connect Failed` error message.

Verify you can reach Gitaly by using TCP:

```shell
sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
```

If the TCP connection:

- Fails, check your network settings and your firewall rules.
- Succeeds, your networking and firewall rules are correct.

If you use proxy servers in your command line environment such as Bash, these can interfere with
your gRPC traffic.

If you use Bash or a compatible command line environment, run the following commands to determine
whether you have proxy servers configured:

```shell
echo $http_proxy
echo $https_proxy
```

If either of these variables have a value, your Gitaly CLI connections may be getting routed through
a proxy which cannot connect to Gitaly.

To remove the proxy setting, run the following commands (depending on which variables had values):

```shell
unset http_proxy
unset https_proxy
```

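If you want to keep the proxy settings in your shell session, you can instead remove them for a
single command. The following is a minimal sketch: both function names are hypothetical, and it
assumes an `env` implementation that supports the `-u` option (GNU coreutils and BSD do).

```shell
# Print the name of each proxy variable that currently has a value.
# Returns 0 if at least one is set, and 1 if none are set.
proxy_vars_in_use() {
  found=1
  for name in http_proxy https_proxy; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s\n' "$name"
      found=0
    fi
  done
  return "$found"
}

# Run one command with both proxy variables removed from its environment,
# leaving the current shell session unchanged.
without_proxy() {
  env -u http_proxy -u https_proxy "$@"
}
```

For example, `without_proxy <command>` runs `<command>` as if neither variable had been set.
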
### Permission denied errors appearing in Gitaly or Praefect logs when accessing repositories

You might see the following in Gitaly and Praefect logs:

```shell
{
  ...
  "error":"rpc error: code = PermissionDenied desc = permission denied",
  "grpc.code":"PermissionDenied",
  "grpc.meta.client_name":"gitlab-web",
  "grpc.request.fullMethod":"/gitaly.ServerService/ServerInfo",
  "level":"warning",
  "msg":"finished unary call with code PermissionDenied",
  ...
}
```

This is a gRPC call
[error response code](https://grpc.github.io/grpc/core/md_doc_statuscodes.html).

If this error occurs, even though
[the Gitaly auth tokens are set up correctly](#praefect-errors-in-logs),
it's likely that the Gitaly servers are experiencing
[clock drift](https://en.wikipedia.org/wiki/Clock_drift).

Ensure the Gitaly clients and servers are synchronized, and use an NTP time
server to keep them synchronized.

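A rough way to spot clock drift is to compare the Unix time reported by two nodes. The following
is a minimal sketch: the function name and the host name `gitaly.example.com` are hypothetical,
and it assumes you have SSH access from the Gitaly client to the Gitaly server.

```shell
# Print the absolute difference, in seconds, between two Unix timestamps.
clock_offset_seconds() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  printf '%s\n' "$diff"
}

# Usage, run on the Gitaly client:
#   clock_offset_seconds "$(date +%s)" "$(ssh gitaly.example.com date +%s)"
```

The SSH round trip adds some delay to the remote reading, so treat only differences larger than
a few seconds as a sign of drift.
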
### Gitaly not listening on new address after reconfiguring

When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may
continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.

When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be
necessary because [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.

### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node

If this error occurs even though file permissions are correct, it's likely that the Gitaly node is
experiencing [clock drift](https://en.wikipedia.org/wiki/Clock_drift).

Ensure that the GitLab and Gitaly nodes are synchronized and, if possible, use an NTP time
server to keep them synchronized.

### Health check warnings

The following warning in `/var/log/gitlab/praefect/current` can be ignored.

```plaintext
"error":"full method name not found: /grpc.health.v1.Health/Check",
"msg":"error when looking up method info"
```

### File not found errors

The following errors in `/var/log/gitlab/gitaly/current` can be ignored.
They are caused by the GitLab Rails application checking for specific files
that do not exist in a repository.

```plaintext
"error":"not found: .gitlab/route-map.yml"
"error":"not found: Dockerfile"
"error":"not found: .gitlab-ci.yml"
```

## Troubleshoot Praefect (Gitaly Cluster)

The following sections provide possible solutions to Gitaly Cluster errors.

### Check cluster health

> [Introduced](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/) in GitLab 14.6.

The `check` Praefect sub-command runs a series of checks to determine the health of the Gitaly Cluster.

```shell
gitlab-ctl praefect check
```

The following sections describe the checks that are run.

#### Praefect migrations

Checks if Praefect migrations are up to date. Database migrations must be up to date for Praefect to work correctly.

If this check fails:

1. See the `schema_migrations` table in the database to see which migrations have run.
1. Run `praefect sql-migrate` to bring the migrations up to date.

#### Node connectivity and disk access

Checks if Praefect can reach all of its Gitaly nodes, and if each Gitaly node has read and write access to all of its storages.

If this check fails:

1. Confirm the network addresses and tokens are set up correctly:
   - In the Praefect configuration.
   - In each Gitaly node's configuration.
1. On the Gitaly nodes, check that the `gitaly` process is being run as `git`. There might be a permissions issue that is preventing Gitaly from
   accessing its storage directories.
1. Confirm that there are no issues with the network that connects Praefect to Gitaly nodes.

#### Database read and write access

Checks if Praefect can read from and write to the database.

If this check fails:

1. See if the Praefect database is in recovery mode. In recovery mode, tables may be read only. To check, run:

   ```sql
   select pg_is_in_recovery()
   ```

1. Confirm that the user that Praefect uses to connect to PostgreSQL has read and write access to the database.
1. See if the database has been placed into read-only mode. To check, run:

   ```sql
   show default_transaction_read_only
   ```

#### Inaccessible repositories

Checks how many repositories are inaccessible because they are missing a primary assignment, or their primary is unavailable.

If this check fails:

1. See if any Gitaly nodes are down. Run `praefect ping-nodes` to check.
1. Check if there is a high load on the Praefect database. If the Praefect database is slow to respond, health checks can fail to persist
   to the database, which leads Praefect to think nodes are unhealthy.

### Praefect errors in logs

If you receive an error, check `/var/log/gitlab/gitlab-rails/production.log`.

Here are common errors and potential causes:

- 500 response code
  - **ActionView::Template::Error (7:permission denied)**
    - `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server.
  - **Unable to save project. Error: 7:permission denied**
    - Secret token in `praefect['storage_nodes']` on GitLab server does not match the
      value in `gitaly['auth_token']` on one or more Gitaly servers.
- 503 response code
  - **GRPC::Unavailable (14:failed to connect to all addresses)**
    - GitLab was unable to reach Praefect.
  - **GRPC::Unavailable (14:all SubCons are in TransientFailure...)**
    - Praefect cannot reach one or more of its child Gitaly nodes. Try running
      the Praefect connection checker to diagnose.

### Determine primary Gitaly node

To determine the primary node of a repository:

- In GitLab 14.6 and later, use the [`praefect metadata`](#view-repository-metadata) subcommand.
- In GitLab 13.12 to GitLab 14.5 with [repository-specific primaries](praefect.md#repository-specific-primary-nodes),
  use the [`gitlab:praefect:replicas` Rake task](../raketasks/praefect.md#replica-checksums).
- With legacy election strategies in GitLab 13.12 and earlier, the primary was the same for all repositories in a virtual storage.
  To determine the current primary Gitaly node for a specific virtual storage:

  - Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the
    [`Gitlab Omnibus - Praefect` dashboard](https://gitlab.com/gitlab-org/grafana-dashboards/-/blob/master/omnibus/praefect.json).
    This is recommended.
  - If you do not have Grafana set up, use the following command on each host of each
    Praefect node:

    ```shell
    curl localhost:9652/metrics | grep gitaly_praefect_primaries
    ```

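The `grep` output above lists every sample of the metric. As a minimal sketch, the following
hypothetical helper keeps only the samples whose value is `1`. It assumes that a value of `1`
marks the current primary and that the metrics are in the Prometheus text format. Neither
assumption is confirmed on this page, so check the raw output first.

```shell
# Read Prometheus text-format metrics on standard input and print the
# gitaly_praefect_primaries samples whose value is 1.
# Comment lines (# HELP, # TYPE) are skipped because they start with "#".
current_primaries() {
  awk '/^gitaly_praefect_primaries[{ ]/ && $NF == 1'
}

# Usage:
#   curl --silent localhost:9652/metrics | current_primaries
```
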
### View repository metadata

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3481) in GitLab 14.6.

Gitaly Cluster maintains a [metadata database](index.md#components) about the repositories stored on the cluster. Use the `praefect metadata` subcommand
to inspect the metadata for troubleshooting.

You can retrieve a repository's metadata by its Praefect-assigned repository ID:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -repository-id <repository-id>
```

You can also retrieve a repository's metadata by its virtual storage and relative path:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -virtual-storage <virtual-storage> -relative-path <relative-path>
```

#### Examples

To retrieve the metadata for a repository with a Praefect-assigned repository ID of 1:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -repository-id 1
```

To retrieve the metadata for a repository with virtual storage `default` and relative path `@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -virtual-storage default -relative-path @hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git
```

Either of these examples retrieves the following metadata for an example repository:

```plaintext
Repository ID: 54771
Virtual Storage: "default"
Relative Path: "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
Replica Path: "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
Primary: "gitaly-1"
Generation: 1
Replicas:
- Storage: "gitaly-1"
  Assigned: true
  Generation: 1, fully up to date
  Healthy: true
  Valid Primary: true
- Storage: "gitaly-2"
  Assigned: true
  Generation: 0, behind by 1 changes
  Healthy: true
  Valid Primary: false
- Storage: "gitaly-3"
  Assigned: true
  Generation: replica not yet created
  Healthy: false
  Valid Primary: false
```

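When a repository has many replicas, it can help to list only the ones that are not fully up to
date. The following is a minimal sketch with a hypothetical function name. It assumes the
output has exactly the layout shown in the example above, which may differ between GitLab
versions.

```shell
# Read `praefect metadata` output on standard input and print the storage name
# of each replica whose Generation line does not say "fully up to date".
outdated_replicas() {
  awk '
    # Remember the storage name of the replica being read, without quotes.
    /^- Storage:/ { gsub(/"/, "", $3); storage = $3 }
    # Only the per-replica Generation lines are indented by two spaces.
    /^  Generation:/ && !/fully up to date/ { print storage }
  '
}

# Usage:
#   sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml \
#     metadata -repository-id 1 | outdated_replicas
```
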
#### Available metadata

The metadata retrieved by `praefect metadata` includes the fields in the following tables.

| Field             | Description                                                                                                        |
|:------------------|:-------------------------------------------------------------------------------------------------------------------|
| `Repository ID`   | Permanent unique ID assigned to the repository by Praefect. Different to the ID GitLab uses for repositories.      |
| `Virtual Storage` | Name of the virtual storage the repository is stored in.                                                           |
| `Relative Path`   | Repository's path in the virtual storage.                                                                          |
| `Replica Path`    | Where on the Gitaly node's disk the repository's replicas are stored.                                              |
| `Primary`         | Current primary of the repository.                                                                                 |
| `Generation`      | Used by Praefect to track repository changes. Each write in the repository increments the repository's generation. |
| `Replicas`        | A list of replicas that exist or are expected to exist.                                                            |

For each replica, the following metadata is available:

| `Replicas` Field | Description |
|:-----------------|:------------|
| `Storage`        | Name of the Gitaly storage that contains the replica. |
| `Assigned`       | Indicates whether the replica is expected to exist in the storage. Can be `false` if a Gitaly node is removed from the cluster or if the storage contains an extra copy after the repository's replication factor was decreased. |
| `Generation`     | Latest confirmed generation of the replica. It indicates:<br><br>- The replica is fully up to date if the generation matches the repository's generation.<br>- The replica is outdated if the replica's generation is less than the repository's generation.<br>- `replica not yet created` if the replica does not yet exist at all on the storage. |
| `Healthy`        | Indicates whether the Gitaly node that is hosting this replica is considered healthy by the consensus of Praefect nodes. |
| `Valid Primary`  | Indicates whether the replica is fit to serve as the primary node. If the repository's primary is not a valid primary, a failover occurs on the next write to the repository if there is another replica that is a valid primary. A replica is a valid primary if:<br><br>- It is stored on a healthy Gitaly node.<br>- It is fully up to date.<br>- It is not targeted by a pending deletion job from decreasing replication factor.<br>- It is assigned. |

### Check that repositories are in sync

In [some cases](index.md#known-issues), the Praefect database can get out of sync with the underlying Gitaly nodes. To check that
a given repository is fully synced on all nodes, run the [`gitlab:praefect:replicas` Rake task](../raketasks/praefect.md#replica-checksums)
that checksums the repository on all Gitaly nodes.

The [Praefect dataloss](recovery.md#check-for-data-loss) command only checks the state of the repository in the Praefect database, and cannot
be relied on to detect sync problems in this scenario.

### Relation does not exist errors

By default, Praefect database tables are created automatically by the `gitlab-ctl reconfigure` task.

However, the Praefect database tables are not created on initial reconfigure and can throw
errors that relations do not exist if either:

- The `gitlab-ctl reconfigure` command isn't executed.
- There are errors during the execution.

For example:

- `ERROR:  relation "node_status" does not exist at character 13`
- `ERROR:  relation "replication_queue_lock" does not exist at character 40`
- This error:

  ```json
  {"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"}
  ```

To solve this, run the database schema migration by using the `sql-migrate` sub-command of
the `praefect` command:

```shell
$ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate
praefect sql-migrate: OK (applied 21 migrations)
```

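To see which relations a log file reports as missing, you can extract the names from error lines
like the examples above. The following is a minimal sketch with a hypothetical function name.
It assumes `grep` and `sed` support the `-E` option, and it handles both the plain and the
JSON-escaped form of the message.

```shell
# Read log lines on standard input and print each distinct relation name
# that is reported as missing, one per line.
missing_relations() {
  grep -oE 'relation \\?"[^"\\]+\\?" does not exist' |
    sed -E 's/^relation \\?"//; s/\\?" does not exist$//' |
    sort -u
}

# Usage (sudo may be required to read the log file):
#   missing_relations < /var/log/gitlab/praefect/current
```

If the command prints nothing, the log contains no errors of this type.
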
### Requests fail with 'repository scoped: invalid Repository' errors

This indicates that the virtual storage name used in the
[Praefect configuration](praefect.md#praefect) does not match the storage name used in
the [`git_data_dirs` setting](praefect.md#gitaly) for GitLab.

Resolve this by matching the virtual storage names used in Praefect and GitLab configuration.

### Gitaly Cluster performance issues on cloud platforms

Praefect does not require a lot of CPU or memory, and can run on small virtual machines.
Cloud services may place other limits on the resources that small VMs can use, such as
disk IO and network traffic.

Praefect nodes generate a lot of network traffic. The following symptoms can be observed if their network bandwidth has
been throttled by the cloud service:

- Poor performance of Git operations.
- High network latency.
- High memory use by Praefect.

Possible solutions:

- Provision larger VMs to gain access to larger network traffic allowances.
- Use your cloud service's monitoring and logging to check that the Praefect nodes are not exhausting their traffic allowances.