---
stage: Create
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
type: reference
---

# Troubleshooting Gitaly and Gitaly Cluster **(FREE SELF)**

Refer to the information below when troubleshooting Gitaly and Gitaly Cluster.

Before troubleshooting, see the Gitaly and Gitaly Cluster
[frequently asked questions](faq.md).

## Troubleshoot Gitaly

The following sections provide possible solutions to Gitaly errors.

See also [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.

### Check versions when using standalone Gitaly servers

When using standalone Gitaly servers, you must make sure they are the same version
as GitLab to ensure full compatibility:

1. On the top bar, select **Menu > Admin** on your GitLab instance.
1. On the left sidebar, select **Overview > Gitaly Servers**.
1. Confirm all Gitaly servers indicate that they are up to date.

### Use `gitaly-debug`

The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
performance. It is intended to help production engineers and support
engineers investigate Gitaly performance problems.

If you're using GitLab 11.6 or newer, this tool should be installed on
your GitLab or Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
If you're investigating an older GitLab version, you can compile this
tool offline and copy the executable to your server:

```shell
git clone https://gitlab.com/gitlab-org/gitaly.git
cd gitaly/cmd/gitaly-debug
GOOS=linux GOARCH=amd64 go build -o gitaly-debug
```

To see the help page of `gitaly-debug` for a list of supported sub-commands, run:

```shell
gitaly-debug -h
```

### Commits, pushes, and clones return a 401

```plaintext
remote: GitLab: 401 Unauthorized
```

You need to sync your `gitlab-secrets.json` file with your GitLab
application nodes.

### Client side gRPC logs

Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
client has its own log file, which may contain useful information when
you are seeing Gitaly errors. You can control the log level of the
gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
default level is `WARN`.

You can run a gRPC trace with:

```shell
sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
```

### Server side gRPC logs

gRPC tracing can also be enabled in Gitaly itself by setting the `GODEBUG`
environment variable to `http2debug=2`. To set this in an Omnibus GitLab install:

1. Add the following to your `gitlab.rb` file:

   ```ruby
   gitaly['env'] = {
     "GODEBUG" => "http2debug=2"
   }
   ```

1. [Reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure) GitLab.

### Correlating Git processes with RPCs

Sometimes you need to find out which Gitaly RPC created a particular Git process.

One method is to use `DEBUG` logging. However, `DEBUG` logging must be enabled
ahead of time, and the logs it produces are quite verbose.

A lightweight method is to inspect the environment of the Git process
(using its `PID`) and look at the `CORRELATION_ID` variable:

```shell
PID=<Git process ID>
sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID=
```

This method isn't reliable for `git cat-file` processes, because Gitaly
internally pools and reuses those across RPCs.

### Observing `gitaly-ruby` traffic

[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly,
so there's little visibility into what goes on inside
`gitaly-ruby` processes.

If you have Prometheus set up to scrape your Gitaly process, you can see
request rates and error codes for individual RPCs in `gitaly-ruby` by
querying `grpc_client_handled_total`.

- In theory, this metric does not differentiate between `gitaly-ruby` and other RPCs.
- In practice, from GitLab 11.9, all gRPC calls made by Gitaly itself are internal calls from the
  main Gitaly process to one of its `gitaly-ruby` sidecars.

Assuming your `grpc_client_handled_total` counter only observes Gitaly,
the following query shows you which RPCs are (most likely) internally
implemented as calls to `gitaly-ruby`:

```prometheus
sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
```

### Repository changes fail with a `401 Unauthorized` error

If you run Gitaly on its own server and notice these conditions:

- Users can successfully clone and fetch repositories by using both SSH and HTTPS.
- Users can't push to repositories, or receive a `401 Unauthorized` message when attempting to
  make changes to them in the web UI.

Gitaly may be failing to authenticate with the Gitaly client because it has the
[wrong secrets file](configure_gitaly.md#configure-gitaly-servers).
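If you suspect a mismatched secrets file, one quick check is to compare the checksums of the two copies. The following is a minimal sketch with several assumptions: Bash and the coreutils `sha256sum` command are available, the files are at the Omnibus default path `/etc/gitlab/gitlab-secrets.json`, and you have copied the Gitaly client's file to a temporary location on the Gitaly server. The `compare_secrets` helper and the `/tmp/client-secrets.json` path are hypothetical.

```shell
# Minimal sketch: compare two copies of gitlab-secrets.json by checksum.
# Prints "match" or "mismatch". Returns 1 on a mismatch and 2 if a file is unreadable.
compare_secrets() {
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then
    echo "cannot read one of the files" >&2
    return 2
  fi
  local server_sum client_sum
  server_sum=$(sha256sum "$1" | cut -d ' ' -f 1)
  client_sum=$(sha256sum "$2" | cut -d ' ' -f 1)
  if [ "$server_sum" = "$client_sum" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}

# Example usage on the Gitaly server, run as root (paths are assumptions):
#   compare_secrets /etc/gitlab/gitlab-secrets.json /tmp/client-secrets.json
# The copied file contains secrets, so delete it when you are done.
```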

Confirm the following are all true:

- When any user performs a `git push` to any repository on this Gitaly server, it
  fails with a `401 Unauthorized` error:

  ```plaintext
  remote: GitLab: 401 Unauthorized
  To <REMOTE_URL>
  ! [remote rejected] branch-name -> branch-name (pre-receive hook declined)
  error: failed to push some refs to '<REMOTE_URL>'
  ```

- When any user adds or modifies a file from the repository using the GitLab
  UI, it immediately fails with a red `401 Unauthorized` banner.
- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#create-a-blank-project)
  successfully creates the project but doesn't create the README.
- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
  on a Gitaly client and reproducing the error, you get `401` errors
  when reaching the [`/api/v4/internal/allowed`](../../development/internal_api/index.md) endpoint:

  ```plaintext
  # api_json.log
  {
    "time": "2019-07-18T00:30:14.967Z",
    "severity": "INFO",
    "duration": 0.57,
    "db": 0,
    "view": 0.57,
    "status": 401,
    "method": "POST",
    "path": "\/api\/v4\/internal\/allowed",
    "params": [
      {
        "key": "action",
        "value": "git-receive-pack"
      },
      {
        "key": "changes",
        "value": "REDACTED"
      },
      {
        "key": "gl_repository",
        "value": "REDACTED"
      },
      {
        "key": "project",
        "value": "\/path\/to\/project.git"
      },
      {
        "key": "protocol",
        "value": "web"
      },
      {
        "key": "env",
        "value": "{\"GIT_ALTERNATE_OBJECT_DIRECTORIES\":[],\"GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE\":[],\"GIT_OBJECT_DIRECTORY\":null,\"GIT_OBJECT_DIRECTORY_RELATIVE\":null}"
      },
      {
        "key": "user_id",
        "value": "2"
      },
      {
        "key": "secret_token",
        "value": "[FILTERED]"
      }
    ],
    "host": "gitlab.example.com",
    "ip": "REDACTED",
    "ua": "Ruby",
    "route": "\/api\/:version\/internal\/allowed",
    "queue_duration": 4.24,
    "gitaly_calls": 0,
    "gitaly_duration": 0,
    "correlation_id": "XPUZqTukaP3"
  }

  # nginx_access.log
  [IP] - - [18/Jul/2019:00:30:14 +0000] "POST /api/v4/internal/allowed HTTP/1.1" 401 30 "" "Ruby"
  ```

To fix this problem, confirm that your [`gitlab-secrets.json` file](configure_gitaly.md#configure-gitaly-servers)
on the Gitaly server matches the one on the Gitaly client. If it doesn't match,
update the secrets file on the Gitaly server to match the Gitaly client, then
[reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).

### Repository pushes fail with a `deny updating a hidden ref` error

Due to [a change](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3426)
introduced in GitLab 13.12, Gitaly has read-only, internal GitLab references that users are not
permitted to update. If you attempt to update internal references with `git push --mirror`, Git
returns the rejection error `deny updating a hidden ref`.

The following references are read-only:

- `refs/environments/`
- `refs/keep-around/`
- `refs/merge-requests/`
- `refs/pipelines/`

To mirror-push only branches and tags, and avoid attempting to update the read-only references, run:

```shell
git push origin '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'
```

To push other namespaces as well, add a refspec pattern for each of them to the same command.

### Command line tools cannot connect to Gitaly

These symptoms indicate that gRPC cannot reach your Gitaly server:

- You can't connect to a Gitaly server with command-line tools.
- Certain actions result in a `14: Connect Failed` error message.

Verify you can reach Gitaly by using TCP:

```shell
sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
```

If the TCP connection:

- Fails, check your network settings and your firewall rules.
- Succeeds, your networking and firewall rules are correct.

If you use proxy servers in your command-line environment, such as Bash, they can interfere with
your gRPC traffic.

If you use Bash or a compatible command-line environment, run the following commands to determine
whether you have proxy servers configured:

```shell
echo $http_proxy
echo $https_proxy
```

If either of these variables has a value, your Gitaly CLI connections may be getting routed through
a proxy that cannot connect to Gitaly.

To remove the proxy setting, run the following commands (depending on which variables had values):

```shell
unset http_proxy
unset https_proxy
```

### Permission denied errors appearing in Gitaly or Praefect logs when accessing repositories

You might see the following in Gitaly and Praefect logs:

```plaintext
{
  ...
  "error":"rpc error: code = PermissionDenied desc = permission denied",
  "grpc.code":"PermissionDenied",
  "grpc.meta.client_name":"gitlab-web",
  "grpc.request.fullMethod":"/gitaly.ServerService/ServerInfo",
  "level":"warning",
  "msg":"finished unary call with code PermissionDenied",
  ...
}
```

This is a gRPC call
[error response code](https://grpc.github.io/grpc/core/md_doc_statuscodes.html).

If this error occurs even though
[the Gitaly auth tokens are set up correctly](#praefect-errors-in-logs),
it's likely that the Gitaly servers are experiencing
[clock drift](https://en.wikipedia.org/wiki/Clock_drift).

Ensure the Gitaly clients and servers are synchronized, and use an NTP time
server to keep them synchronized.
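To get a rough idea of how far apart two nodes' clocks are, you can compare Unix timestamps collected on each node at about the same moment. The following is a minimal sketch that assumes Bash. The helper names `clock_skew` and `check_skew` and the `MAX_SKEW` threshold are hypothetical, and the threshold is an example value, not a documented Gitaly tolerance.

```shell
# Minimal sketch: estimate clock skew between two nodes.
# On each node, collect a Unix timestamp at roughly the same moment:
#   date -u +%s
# Any delay between the two commands adds error to the estimate.

# Print the absolute difference, in seconds, between two Unix timestamps.
clock_skew() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"
}

# Hypothetical threshold in seconds: an example only, not a Gitaly limit.
MAX_SKEW=5

# Report the skew and return 1 if it exceeds MAX_SKEW.
check_skew() {
  local skew
  skew=$(clock_skew "$1" "$2")
  if [ "$skew" -gt "$MAX_SKEW" ]; then
    echo "skew ${skew}s exceeds ${MAX_SKEW}s"
    return 1
  fi
  echo "skew ${skew}s"
}
```

For example, `check_skew 1700000000 1700000065` prints `skew 65s exceeds 5s` and returns a non-zero status. On systemd-based hosts, `timedatectl status` also shows whether the system clock is synchronized.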

### Gitaly not listening on new address after reconfiguring

When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may
continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.

When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be
necessary because [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.

### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node

If this error occurs even though file permissions are correct, it's likely that the Gitaly node is
experiencing [clock drift](https://en.wikipedia.org/wiki/Clock_drift).

Ensure that the GitLab and Gitaly nodes are synchronized, and use an NTP time
server to keep them synchronized if possible.

### Health check warnings

The following warning in `/var/log/gitlab/praefect/current` can be ignored:

```plaintext
"error":"full method name not found: /grpc.health.v1.Health/Check",
"msg":"error when looking up method info"
```

### File not found errors

The following errors in `/var/log/gitlab/gitaly/current` can be ignored.
They are caused by the GitLab Rails application checking for specific files
that do not exist in a repository.

```plaintext
"error":"not found: .gitlab/route-map.yml"
"error":"not found: Dockerfile"
"error":"not found: .gitlab-ci.yml"
```

## Troubleshoot Praefect (Gitaly Cluster)

The following sections provide possible solutions to Gitaly Cluster errors.

### Check cluster health

> [Introduced](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/) in GitLab 14.6.

The `check` Praefect sub-command runs a series of checks to determine the health of the Gitaly Cluster.

```shell
gitlab-ctl praefect check
```

The following sections describe the checks that are run.

#### Praefect migrations

Checks if Praefect migrations are up to date. Database migrations must be up to date for Praefect to work correctly.

If this check fails:

1. See the `schema_migrations` table in the database to see which migrations have run.
1. Run `praefect sql-migrate` to bring the migrations up to date.

#### Node connectivity and disk access

Checks if Praefect can reach all of its Gitaly nodes, and if each Gitaly node has read and write access to all of its storages.

If this check fails:

1. Confirm the network addresses and tokens are set up correctly:
   - In the Praefect configuration.
   - In each Gitaly node's configuration.
1. On the Gitaly nodes, check that the `gitaly` process is being run as `git`. There might be a permissions issue that is preventing Gitaly from
   accessing its storage directories.
1. Confirm that there are no issues with the network that connects Praefect to Gitaly nodes.

#### Database read and write access

Checks if Praefect can read from and write to the database.

If this check fails:

1. See if the Praefect database is in recovery mode. In recovery mode, tables may be read-only. To check, run:

   ```sql
   SELECT pg_is_in_recovery();
   ```

1. Confirm that the user that Praefect uses to connect to PostgreSQL has read and write access to the database.
1. See if the database has been placed into read-only mode. To check, run:

   ```sql
   SHOW default_transaction_read_only;
   ```

#### Inaccessible repositories

Checks how many repositories are inaccessible because they are missing a primary assignment or their primary is unavailable.

If this check fails:

1. See if any Gitaly nodes are down. Run `praefect ping-nodes` to check.
1. Check if there is a high load on the Praefect database. If the Praefect database is slow to respond, health checks can fail to persist
   to the database, which leads Praefect to consider nodes unhealthy.

### Praefect errors in logs

If you receive an error, check `/var/log/gitlab/gitlab-rails/production.log`.

Here are common errors and potential causes:

- 500 response code
  - **ActionView::Template::Error (7:permission denied)**
    - `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server.
  - **Unable to save project. Error: 7:permission denied**
    - Secret token in `praefect['storage_nodes']` on GitLab server does not match the
      value in `gitaly['auth_token']` on one or more Gitaly servers.
- 503 response code
  - **GRPC::Unavailable (14:failed to connect to all addresses)**
    - GitLab was unable to reach Praefect.
  - **GRPC::Unavailable (14:all SubCons are in TransientFailure...)**
    - Praefect cannot reach one or more of its child Gitaly nodes. Try running
      the Praefect connection checker to diagnose.

### Determine primary Gitaly node

To determine the primary node of a repository:

- In GitLab 14.6 and later, use the [`praefect metadata`](#view-repository-metadata) subcommand.
- In GitLab 13.12 to GitLab 14.5 with [repository-specific primaries](praefect.md#repository-specific-primary-nodes),
  use the [`gitlab:praefect:replicas` Rake task](../raketasks/praefect.md#replica-checksums).
- With legacy election strategies in GitLab 13.12 and earlier, the primary was the same for all repositories in a virtual storage.
  To determine the current primary Gitaly node for a specific virtual storage:

  - Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the
    [`Gitlab Omnibus - Praefect` dashboard](https://gitlab.com/gitlab-org/grafana-dashboards/-/blob/master/omnibus/praefect.json).
    This is recommended.
  - If you do not have Grafana set up, use the following command on each Praefect node:

    ```shell
    curl localhost:9652/metrics | grep gitaly_praefect_primaries
    ```

### View repository metadata

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3481) in GitLab 14.6.

Gitaly Cluster maintains a [metadata database](index.md#components) about the repositories stored on the cluster. Use the `praefect metadata` subcommand
to inspect the metadata for troubleshooting.

You can retrieve a repository's metadata by its Praefect-assigned repository ID:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -repository-id <repository-id>
```

You can also retrieve a repository's metadata by its virtual storage and relative path:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -virtual-storage <virtual-storage> -relative-path <relative-path>
```

#### Examples

To retrieve the metadata for a repository with a Praefect-assigned repository ID of 1:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -repository-id 1
```

To retrieve the metadata for a repository with virtual storage `default` and relative path `@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml metadata -virtual-storage default -relative-path @hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git
```

Either of these examples retrieves the following metadata for an example repository:

```plaintext
Repository ID: 54771
Virtual Storage: "default"
Relative Path: "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
Replica Path: "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
Primary: "gitaly-1"
Generation: 1
Replicas:
- Storage: "gitaly-1"
  Assigned: true
  Generation: 1, fully up to date
  Healthy: true
  Valid Primary: true
- Storage: "gitaly-2"
  Assigned: true
  Generation: 0, behind by 1 changes
  Healthy: true
  Valid Primary: false
- Storage: "gitaly-3"
  Assigned: true
  Generation: replica not yet created
  Healthy: false
  Valid Primary: false
```

#### Available metadata

The metadata retrieved by `praefect metadata` includes the fields in the following tables.

| Field             | Description |
|:------------------|:------------|
| `Repository ID`   | Permanent unique ID assigned to the repository by Praefect. Different from the ID GitLab uses for repositories. |
| `Virtual Storage` | Name of the virtual storage the repository is stored in. |
| `Relative Path`   | Repository's path in the virtual storage. |
| `Replica Path`    | Where on the Gitaly node's disk the repository's replicas are stored. |
| `Primary`         | Current primary of the repository. |
| `Generation`      | Used by Praefect to track repository changes. Each write in the repository increments the repository's generation. |
| `Replicas`        | A list of replicas that exist or are expected to exist. |

For each replica, the following metadata is available:

| `Replicas` Field | Description |
|:-----------------|:------------|
| `Storage`        | Name of the Gitaly storage that contains the replica. |
| `Assigned`       | Indicates whether the replica is expected to exist in the storage. Can be `false` if a Gitaly node is removed from the cluster or if the storage contains an extra copy after the repository's replication factor was decreased. |
| `Generation`     | Latest confirmed generation of the replica. It indicates:<br><br>- The replica is fully up to date if the generation matches the repository's generation.<br>- The replica is outdated if the replica's generation is less than the repository's generation.<br>- `replica not yet created` if the replica does not yet exist at all on the storage. |
| `Healthy`        | Indicates whether the Gitaly node that is hosting this replica is considered healthy by the consensus of Praefect nodes. |
| `Valid Primary`  | Indicates whether the replica is fit to serve as the primary node. If the repository's primary is not a valid primary, a failover occurs on the next write to the repository if there is another replica that is a valid primary. A replica is a valid primary if:<br><br>- It is stored on a healthy Gitaly node.<br>- It is fully up to date.<br>- It is not targeted by a pending deletion job from decreasing replication factor.<br>- It is assigned. |

### Check that repositories are in sync

In [some cases](index.md#known-issues), the Praefect database can get out of sync with the underlying Gitaly nodes. To check that
a given repository is fully synced on all nodes, run the [`gitlab:praefect:replicas` Rake task](../raketasks/praefect.md#replica-checksums),
which checksums the repository on all Gitaly nodes.

The [Praefect `dataloss`](recovery.md#check-for-data-loss) command only checks the state of the repository in the Praefect database, and cannot
be relied on to detect sync problems in this scenario.

### Relation does not exist errors

By default, Praefect database tables are created automatically by the `gitlab-ctl reconfigure` task.

However, if either of the following occurs, the Praefect database tables are not created on initial reconfigure,
and Praefect can return errors that relations do not exist:

- The `gitlab-ctl reconfigure` command isn't executed.
- There are errors during the execution.

For example:

- `ERROR: relation "node_status" does not exist at character 13`
- `ERROR: relation "replication_queue_lock" does not exist at character 40`
- This error:

  ```json
  {"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"}
  ```

To solve this, run the database schema migration by using the `sql-migrate` subcommand of
the `praefect` command:

```shell
$ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate
praefect sql-migrate: OK (applied 21 migrations)
```

### Requests fail with 'repository scoped: invalid Repository' errors

This indicates that the virtual storage name used in the
[Praefect configuration](praefect.md#praefect) does not match the storage name used in the
[`git_data_dirs` setting](praefect.md#gitaly) for GitLab.

Resolve this by matching the virtual storage names used in the Praefect and GitLab configuration.

### Gitaly Cluster performance issues on cloud platforms

Praefect does not require a lot of CPU or memory, and can run on small virtual machines.
Cloud services may place other limits on the resources that small VMs can use, such as
disk I/O and network traffic.

Praefect nodes generate a lot of network traffic. The following symptoms can be observed if their network bandwidth has
been throttled by the cloud service:

- Poor performance of Git operations.
- High network latency.
- High memory use by Praefect.

Possible solutions:

- Provision larger VMs to gain access to larger network traffic allowances.
- Use your cloud service's monitoring and logging to check that the Praefect nodes are not exhausting their traffic allowances.
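To complement your cloud provider's metrics, you can also sample a Praefect node's own network counters. The following is a minimal sketch that assumes a Linux host exposing byte counters under `/sys/class/net/<interface>/statistics/`. The `rate_mbps` and `sample_iface` helpers and the `eth0` interface name are hypothetical. Compare the result with the bandwidth allowance your cloud provider documents for the VM size.

```shell
# Minimal sketch: measure the average receive and transmit rate of one
# network interface on a Praefect node, using Linux sysfs byte counters.

# Convert a byte-counter delta over an interval into megabits per second,
# truncated to a whole number.
rate_mbps() {
  local before="$1" after="$2" seconds="$3"
  echo $(( (after - before) * 8 / seconds / 1000000 ))
}

# Read the interface counters twice, the given number of seconds apart (default 10).
sample_iface() {
  local iface="$1" seconds="${2:-10}"
  local stats="/sys/class/net/${iface}/statistics"
  local rx1 tx1 rx2 tx2
  rx1=$(cat "${stats}/rx_bytes")
  tx1=$(cat "${stats}/tx_bytes")
  sleep "$seconds"
  rx2=$(cat "${stats}/rx_bytes")
  tx2=$(cat "${stats}/tx_bytes")
  echo "${iface}: rx $(rate_mbps "$rx1" "$rx2" "$seconds") Mbit/s, tx $(rate_mbps "$tx1" "$tx2" "$seconds") Mbit/s"
}

# Example usage (the interface name is an assumption):
#   sample_iface eth0 30
```

Sampling over a longer interval during peak Git activity gives a more representative figure than a single short sample.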