---
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
type: reference
---

# Using NFS with GitLab **(FREE SELF)**

NFS can be used as an alternative for object storage but this isn't typically
recommended for performance reasons.

For data objects such as LFS, Uploads, Artifacts, and so on, an [Object Storage service](object_storage.md)
is recommended over NFS where possible, due to better performance.

File system performance can impact overall GitLab performance, especially for
actions that read or write to Git repositories. For steps you can use to test
file system performance, see
[File System Performance Benchmarking](operations/filesystem_benchmarking.md).

## Gitaly and NFS deprecation

Starting with GitLab version 14.0, support for NFS to store Git repository data is deprecated. Technical customer support and engineering support are available for the 14.x releases. Engineering is fixing bugs and security vulnerabilities consistent with our [release and maintenance policy](../policy/maintenance.md#security-releases).

At the end of the 14.12 milestone (tentatively June 22nd, 2022) technical and engineering support for using NFS to store Git repository data will be officially at end-of-life. There will be no product changes or troubleshooting provided via Engineering, Security or Paid Support channels.

For those customers still running earlier versions of GitLab, [our support eligibility and maintenance policy applies](https://about.gitlab.com/support/statement-of-support.html#version-support).

For the 14.x releases, we continue to help with Git-related tickets from customers running one or more Gitaly servers with their data stored on NFS.
Examples may include:

- Performance issues or timeouts accessing Git data
- Commits or branches vanish
- GitLab intermittently returns the wrong Git data (such as reporting that a repository has no branches)

Assistance is limited to activities like:

- Verifying developers' workflow uses features like protected branches
- Reviewing GitLab event data from the database to advise if it looks like a force push over-wrote branches
- Verifying that NFS client mount options match our [documented recommendations](#mount-options)
- Analyzing the GitLab Workhorse and Rails logs, and determining that `500` errors being seen in the environment are caused by slow responses from Gitaly

GitLab support is unable to continue with the investigation if:

- The date of the request is on or after the release of GitLab version 15.0, and
- Support Engineers and Management determine that all reasonable non-NFS root causes have been exhausted

If the issue is reproducible, or if it happens intermittently but regularly, GitLab Support can investigate provided the issue reproduces without the use of NFS. To reproduce without NFS, the affected repositories should be migrated to a different Gitaly shard, such as Gitaly Cluster or a standalone Gitaly VM, backed with block storage.

### Why remove NFS for Git repository data

{:.no-toc}

NFS is not well-suited to a workload consisting of many small files, like Git repositories. NFS does provide a number of configuration options designed to improve performance. However, over time, a number of these mount options have proven to result in inconsistencies across multiple nodes mounting the NFS volume, up to and including data loss. Addressing these inconsistencies consumes extraordinary development and support engineer time that hampers our ability to develop [Gitaly Cluster](gitaly/praefect.md), our purpose-built solution to addressing the deficiencies of NFS in this environment.

Gitaly Cluster provides highly-available Git repository storage. If this is not a requirement, single-node Gitaly backed by block storage is a suitable substitute.

Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
unavailable from GitLab 15.0. No further enhancements are planned for this feature.

Read:

- [Moving beyond NFS](gitaly/index.md#moving-beyond-nfs).
- About the [correct mount options to use](#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).

## Known kernel version incompatibilities

Red Hat Enterprise Linux (RHEL) and CentOS v7.7 and v7.8 ship with kernel
version `3.10.0-1127`, which [contains a
bug](https://bugzilla.redhat.com/show_bug.cgi?id=1783554) that causes
[uploads to fail to copy over NFS](https://gitlab.com/gitlab-org/gitlab/-/issues/218999). The
following GitLab versions include a fix to work properly with that
kernel version:

- [12.10.12](https://about.gitlab.com/releases/2020/06/25/gitlab-12-10-12-released/)
- [13.0.7](https://about.gitlab.com/releases/2020/06/25/gitlab-13-0-7-released/)
- [13.1.1](https://about.gitlab.com/releases/2020/06/24/gitlab-13-1-1-released/)
- 13.2 and up

If you are using that kernel version, be sure to upgrade GitLab to avoid
errors.

## Fast lookup of authorized SSH keys

The [fast SSH key lookup](operations/fast_ssh_key_lookup.md) feature can improve
performance of GitLab instances even if they're using block storage.

[Fast SSH key lookup](operations/fast_ssh_key_lookup.md) is a replacement for
`authorized_keys` (in `/var/opt/gitlab/.ssh`) using the GitLab database.

NFS increases latency, so fast lookup is recommended if `/var/opt/gitlab`
is moved to NFS.

We are investigating the use of
[fast lookup as the default](https://gitlab.com/groups/gitlab-org/-/epics/3104).

## Improving NFS performance with GitLab

NFS performance with GitLab can in some cases be improved with
[direct Git access](gitaly/index.md#direct-access-to-git-in-gitlab) using
[Rugged](https://github.com/libgit2/rugged).

From GitLab 12.1, GitLab automatically detects if Rugged can and should be used per storage.
If you previously enabled Rugged using the feature flag and you want to use automatic detection instead,
you must unset the feature flag:

```shell
sudo gitlab-rake gitlab:features:unset_rugged
```

If the Rugged feature flag is explicitly set to either `true` or `false`, GitLab uses the value explicitly set.

From GitLab 12.7, Rugged is only automatically enabled for use with Puma
if the [Puma thread count is set to `1`](../install/requirements.md#puma-settings).

To use Rugged with a Puma thread count of more than `1`, enable Rugged using the [feature flag](../development/gitaly.md#legacy-rugged-code).

## NFS server

Installing the `nfs-kernel-server` package allows you to share directories with
the clients running the GitLab application:

```shell
sudo apt-get update
sudo apt-get install nfs-kernel-server
```

### Required features

**File locking**: GitLab **requires** advisory file locking, which is only
supported natively in NFS version 4. NFSv3 also supports locking as long as
Linux Kernel 2.6.5+ is used. We recommend using version 4 and do not
specifically test NFSv3.

### Recommended options

When you define your NFS exports, we recommend you also add the following
options:

- `no_root_squash` - NFS normally changes the `root` user to `nobody`. This is
  a good security measure when NFS shares are accessed by many different
  users. However, in this case only GitLab uses the NFS share so it
  is safe. GitLab recommends the `no_root_squash` setting because we need to
  manage file permissions automatically.
  Without the setting you may receive
  errors when the Omnibus package tries to alter permissions. GitLab
  and other bundled components do **not** run as `root` but as non-privileged
  users. The recommendation for `no_root_squash` is to allow the Omnibus package
  to set ownership and permissions on files, as needed. In some cases where the
  `no_root_squash` option is not available, the `root` flag can achieve the same
  result.
- `sync` - Force synchronous behavior. Default is asynchronous and under certain
  circumstances it could lead to data loss if a failure occurs before data has
  synced.

Due to the complexities of running Omnibus with LDAP and the complexities of
maintaining ID mapping without LDAP, in most cases you should enable numeric UIDs
and GIDs (which is off by default in some cases) for simplified permission
management between systems:

- [NetApp instructions](https://library.netapp.com/ecmdocs/ECMP1401220/html/GUID-24367A9F-E17B-4725-ADC1-02D86F56F78E.html)
- For non-NetApp devices, disable NFSv4 `idmapping` by performing the opposite of [enable NFSv4 idmapper](https://wiki.archlinux.org/title/NFS#Enabling_NFSv4_idmapping)

### Disable NFS server delegation

We recommend that all NFS users disable the NFS server delegation feature. This
is to avoid a [Linux kernel bug](https://bugzilla.redhat.com/show_bug.cgi?id=1552203)
which causes NFS clients to slow precipitously due to
[excessive network traffic from numerous `TEST_STATEID` NFS messages](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/52017).

To disable NFS server delegation, do the following:

1. On the NFS server, run:

   ```shell
   echo 0 > /proc/sys/fs/leases-enable
   sysctl -w fs.leases-enable=0
   ```

1. Restart the NFS server process. For example, on CentOS run `service nfs restart`.
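
To confirm the setting from the steps above took effect, you can read the value back. The following is a minimal sketch, assuming a POSIX shell on the NFS server. The `leases_status` helper name is hypothetical and is not part of GitLab or any kernel tooling:

```shell
# Hypothetical helper: turn the value of fs.leases-enable into a readable status.
# Per the steps above, 0 means NFS server delegation is disabled.
leases_status() {
  case "$1" in
    0) echo "NFS server delegation is disabled" ;;
    1) echo "NFS server delegation is enabled" ;;
    *) echo "unexpected value: $1" ;;
  esac
}

# On the NFS server, read the live kernel setting and report it:
#   leases_status "$(sysctl -n fs.leases-enable)"
```

Values set with `sysctl -w` do not survive a reboot. Adding `fs.leases-enable = 0` to `/etc/sysctl.conf` is the usual way to persist such a setting.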

NOTE:
The kernel bug may be fixed in
[more recent kernels with this commit](https://github.com/torvalds/linux/commit/95da1b3a5aded124dd1bda1e3cdb876184813140).
Red Hat Enterprise Linux 7 [shipped a kernel update](https://access.redhat.com/errata/RHSA-2019:2029)
on August 6, 2019 that may also have resolved this problem.
You may not need to disable NFS server delegation if you know you are using a version of
the Linux kernel that has been fixed. That said, GitLab still encourages instance
administrators to keep NFS server delegation disabled.

## NFS client

The `nfs-common` package provides NFS functionality without installing server components, which
we don't need running on the application nodes.

```shell
sudo apt-get update
sudo apt-get install nfs-common
```

### Mount options

Here is an example snippet to add to `/etc/fstab`:

```plaintext
10.1.0.1:/var/opt/gitlab/.ssh /var/opt/gitlab/.ssh nfs4 defaults,vers=4.1,hard,rsize=1048576,wsize=1048576,noatime,nofail,_netdev,lookupcache=positive 0 2
10.1.0.1:/var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/uploads nfs4 defaults,vers=4.1,hard,rsize=1048576,wsize=1048576,noatime,nofail,_netdev,lookupcache=positive 0 2
10.1.0.1:/var/opt/gitlab/gitlab-rails/shared /var/opt/gitlab/gitlab-rails/shared nfs4 defaults,vers=4.1,hard,rsize=1048576,wsize=1048576,noatime,nofail,_netdev,lookupcache=positive 0 2
10.1.0.1:/var/opt/gitlab/gitlab-ci/builds /var/opt/gitlab/gitlab-ci/builds nfs4 defaults,vers=4.1,hard,rsize=1048576,wsize=1048576,noatime,nofail,_netdev,lookupcache=positive 0 2
10.1.0.1:/var/opt/gitlab/git-data /var/opt/gitlab/git-data nfs4 defaults,vers=4.1,hard,rsize=1048576,wsize=1048576,noatime,nofail,_netdev,lookupcache=positive 0 2
```

You can view information and options set for each of the mounted NFS file
systems by running `nfsstat -m` and `cat /etc/fstab`.
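
The option string from an `/etc/fstab` entry can also be checked mechanically against the recommendations on this page. The following is a minimal sketch, assuming a POSIX shell. The `check_nfs_opts` function name is hypothetical, and the checks cover only `hard`, `lookupcache=positive`, `vers=4.1`, `soft`, and `nocto`:

```shell
# Hypothetical helper: print one line for each recommendation on this page that
# the comma-separated mount option string in $1 does not follow.
# Returns 0 when nothing is flagged, 1 otherwise.
check_nfs_opts() {
  opts=",$1,"   # wrap in commas so each option can be matched as a whole word
  status=0
  case "$opts" in *,hard,*) ;; *) echo "missing: hard"; status=1 ;; esac
  case "$opts" in *,lookupcache=positive,*) ;; *) echo "missing: lookupcache=positive"; status=1 ;; esac
  case "$opts" in *,vers=4.1,*) ;; *) echo "missing: vers=4.1"; status=1 ;; esac
  case "$opts" in *,soft,*) echo "avoid: soft"; status=1 ;; esac
  case "$opts" in *,nocto,*) echo "avoid: nocto"; status=1 ;; esac
  return "$status"
}

# Example: check the options column of the fstab entry for the Git data mount.
#   check_nfs_opts "$(awk '$2 == "/var/opt/gitlab/git-data" {print $4}' /etc/fstab)"
```

The sketch reads the options as written in `/etc/fstab` rather than the live mount, because the kernel may report active options in a different spelling than the one used in the file.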

Note there are several options that you should consider using:

| Setting | Description |
| ------- | ----------- |
| `vers=4.1` | NFS v4.1 should be used instead of v4.0 because there is a Linux [NFS client bug in v4.0](https://gitlab.com/gitlab-org/gitaly/-/issues/1339) that can cause significant problems due to stale data. |
| `nofail` | Don't halt boot process waiting for this mount to become available. |
| `lookupcache=positive` | Tells the NFS client to honor `positive` cache results but invalidates any `negative` cache results. Negative cache results cause problems with Git. Specifically, a `git push` can fail to register uniformly across all NFS clients. The negative cache causes the clients to 'remember' that the files did not exist previously. |
| `hard` | Instead of `soft`. [Further details](#soft-mount-option). |
| `cto` | `cto` is the default option, which you should use. Do not use `nocto`. [Further details](#nocto-mount-option). |
| `_netdev` | Wait to mount file system until network is online. See also the [`high_availability['mountpoint']`](https://docs.gitlab.com/omnibus/settings/configuration.html#only-start-omnibus-gitlab-services-after-a-given-file-system-is-mounted) option. |

#### `soft` mount option

It's recommended that you use `hard` in your mount options, unless you have a specific
reason to use `soft`.

When GitLab.com used NFS, we used `soft` because there were times when we had NFS servers
reboot and `soft` improved availability, but everyone's infrastructure is different.
If your NFS is provided by on-premise storage arrays with redundant controllers,
for example, you shouldn't need to worry about NFS server availability.

The NFS man page states:

> "soft" timeout can cause silent data corruption in certain cases

Read the [Linux man page](https://linux.die.net/man/5/nfs) to understand the difference,
and if you do use `soft`, ensure that you've taken steps to mitigate the risks.

If you experience behavior that might have been caused by
writes to disk on the NFS server not occurring, such as commits going missing,
use the `hard` option, because (from the man page):

> use the soft option only when client responsiveness is more important than data integrity

Other vendors make similar recommendations, including
[System Applications and Products in Data Processing (SAP)](http://wiki.scn.sap.com/wiki/x/PARnFQ) and NetApp's
[knowledge base](https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/What_are_the_differences_between_hard_mount_and_soft_mount).
They highlight that if the NFS client driver caches data, `soft` means there is no certainty that
writes by GitLab are actually on disk.

Mount points set with the option `hard` may not perform as well, and if the
NFS server goes down, `hard` causes processes to hang when interacting with
the mount point. Use `SIGKILL` (`kill -9`) to deal with hung processes.
The `intr` option
[stopped working in the 2.6 kernel](https://access.redhat.com/solutions/157873).

#### `nocto` mount option

Do not use `nocto`. Instead, use `cto`, which is the default.

When using `nocto`, the dentry cache is always used, up to `acdirmax` seconds (attribute cache time) from the time it's created.

This results in stale dentry cache issues with multiple clients, where each client can see a different (cached)
version of a directory.

From the [Linux man page](https://linux.die.net/man/5/nfs), the important parts:

> If the nocto option is specified, the client uses a non-standard heuristic to determine when files on the server have changed.
>
> Using the nocto option may improve performance for read-only mounts, but should be used only if the data on the server changes only occasionally.

We have noticed this behavior in an issue about [refs not found after a push](https://gitlab.com/gitlab-org/gitlab/-/issues/326066),
where newly added loose refs can be seen as missing on a different client with a local dentry cache, as
[described in this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/326066#note_539436931).

### A single NFS mount

It's recommended to nest all GitLab data directories within a single mount, which allows automatic
restore of backups without manually moving existing data.

```plaintext
mountpoint
└── gitlab-data
    ├── builds
    ├── git-data
    ├── shared
    └── uploads
```

To do so, configure Omnibus with the paths to each directory nested
in the mount point as follows:

Mount `/gitlab-nfs` then use the following Omnibus
configuration to move each data location to a subdirectory:

```ruby
git_data_dirs({"default" => { "path" => "/gitlab-nfs/gitlab-data/git-data"} })
gitlab_rails['uploads_directory'] = '/gitlab-nfs/gitlab-data/uploads'
gitlab_rails['shared_path'] = '/gitlab-nfs/gitlab-data/shared'
gitlab_ci['builds_directory'] = '/gitlab-nfs/gitlab-data/builds'
```

Run `sudo gitlab-ctl reconfigure` to start using the central location. Be aware
that if you had existing data, you need to manually copy or rsync it to
these new locations, and then restart GitLab.

### Bind mounts

As an alternative to changing the configuration in Omnibus, bind mounts can be used
to store the data on an NFS mount.

Bind mounts provide a way to specify just one NFS mount and then
bind the default GitLab data locations to the NFS mount. Start by defining your
single NFS mount point as you normally would in `/etc/fstab`. Let's assume your
NFS mount point is `/gitlab-nfs`. Then, add the following bind mounts in
`/etc/fstab`:

```shell
/gitlab-nfs/gitlab-data/git-data /var/opt/gitlab/git-data none bind 0 0
/gitlab-nfs/gitlab-data/.ssh /var/opt/gitlab/.ssh none bind 0 0
/gitlab-nfs/gitlab-data/uploads /var/opt/gitlab/gitlab-rails/uploads none bind 0 0
/gitlab-nfs/gitlab-data/shared /var/opt/gitlab/gitlab-rails/shared none bind 0 0
/gitlab-nfs/gitlab-data/builds /var/opt/gitlab/gitlab-ci/builds none bind 0 0
```

Using bind mounts requires you to manually make sure the data directories
are empty before attempting a restore. Read more about the
[restore prerequisites](../raketasks/backup_restore.md).

### Multiple NFS mounts

When using default Omnibus configuration you need to share 4 data locations
between all GitLab cluster nodes. No other locations should be shared. The
following 4 locations need to be shared:

| Location | Description | Default configuration |
| -------- | ----------- | --------------------- |
| `/var/opt/gitlab/git-data` | Git repository data. This accounts for a large portion of your data | `git_data_dirs({"default" => { "path" => "/var/opt/gitlab/git-data"} })` |
| `/var/opt/gitlab/gitlab-rails/uploads` | User uploaded attachments | `gitlab_rails['uploads_directory'] = '/var/opt/gitlab/gitlab-rails/uploads'` |
| `/var/opt/gitlab/gitlab-rails/shared` | Build artifacts, GitLab Pages, LFS objects, temp files, and so on. If you're using LFS this may also account for a large portion of your data | `gitlab_rails['shared_path'] = '/var/opt/gitlab/gitlab-rails/shared'` |
| `/var/opt/gitlab/gitlab-ci/builds` | GitLab CI/CD build traces | `gitlab_ci['builds_directory'] = '/var/opt/gitlab/gitlab-ci/builds'` |

Other GitLab directories should not be shared between nodes. They contain
node-specific files and GitLab code that does not need to be shared. To ship
logs to a central location consider using remote syslog. Omnibus GitLab packages
provide configuration for [UDP log shipping](https://docs.gitlab.com/omnibus/settings/logs.html#udp-log-shipping-gitlab-enterprise-edition-only).

Having multiple NFS mounts requires you to manually make sure the data directories
are empty before attempting a restore. Read more about the
[restore prerequisites](../raketasks/backup_restore.md).

## Testing NFS

Once you've set up the NFS server and client, you can verify NFS is configured correctly
by testing the following commands:

```shell
sudo mkdir /gitlab-nfs/test-dir
sudo chown git /gitlab-nfs/test-dir
sudo chgrp root /gitlab-nfs/test-dir
sudo chmod 0700 /gitlab-nfs/test-dir
sudo chgrp gitlab-www /gitlab-nfs/test-dir
sudo chmod 0751 /gitlab-nfs/test-dir
sudo chgrp git /gitlab-nfs/test-dir
sudo chmod 2770 /gitlab-nfs/test-dir
sudo chmod 2755 /gitlab-nfs/test-dir
sudo -u git mkdir /gitlab-nfs/test-dir/test2
sudo -u git chmod 2755 /gitlab-nfs/test-dir/test2
sudo ls -lah /gitlab-nfs/test-dir/test2
sudo -u git rm -r /gitlab-nfs/test-dir
```

Any `Operation not permitted` error means you should investigate your NFS server export options.

## NFS in a Firewalled Environment

If the traffic between your NFS server and NFS client(s) is subject to port filtering
by a firewall, then you need to reconfigure that firewall to allow NFS communication.

[This guide from The Linux Documentation Project (TLDP)](https://tldp.org/HOWTO/NFS-HOWTO/security.html#FIREWALLS)
covers the basics of using NFS in a firewalled environment. Additionally, we encourage you to
search for and review the specific documentation for your operating system or distribution and your firewall software.

Example for Ubuntu:

Check that NFS traffic from the client is allowed by the firewall on the host by running
the command: `sudo ufw status`. If it's being blocked, then you can allow traffic from a specific
client with the command below.

```shell
sudo ufw allow from <client_ip_address> to any port nfs
```

## Known issues

### Upgrade to Gitaly Cluster or disable caching if experiencing data loss

WARNING:
Engineering support for NFS for Git repositories is deprecated. Read about
[moving beyond NFS](gitaly/index.md#moving-beyond-nfs).

Customers and users have reported data loss on high-traffic repositories when using NFS for Git repositories.
For example, we have seen:

- [Inconsistent updates after a push](https://gitlab.com/gitlab-org/gitaly/-/issues/2589).
- `git ls-remote` [returning the wrong (or no branches)](https://gitlab.com/gitlab-org/gitaly/-/issues/3083)
  causing Jenkins to intermittently re-run all pipelines for a repository.

The problem may be partially mitigated by adjusting caching using the following NFS client mount options:

| Setting | Description |
| ------- | ----------- |
| `lookupcache=positive` | Tells the NFS client to honor `positive` cache results but invalidates any `negative` cache results. Negative cache results cause problems with Git. Specifically, a `git push` can fail to register uniformly across all NFS clients. The negative cache causes the clients to 'remember' that the files did not exist previously. |
| `actimeo=0` | Sets the time to zero that the NFS client caches files and directories before requesting fresh information from a server. |
| `noac` | Tells the NFS client not to cache file attributes and forces application writes to become synchronous so that local changes to a file become visible on the server immediately. |

WARNING:
The `actimeo=0` and `noac` options both result in a significant reduction in performance, possibly leading to timeouts.
You may be able to avoid timeouts and data loss using `actimeo=0` and `lookupcache=positive` _without_ `noac`, however
we expect the performance reduction is still significant. Upgrade to
[Gitaly Cluster](gitaly/praefect.md) as soon as possible.

### Avoid using cloud-based file systems

GitLab strongly recommends against using cloud-based file systems such as:

- AWS Elastic File System (EFS).
- Google Cloud Filestore.
- Azure Files.

Our support team cannot assist with performance issues related to cloud-based file system access.

Customers and users have reported that these file systems don't perform well for
the file system access GitLab requires. Workloads where many small files are written in
a serialized manner, like `git`, are not well suited to cloud-based file systems.

If you do choose to use these, avoid storing GitLab log files (for example, those in `/var/log/gitlab`)
there because this also affects performance. We recommend that the log files be
stored on a local volume.

For more details on the experience of using cloud-based file systems with GitLab,
see this [Commit Brooklyn 2019 video](https://youtu.be/K6OS8WodRBQ?t=313).

### Avoid using CephFS and GlusterFS

GitLab strongly recommends against using CephFS and GlusterFS.
These distributed file systems are not well-suited for the GitLab input/output access patterns because Git uses many small files, and the time it takes for access times and file locks to propagate makes Git activity very slow.

### Avoid using PostgreSQL with NFS

GitLab strongly recommends against running your PostgreSQL database
across NFS. The GitLab support team is not able to assist on performance issues related to
this configuration.

Additionally, this configuration is specifically warned against in the
[PostgreSQL Documentation](https://www.postgresql.org/docs/current/creating-cluster.html#CREATING-CLUSTER-NFS):

>PostgreSQL does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like
>locally-connected drives. If the client or server NFS implementation does not provide standard file
>system semantics, this can cause reliability problems. Specifically, delayed (asynchronous) writes
>to the NFS server can cause data corruption problems.

For supported database architecture, see our documentation about
[configuring a database for replication and failover](postgresql/replication_and_failover.md).

## Troubleshooting

### Finding the requests that are being made to NFS

In case of NFS-related problems, it can be helpful to trace
the file system requests that are being made by using `perf`:

```shell
sudo perf trace -e 'nfs4:*' -p $(pgrep -fd ',' puma)
```

On Ubuntu 16.04, use:

```shell
sudo perf trace --no-syscalls --event 'nfs4:*' -p $(pgrep -fd ',' puma)
```