# Git object quarantine during git push

While receiving a Git push, GitLab can reject pushes using the
`pre-receive` Git hook. Git has a special "object quarantine"
mechanism that allows it to eagerly delete rejected Git objects.

This document explains how Git object quarantine works, and how
GitLab is able to see quarantined objects.

## Git object quarantine

Git object quarantine was introduced in Git 2.11.0 via
https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d.
To understand what it does, we need to know how Git receives pushes
on the server.

### How Git receives a push

On a Git server, a push goes into `git receive-pack`. This process does the following things:

1. receive the Git objects pushed by the client and write them to disk
1. receive the ref update commands from the client and keep them in memory
1. check connectivity (no missing objects)
1. run `pre-receive` and feed it the intended ref update commands
1. if `pre-receive` rejects the push, clean up and stop
1. apply the ref update commands one by one. For each command, run the `update` hook, which can reject that ref update.
1. after all ref updates have been applied, run the `post-receive` hook
1. report success to the client and end the session

Object quarantine exists for the sake of the cleanup that happens when
`pre-receive` rejects the push (step 5 above). It changes the _timing_
of the cleanup. Without object quarantine, objects that were part of a
rejected push would stay on disk until `git gc` judged them to be both
unused and "old". How long that takes depends on how often `git gc`
(or `git prune`) runs, and on the configuration of when objects count
as "old". With object quarantine, rejected objects can be deleted
immediately: Git can just `rm -rf` the quarantine directory and
they're gone.
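
To make the timing argument concrete, here is a toy Python model of
receiving objects into a quarantine directory, running a `pre-receive`
stand-in, and then either deleting or merging the quarantine. It is a
sketch under simplifying assumptions, not Git's implementation: objects
are flat files keyed by a string (real Git uses loose-object
subdirectories and packfiles), the hook is a Python callable that is
handed the quarantine path, and the `receive_push` name and the
`incoming-<random hex>` directory name are illustrative only.

```python
import os
import secrets
import shutil
import tempfile
from typing import Callable, Dict


def receive_push(objects_dir: str, pushed: Dict[str, bytes],
                 pre_receive: Callable[[str], bool]) -> bool:
    """Toy model of object quarantine (hypothetical, not Git's code).

    objects_dir: the repository's main object directory.
    pushed: object name -> object contents sent by the client.
    pre_receive: hook stand-in; gets the quarantine path, returns True to accept.
    """
    # Write the pushed objects into a fresh quarantine directory,
    # never directly into the main object directory.
    quarantine = os.path.join(objects_dir, "incoming-" + secrets.token_hex(4))
    os.makedirs(quarantine)
    for name, data in pushed.items():
        with open(os.path.join(quarantine, name), "wb") as f:
            f.write(data)

    # Rejected: delete the whole quarantine at once. Nothing reached
    # the main object directory, so there is nothing left for gc.
    if not pre_receive(quarantine):
        shutil.rmtree(quarantine)
        return False

    # Accepted: move the quarantined objects into the main directory.
    for name in os.listdir(quarantine):
        os.replace(os.path.join(quarantine, name), os.path.join(objects_dir, name))
    os.rmdir(quarantine)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects_dir:
        # A hook that rejects the push leaves nothing behind.
        accepted = receive_push(objects_dir, {"obj1": b"blob"}, lambda q: False)
        print(accepted, os.listdir(objects_dir))  # False []
        # A hook that accepts the push: the object lands in the main directory.
        accepted = receive_push(objects_dir, {"obj1": b"blob"}, lambda q: True)
        print(accepted, os.listdir(objects_dir))  # True ['obj1']
```

The design point is that rejection costs a single `shutil.rmtree` call,
whereas without a separate directory the rejected objects would sit
among accepted ones until garbage collection decided they were
unreferenced and old enough to prune.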

### Git implementation

The Git implementation of this mechanism rests on two things.

#### 1. Alternate object directories

The objects in a Git repository can be stored across multiple
directories: one main directory, usually the repository's `objects`
directory, and zero or more alternate directories. Together these act
like a search path: when looking for an object, Git first checks the
main directory, then each alternate, until it finds the object.

#### 2. Config overrides via environment variables

Git can inject custom config into subprocesses via environment
variables. In the case of Git object directories, these are
`GIT_OBJECT_DIRECTORY` (the main object directory) and
`GIT_ALTERNATE_OBJECT_DIRECTORIES` (a search path of `:`-separated
alternate object directories).

#### Putting it all together

1. `git receive-pack` receives a push
1. `git receive-pack` [creates a quarantine directory `objects/incoming-$RANDOM`](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1715)
1. `git receive-pack` [configures the unpack process](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1721) to write objects into the quarantine directory
1. `git receive-pack` unpacks the objects into the quarantine directory
1. `git receive-pack` [runs the `pre-receive` hook](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1498) with special `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variables that add the quarantine directory to the search path
1. If the `pre-receive` hook rejects the push, `git receive-pack` removes the quarantine directory and its contents. The push is aborted.
1. If the `pre-receive` hook passes, `git receive-pack` [merges the quarantine directory into the main object directory](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1510).
1. `git receive-pack` enters the ref update transaction

Note that by the time the `update` hook runs, the quarantine directory
has already been merged into the main object directory, so it no
longer matters. The same goes for the `post-receive` hook, which runs
even later.

Because `pre-receive` has the special quarantine configuration data in
environment variables, any `git` process spawned by `pre-receive`
inherits the quarantine config and is able to see the objects that
are being pushed.

## GitLab and Git object quarantine

### Why does all this matter to GitLab

GitLab uses Git hooks, among other things, to implement features that
can reject Git pushes. For example, you can mark a branch as
"protected" in the GitLab web UI, and then certain types of users can
no longer push to that branch. That feature is implemented via the [Git
`pre-receive` hook](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/service/hook/pre_receive.go).

As mentioned above, Git object quarantine normally works more or less
automatically because `git` commands spawned by the `pre-receive` hook
inherit the special environment variables that contain the path to the
quarantine directory. GitLab's hooks have a problem, however, because
they are "dumb". All the GitLab hooks do is take the inputs of the
hook executable (the list of ref update commands) and send them to the
GitLab Rails internal API via a POST request. The application logic
that decides whether the push is allowed resides in Rails. The hook
just waits for the result of the POST request to GitLab and reports it
back to `git receive-pack`.

During the POST, the internal GitLab API makes Gitaly calls back into
the repository to examine the objects being pushed. For example, if
force pushes are not allowed, GitLab calls the `IsAncestor` RPC.
That RPC call then wants to look at a commit that is in the process of
being pushed. But that commit is still in quarantine, so the RPC
fails: the commit cannot be found.

### How GitLab passes the object quarantine information around

To overcome this problem, the GitLab `pre-receive` hook [reads the
object directory configuration from its
environment](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitlabshell/env.go#L9)
and passes this information [along with the HTTP API
call](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/hook/manager.go#L30-46).
On the Rails side, we then [put the object directory information in
the "request
store"](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/internal/base.rb#L43)
(that is, request-scoped thread-local storage). Then, during that
Rails request, whenever Rails makes Gitaly requests on this repository,
it sends the quarantine information back [in the Gitaly `Repository`
struct](https://gitlab.com/gitlab-org/gitlab/-/blob/f81f30c29a0edce20f6737fdccc3315c8baab9d1/lib/gitlab/gitaly_client/util.rb#L8-17).
Finally, inside Gitaly, when we spawn a Git process, we [re-create
the environment
variables](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L21-34)
that were present in the `pre-receive` hook's environment, so that we
can see the quarantined objects. We do the same when we [instantiate a
`Gitlab::Git::Repository` in
gitaly-ruby](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).

### Relative paths

During the Gitaly migration we had to handle a complication with the
object quarantine information: Git uses absolute paths for this. These
These 135paths get generated wherever `git receive-pack` runs, i.e., on the 136Gitaly server. During the migration, the repositories were also 137accessible via NFS at the Rails side, but at a different path. That 138meant that the absolute paths supplied by Git would be invalid part of 139the time. 140 141To work around this, the GitLab `pre-receive` hook [converts the 142absolute paths from Git into relative 143paths](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/object_dirs_helper.rb#L16), 144relative to the repository directory. These relative paths then get 145passed around inside GitLab. At the time Gitaly recreates the object 146directory variables, it [converts the paths back from relative to 147absolute](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L23). 148