# Git object quarantine during git push

While receiving a Git push, GitLab can reject pushes using the
`pre-receive` Git hook. Git has a special "object quarantine"
mechanism that allows it to eagerly delete rejected Git objects.

In this document we will explain how Git object quarantine works, and
how GitLab is able to see quarantined objects.

## Git object quarantine

Git object quarantine was introduced in Git 2.11.0 via
<https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d>.
To understand what it does we need to know how Git receives pushes on
the server.

### How Git receives a push

On a Git server, a push goes into `git receive-pack`. This process does the following things:

1. receive the Git objects pushed by the client and write them to disk
1. receive the ref update commands from the client and keep them in memory
1. check connectivity (no missing objects)
1. run `pre-receive` and feed it the intended ref update commands
1. if `pre-receive` rejects the push, clean up and stop
1. apply ref update commands one by one. For each command, run the `update` hook, which can reject the ref update.
1. after all ref updates have been applied, run the `post-receive` hook
1. report success to the client and end the session

Object quarantine exists for the sake of the cleanup that happens when
`pre-receive` rejects the push (step 5 above). It changes the _timing_ of the
cleanup. Without object quarantine, objects that were part of a
rejected push would sit around until `git gc` judged them to be both
unused and "old". How long that takes depends on how often `git gc`
(or `git prune`) runs, and on the configuration that determines when
objects count as "old". Because of object quarantine, rejected objects
can be deleted immediately: Git can just `rm -rf` the quarantine
directory and they're gone.
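
As a rough illustration of why the quarantine makes cleanup cheap, the toy model below stages incoming objects in a temporary directory and then either deletes that directory in one step or moves its files into the main object directory. It is a sketch under stated assumptions, not Git's implementation: the function name `receive_objects`, the `incoming-` prefix, and the plain-file object layout are made up for the example, and real Git also deals with packfiles, compression, and hashing.

```python
import os
import shutil
import tempfile


def receive_objects(objects_dir, incoming, accept):
    """Toy model: stage objects in a quarantine directory, then keep or drop them.

    incoming maps a hex object ID to raw bytes; accept is a callable that
    plays the role of the pre-receive hook and returns True or False.
    """
    # Hypothetical naming: a fresh temporary directory inside objects_dir.
    quarantine = tempfile.mkdtemp(prefix="incoming-", dir=objects_dir)
    for oid, data in incoming.items():
        fan_out = os.path.join(quarantine, oid[:2])
        os.makedirs(fan_out, exist_ok=True)
        with open(os.path.join(fan_out, oid[2:]), "wb") as handle:
            handle.write(data)

    if not accept(quarantine):
        # Rejected: a single recursive delete removes every pushed object.
        shutil.rmtree(quarantine)
        return False

    # Accepted: move each staged file into the main object directory.
    for fan_out in os.listdir(quarantine):
        target = os.path.join(objects_dir, fan_out)
        os.makedirs(target, exist_ok=True)
        for name in os.listdir(os.path.join(quarantine, fan_out)):
            os.replace(os.path.join(quarantine, fan_out, name),
                       os.path.join(target, name))
    shutil.rmtree(quarantine)
    return True
```

Calling `receive_objects(objects_dir, {oid: data}, lambda quarantine: False)` leaves `objects_dir` exactly as it was before the call, which is the property the quarantine is meant to provide.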

### Git implementation

The Git implementation of this mechanism rests on two things.

#### 1. Alternate object directories

The objects in a Git repository can be stored across multiple
directories: 1 main directory, usually the `objects` directory inside
the repository, and 0 or more alternate directories. Together these act
like a search path: when looking for an object Git first checks the
main directory, then each alternate, until it finds the object.
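
The search-path behavior can be sketched in a few lines. This is a simplified illustration that only considers loose objects, which Git stores under a two-character fan-out directory; packfiles are ignored, and the helper name `find_loose_object` is hypothetical.

```python
import os


def find_loose_object(oid, main_dir, alternates=()):
    """Return the path of the first loose object file for oid, or None."""
    # Loose objects live at <dir>/<first two hex digits>/<remaining digits>.
    relative = os.path.join(oid[:2], oid[2:])
    # Check the main directory first, then each alternate in order.
    for directory in (main_dir, *alternates):
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    return None
```

An object that exists only in an alternate is still found, just later in the search than one stored in the main directory.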

#### 2. Config overrides via environment variables

Git can inject custom config into subprocesses via environment
variables. In the case of Git object directories, these are
`GIT_OBJECT_DIRECTORY` (the main object directory) and
`GIT_ALTERNATE_OBJECT_DIRECTORIES` (a search path of `:`-separated
alternate object directories).
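
To make the two variables concrete, here is a minimal sketch of building an environment for a Git subprocess. The helper name `git_object_env` and the example paths are assumptions; only the two variable names and the `:` separator come from the text above.

```python
import os


def git_object_env(object_dir, alternates, base_env=None):
    """Return a copy of the environment with the object directory overrides set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_OBJECT_DIRECTORY"] = object_dir
    if alternates:
        # The alternates form a ':'-separated search path.
        env["GIT_ALTERNATE_OBJECT_DIRECTORIES"] = ":".join(alternates)
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example when running `git cat-file -t <object>` to check whether an object is visible under that configuration.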

#### Putting it all together

1. `git receive-pack` receives a push
1. `git receive-pack` [creates a quarantine directory `objects/incoming-$RANDOM`](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1715)
1. `git receive-pack` [configures the unpack process](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1721) to write objects into the quarantine directory
1. `git receive-pack` unpacks the objects into the quarantine directory
1. `git receive-pack` [runs the `pre-receive` hook](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1498) with special `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variables that add the quarantine directory to the search path
1. If the `pre-receive` hook rejects the push, `git receive-pack` removes the quarantine directory and its contents. The push is aborted.
1. If the `pre-receive` hook passes, `git receive-pack` [merges the quarantine directory into the main object directory](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1510).
1. `git receive-pack` enters the ref update transaction

Note that by the time the `update` hook runs, the quarantine directory
has already been merged into the main object directory so it no longer
matters. The same goes for the `post-receive` hook which runs even
later.

Because `pre-receive` has the special quarantine configuration data in
environment variables, any `git` process spawned by `pre-receive` will
inherit the quarantine config and will be able to see the objects that
are being pushed.
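
The following sketch shows what such a "smart" `pre-receive` hook could look like. It reads the ref update commands from standard input, one `<old> <new> <ref>` line per update, and spawns `git rev-list` without overriding the environment, so the child process inherits the object directory variables. The function names are hypothetical and the hook accepts every push; it only illustrates the inheritance.

```python
import subprocess


def parse_ref_updates(lines):
    """Parse pre-receive input: one '<old> <new> <ref>' line per ref update."""
    updates = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            old, new, ref = line.split(" ", 2)
            updates.append((old, new, ref))
    return updates


def rev_list_command(new_oid):
    """Command listing commits reachable from new_oid but from no existing ref."""
    return ["git", "rev-list", new_oid, "--not", "--all"]


def pushed_commits(new_oid):
    # No env= argument: the child inherits this process's environment,
    # including the object directory variables Git set for the hook.
    result = subprocess.run(rev_list_command(new_oid), check=True,
                            capture_output=True, text=True)
    return result.stdout.split()


def main(stdin):
    for old, new, ref in parse_ref_updates(stdin):
        if set(new) == {"0"}:
            continue  # an all-zero new ID means the ref is being deleted
        print(f"{ref}: {len(pushed_commits(new))} new commit(s)")
    return 0  # exit status 0 accepts the push
```

An executable hook script would finish with `sys.exit(main(sys.stdin))`. If the same `git rev-list` command were run from a process that did not inherit the hook's environment, it would fail because the pushed commits would not be found.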

## GitLab and Git object quarantine

### Why does all this matter to GitLab

GitLab uses Git hooks, among other things, to implement features that
can reject Git pushes. For example, you can mark a branch as
"protected" in the GitLab web UI, and then certain types of users can
no longer push to that branch. That feature is implemented via the [Git
`pre-receive` hook](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/service/hook/pre_receive.go).

As mentioned above, Git object quarantine normally works more or less
automatically because `git` commands spawned by the `pre-receive` hook
inherit the special environment variables that contain the path to the
quarantine directory. In the case of GitLab's hooks we have a problem,
however, because the GitLab hooks are "dumb". All the GitLab hooks do
is take the inputs of the hook executable (the list of ref update
commands) and send them to the GitLab Rails internal API via a POST
request. The application logic that decides whether the push is
allowed resides in Rails. The hook just waits and reports back the
result of the POST API request to GitLab.

During the POST, the internal GitLab API makes Gitaly calls back into the repo to
examine the objects being pushed. For example, if force pushes are not
allowed, GitLab will call the IsAncestor RPC. That RPC call then wants
to look at a commit that is in the process of being pushed. But
because that commit is in quarantine, the RPC will fail because the
commit cannot be found.

### How GitLab passes the object quarantine information around

To overcome this problem, the GitLab `pre-receive` hook [reads the
object directory configuration from its
environment](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitlabshell/env.go#L9)
and passes this information [along with the HTTP API
call](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/hook/manager.go#L30-46).
On the Rails side, we then [put the object directory information in
the "request
store"](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/internal/base.rb#L43)
(that is, request-scoped thread-local storage). During that
Rails request, when Rails makes Gitaly requests on this repo, we send
back the quarantine information [in the Gitaly `Repository`
struct](https://gitlab.com/gitlab-org/gitlab/-/blob/f81f30c29a0edce20f6737fdccc3315c8baab9d1/lib/gitlab/gitaly_client/util.rb#L8-17).
Finally, inside Gitaly, when we spawn a Git process, we [re-create
the environment
variables](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L21-34)
that were present in the environment of the `pre-receive` hook, so that
we can see the quarantined objects. We do the same when we [instantiate a
`Gitlab::Git::Repository` in
gitaly-ruby](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).
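
The round trip described above can be modeled roughly as follows. This is only a sketch of the idea: the JSON keys `changes` and `object_dirs` and both function names are hypothetical and do not reflect the actual GitLab internal API payload or the Gitaly code linked above.

```python
import json

QUARANTINE_VARS = ("GIT_OBJECT_DIRECTORY", "GIT_ALTERNATE_OBJECT_DIRECTORIES")


def hook_request_body(hook_env, changes):
    """Hook side: bundle the ref updates with the object directory variables."""
    object_dirs = {name: hook_env[name]
                   for name in QUARANTINE_VARS if name in hook_env}
    # Hypothetical payload shape, used only for this illustration.
    return json.dumps({"changes": changes, "object_dirs": object_dirs})


def git_env_for_rpc(object_dirs, base_env):
    """Gitaly side: re-create the variables before spawning a Git process."""
    env = dict(base_env)
    env.update(object_dirs)
    return env
```

In this model the Rails side only has to carry `object_dirs` from the incoming API request to every Gitaly request it makes while handling it.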

### Relative paths

During the Gitaly migration we had to handle a complication with the
object quarantine information: Git uses absolute paths for this. These
paths get generated wherever `git receive-pack` runs, that is, on the
Gitaly server. During the migration, the repositories were also
accessible via NFS at the Rails side, but at a different path. That
meant that the absolute paths supplied by Git would sometimes be
invalid.

To work around this, the GitLab `pre-receive` hook [converts the
absolute paths from Git into relative
paths](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/object_dirs_helper.rb#L16),
relative to the repository directory. These relative paths then get
passed around inside GitLab. At the time Gitaly recreates the object
directory variables, it [converts the paths back from relative to
absolute](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L23).
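
The conversion in both directions amounts to simple path arithmetic. The sketch below uses hypothetical helper names and example paths, and relies on `posixpath` so that it behaves the same on any platform; it is not a transcription of the linked Ruby or Go code.

```python
import posixpath


def to_relative(repo_path, object_dirs):
    """Hook side: express each object directory relative to the repository."""
    return [posixpath.relpath(path, repo_path) for path in object_dirs]


def to_absolute(repo_path, relative_dirs):
    """Gitaly side: resolve the relative paths against the local repository path."""
    return [posixpath.normpath(posixpath.join(repo_path, path))
            for path in relative_dirs]
```

For example, a quarantine directory at `/var/opt/repos/a.git/objects/incoming-x` becomes `objects/incoming-x`, which resolves to `/mnt/nfs/repos/a.git/objects/incoming-x` on a host that sees the same repository at `/mnt/nfs/repos/a.git`.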