---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Working with diffs

We rely on different sources to present diffs. These include:

- Gitaly service
- Database (through `merge_request_diff_files`)
- Redis (cached highlighted diffs)

## Deep Dive

<!-- vale gitlab.Spelling = NO -->

In January 2019, Oswaldo Ferreira hosted a Deep Dive (GitLab team members only:
`https://gitlab.com/gitlab-org/create-stage/issues/1`) on GitLab Diffs and Commenting on Diffs
functionality to share domain-specific knowledge with anyone who may work in this part of the
codebase in the future:

<!-- vale gitlab.Spelling = YES -->

- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
  [Recording on YouTube](https://www.youtube.com/watch?v=K6G3gMcFyek)
- Slides on [Google Slides](https://docs.google.com/presentation/d/1bGutFH2AT3bxOPZuLMGl1ANWHqFnrxwQwjiwAZkF-TU/edit)
- [PDF slides](https://gitlab.com/gitlab-org/create-stage/uploads/b5ad2f336e0afcfe0f99db0af0ccc71a/)

Everything covered in this deep dive was accurate as of GitLab 11.7, and while specific details may
have changed since then, it should still serve as a good introduction.

## Architecture overview

### Merge request diffs

When refreshing a merge request (pushing to a source branch, force-pushing to the target branch, or when the target branch now contains any commits from the merge request),
we fetch the comparison information using `Gitlab::Git::Compare`, which fetches `base` and `head` data using Gitaly and diffs them through
`Gitlab::Git::Diff.between`.
The diff fetching process _limits_ single file diff sizes and the overall size of the whole diff through a series of constant values. Raw diff files are
then persisted in the `merge_request_diff_files` table.

Even though diffs larger than 10% of the value of `ApplicationSettings#diff_max_patch_bytes` are collapsed,
we still keep them in PostgreSQL. However, diff files larger than the defined _safety limits_
(see the [Diff limits section](#diff-limits)) are _not_ persisted in the database.

To present diff information on the merge request diffs page, we:

1. Fetch all diff files from the `merge_request_diff_files` database table.
1. Fetch the _old_ and _new_ file blobs in batch to:
   - Highlight old and new file content
   - Know which viewer to use for each file (text, image, deleted, and so on)
   - Know if the file content changed
   - Know if it was stored externally
   - Know if it had storage errors
1. If the diff file is cacheable (text-based), cache it in Redis
   using `Gitlab::Diff::FileCollection::MergeRequestDiff`.

### Note diffs

When commenting on a diff (any comparison), we persist a truncated diff version
on `NoteDiffFile` (which is associated with the actual `DiffNote`). So instead
of hitting the repository every time we need the diff of the file, we:

1. Check whether we have the `NoteDiffFile#diff` persisted and use it
1. Otherwise, if it's a current MR revision, use the persisted
   `MergeRequestDiffFile#diff`
1. In the last scenario, go to the repository and fetch the diff
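The lookup order above can be sketched in plain Ruby. This is a minimal illustration, not the GitLab implementation: the `NoteDiffFileSketch` and `MergeRequestDiffFileSketch` structs, the `diff_for_note` method, and its keyword arguments are all hypothetical names that model only the `diff` attribute.

```ruby
# Hypothetical stand-ins for the real models; only `diff` is modelled.
NoteDiffFileSketch = Struct.new(:diff)
MergeRequestDiffFileSketch = Struct.new(:diff)

# Returns the diff text for a note, trying the cheapest source first.
# `fetch_from_repository` is any callable standing in for the repository lookup.
def diff_for_note(note_diff_file:, merge_request_diff_file:, current_revision:, fetch_from_repository:)
  # 1. Use the diff persisted alongside the note, if present.
  return note_diff_file.diff if note_diff_file&.diff

  # 2. For a current MR revision, reuse the persisted merge request diff file.
  return merge_request_diff_file.diff if current_revision && merge_request_diff_file&.diff

  # 3. Otherwise, fall back to the repository.
  fetch_from_repository.call
end

repository = -> { "diff fetched from the repository" }

puts diff_for_note(
  note_diff_file: nil,
  merge_request_diff_file: MergeRequestDiffFileSketch.new("diff persisted on the merge request"),
  current_revision: true,
  fetch_from_repository: repository
)
```

The repository lookup is passed in as a callable so that it is only invoked when both persisted sources miss.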

## Diff limits

As explained above, we limit single diff files and the size of the whole diff. There are scenarios where we collapse the diff file,
and cases where the diff file is not presented at all, and the user is guided to the Blob view.

### Diff collection limits

Limits that apply to the whole collection of diff files. The number of files, the number of lines, and the total size are considered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_files] = 100
```

File diffs are collapsed (but are expandable) if 100 files have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_lines] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
```

File diffs are collapsed (but are expandable) if 5000 lines have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] * 5.kilobytes = 500.kilobytes
```

File diffs are collapsed (but are expandable) if 500 kilobytes have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_files] = Commit::DIFF_HARD_LIMIT_FILES = 1000
```

No more files are rendered at all if 1000 files have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_lines] = Commit::DIFF_HARD_LIMIT_LINES = 50000
```

No more files are rendered at all if 50,000 lines have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:max_files] * 5.kilobytes = 5000.kilobytes
```

No more files are rendered at all if 5 megabytes have already been rendered.

All collection limit parameters are sent to and applied on Gitaly. That is, after the limit is surpassed,
Gitaly only returns the safe amount of data to be persisted in `merge_request_diff_files`.
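As a rough illustration of how the safe and hard collection limits interact, the sketch below walks a list of files and labels each one. It is a simplified model, not the Gitaly logic: `DiffFileSketch`, `apply_collection_limits`, and the state names are hypothetical, the limit values are copied from the list above, and counting collapsed files toward the running totals is an assumption.

```ruby
KILOBYTE = 1024

# Values copied from the limits listed above; the structure is illustrative.
SAFE_LIMITS = { files: 100, lines: 5_000, bytes: 100 * 5 * KILOBYTE }.freeze
HARD_LIMITS = { files: 1_000, lines: 50_000, bytes: 1_000 * 5 * KILOBYTE }.freeze

# Hypothetical stand-in for a diff file: path, line count, and size in bytes.
DiffFileSketch = Struct.new(:path, :lines, :bytes)

# True when any running total has already reached its limit.
def reached?(totals, limits)
  limits.any? { |key, limit| totals[key] >= limit }
end

# Labels each file :rendered, :collapsed, or :omitted, based on what has
# already been processed before it.
def apply_collection_limits(files)
  totals = { files: 0, lines: 0, bytes: 0 }

  files.map do |file|
    state =
      if reached?(totals, HARD_LIMITS)
        :omitted
      elsif reached?(totals, SAFE_LIMITS)
        :collapsed
      else
        :rendered
      end

    # Assumption: collapsed files still count toward the totals.
    unless state == :omitted
      totals[:files] += 1
      totals[:lines] += file.lines
      totals[:bytes] += file.bytes
    end

    [file.path, state]
  end
end

files = Array.new(102) { |i| DiffFileSketch.new("file_#{i}.rb", 10, 200) }
states = apply_collection_limits(files).map(&:last)
puts "rendered: #{states.count(:rendered)}, collapsed: #{states.count(:collapsed)}"
```

Each limit is checked against what has already been processed, which matches the "have already been rendered" wording above. The exact boundary behaviour in Gitaly may differ.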

### Individual diff file limits

Limits that apply to each diff file in a collection. The patch size and the number of lines of each file are considered.

#### Expandable patches (collapsed)

Diff patches are collapsed when surpassing 10% of the value set in `ApplicationSettings#diff_max_patch_bytes`.
That is, the threshold is 10 KB if the maximum allowed value is 100 KB.
The diff is persisted and expandable if the patch size doesn't
surpass `ApplicationSettings#diff_max_patch_bytes`.

Although this nomenclature (collapsing) is also used on Gitaly, this limit is only used on GitLab (hardcoded, not sent to Gitaly).
Gitaly only returns `Diff.Collapsed` (RPC) when surpassing collection limits.

#### Not expandable patches (too large)

The patch is not rendered if it's larger than `ApplicationSettings#diff_max_patch_bytes`.
Users see a `Changes are too large to be shown.` message and a button to view only that file in that commit.

```ruby
Commit::DIFF_SAFE_LINES = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
```

A file diff is suppressed (technically different from collapsed, but it behaves the same and is expandable) if it has more than 5000 lines.

This limit is hardcoded and only applied on GitLab.
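The per-file rules can be summarized in a small classifier. This is a sketch under stated assumptions, not the GitLab code: `classify_patch` and its state names are hypothetical, `max_patch_bytes` stands in for `ApplicationSettings#diff_max_patch_bytes`, and the order in which the checks run is an assumption.

```ruby
# Mirrors the 5000-line value of Commit::DIFF_SAFE_LINES described above.
SAFE_LINES = 5_000

# Classifies a single patch by its size in bytes and its number of lines.
# `max_patch_bytes` stands in for ApplicationSettings#diff_max_patch_bytes.
def classify_patch(patch_bytes:, patch_lines:, max_patch_bytes:)
  # Larger than the maximum: not rendered and not expandable.
  return :too_large if patch_bytes > max_patch_bytes

  # Larger than 10% of the maximum: collapsed, but expandable.
  # Integer arithmetic avoids floating-point rounding at the boundary.
  return :collapsed if patch_bytes * 10 > max_patch_bytes

  # More than 5000 lines: suppressed, which behaves like collapsed.
  return :suppressed if patch_lines > SAFE_LINES

  :rendered
end

max = 100 * 1024 # a 100 KB maximum, as in the example above

puts classify_patch(patch_bytes: 4 * 1024, patch_lines: 120, max_patch_bytes: max)
puts classify_patch(patch_bytes: 40 * 1024, patch_lines: 900, max_patch_bytes: max)
puts classify_patch(patch_bytes: 200 * 1024, patch_lines: 900, max_patch_bytes: max)
```

With a 100 KB maximum, a 4 KB patch stays under the 10 KB collapse threshold, a 40 KB patch falls between the two limits, and a 200 KB patch exceeds the maximum.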

## Viewers

Diff viewers, which can be found in `models/diff_viewer/*`, are classes used to map metadata about each type of diff file. Each class records
whether the file is binary, which partial should be used to render it, and which file extensions the class accounts for.

`DiffViewer::Base` validates the content, extension, and file type of _blobs_ (old and new versions) to check whether they can be rendered.
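To make the idea of a viewer as metadata concrete, here is a simplified model in plain Ruby. It is not the real `DiffViewer` API: the `SketchViewer` classes, the `can_render?` check, and `viewer_for` are hypothetical, and the real classes validate more than the extension and binary flag shown here.

```ruby
# Hypothetical, simplified viewer classes; not the real DiffViewer API.
class SketchViewer
  class << self
    # Class-level metadata: partial to render, extensions handled, binary flag.
    attr_accessor :partial_name, :extensions, :binary
  end

  # A viewer applies when the binary flag matches and the extension is handled.
  # An empty extension list means "any extension".
  def self.can_render?(path:, binary:)
    extension = File.extname(path).delete_prefix(".").downcase
    self.binary == binary && (extensions.empty? || extensions.include?(extension))
  end
end

class SketchImageViewer < SketchViewer
  self.partial_name = "image"
  self.extensions = %w[png jpg gif]
  self.binary = true
end

class SketchTextViewer < SketchViewer
  self.partial_name = "text"
  self.extensions = []
  self.binary = false
end

# Picks the first viewer that can render the file, or nil if none can.
def viewer_for(path:, binary:)
  [SketchImageViewer, SketchTextViewer].find do |viewer|
    viewer.can_render?(path: path, binary: binary)
  end
end

viewer = viewer_for(path: "logo.png", binary: true)
puts viewer ? viewer.partial_name : "no viewer"
```

Keeping the metadata at the class level means a new file type can be supported by adding one small subclass, without touching the selection logic.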

## Merge request diffs against the `HEAD` of the target branch

Historically, merge request diffs have been calculated by `git diff target...source`, which compares the
`HEAD` of the source branch with the merge base (or a common ancestor) of the target branch and the source branch.
This solution works well until the target branch starts containing some of the
changes introduced by the source branch. Consider the following case, in which the source branch
is `feature_a` and the target is `main`:

1. Check out a new branch `feature_a` from `main` and remove `file_a` and `file_b` in it.
1. Add a commit that removes `file_a` to `main`.

The merge request diff still contains the `file_a` removal, while the actual diff compared to
`main`'s `HEAD` has only the `file_b` removal. A diff with such redundant
changes is harder to review.
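The difference between the two comparisons can be modelled with sets of file names. This is only an illustration of the scenario above, not how GitLab computes diffs: each branch is reduced to the set of files it contains, and `file_c` is a hypothetical extra file added so that the sets are not empty.

```ruby
require "set"

merge_base  = Set["file_a", "file_b", "file_c"] # main when feature_a was created
source_head = merge_base - ["file_a", "file_b"] # feature_a removes both files
target_head = merge_base - ["file_a"]           # main later removes file_a

# Removals reported when comparing against the merge base (target...source).
removed_vs_base = (merge_base - source_head).to_a.sort

# Removals reported when comparing against the current HEAD of main.
removed_vs_head = (target_head - source_head).to_a.sort

puts "against merge base: #{removed_vs_base.join(', ')}"
puts "against main HEAD:  #{removed_vs_head.join(', ')}"
```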

To display an up-to-date diff, in GitLab 12.9 we
[introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/27008) merge request
diffs compared against `HEAD` of the target branch: the
target branch is artificially merged into the source branch, then the resulting
merge ref is compared to the source branch to calculate an accurate
diff.

Until we complete the epics ["use merge refs for diffs"](https://gitlab.com/groups/gitlab-org/-/epics/854)
and ["merge conflicts in diffs"](https://gitlab.com/groups/gitlab-org/-/epics/4893),
both options `main (base)` and `main (HEAD)` are available to be displayed in merge requests:

![Merge ref head options](img/merge_ref_head_options_v13_6.png)

The `main (HEAD)` option is meant to replace `main (base)` in the future.

To support comments for both options, diff note positions are stored for
both `main (base)` and `main (HEAD)` versions ([introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/198457) in 12.10).
The position for the `main (base)` version is stored in the `Note#position` and
`Note#original_position` columns. For the `main (HEAD)` version, `DiffNotePosition`
was introduced.

One of the key challenges when working on merge ref diffs is merge
conflicts. If the target and source branches contain a merge conflict, the branches
cannot be automatically merged. The
<i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=GFXIFA4ZuZw&feature=youtu.be&ab_channel=GitLabUnfiltered)
is a quick introduction to the problem and the motivation behind the [epic](https://gitlab.com/groups/gitlab-org/-/epics/854).

In 13.5, a solution for both-modified merge
conflicts was
[introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/232484). However,
there are more classes of merge conflicts that are to be
[addressed](https://gitlab.com/groups/gitlab-org/-/epics/4893) in the future.