---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Working with diffs

We rely on different sources to present diffs. These include:

- Gitaly service
- Database (through `merge_request_diff_files`)
- Redis (cached highlighted diffs)

## Deep Dive

<!-- vale gitlab.Spelling = NO -->

In January 2019, Oswaldo Ferreira hosted a Deep Dive (GitLab team members only:
`https://gitlab.com/gitlab-org/create-stage/issues/1`) on GitLab Diffs and Commenting on Diffs
functionality to share domain-specific knowledge with anyone who may work in this part of the
codebase in the future:

<!-- vale gitlab.Spelling = YES -->

- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
  [Recording on YouTube](https://www.youtube.com/watch?v=K6G3gMcFyek)
- Slides on [Google Slides](https://docs.google.com/presentation/d/1bGutFH2AT3bxOPZuLMGl1ANWHqFnrxwQwjiwAZkF-TU/edit)
- [PDF slides](https://gitlab.com/gitlab-org/create-stage/uploads/b5ad2f336e0afcfe0f99db0af0ccc71a/)

Everything covered in this deep dive was accurate as of GitLab 11.7, and while specific details may
have changed since then, it should still serve as a good introduction.

## Architecture overview

### Merge request diffs

When refreshing a merge request (pushing to a source branch, force-pushing to the target branch, or if the target branch now contains any commits from the MR)
we fetch the comparison information using `Gitlab::Git::Compare`, which fetches `base` and `head` data using Gitaly and diffs them through
`Gitlab::Git::Diff.between`.
The diff fetching process _limits_ single file diff sizes and the overall size of the whole diff through a series of constant values. Raw diff files are
then persisted in the `merge_request_diff_files` table.

Even though diffs larger than 10% of the value of `ApplicationSettings#diff_max_patch_bytes` are collapsed,
we still keep them in PostgreSQL. However, diff files larger than defined _safety limits_
(see the [Diff limits section](#diff-limits)) are _not_ persisted in the database.

To present diff information on the merge request diffs page, we:

1. Fetch all diff files from the database table `merge_request_diff_files`
1. Fetch the _old_ and _new_ file blobs in batch to:
   - Highlight old and new file content
   - Know which viewer it should use for each file (text, image, deleted, etc)
   - Know if the file content changed
   - Know if it was stored externally
   - Know if it had storage errors
1. If the diff file is cacheable (text-based), it's cached on Redis
   using `Gitlab::Diff::FileCollection::MergeRequestDiff`

### Note diffs

When commenting on a diff (any comparison), we persist a truncated diff version
on `NoteDiffFile` (which is associated with the actual `DiffNote`). So instead
of hitting the repository every time we need the diff of the file, we:

1. Check whether we have the `NoteDiffFile#diff` persisted and use it
1. Otherwise, if it's a current MR revision, use the persisted
   `MergeRequestDiffFile#diff`
1. As a last resort, go to the repository and fetch the diff

## Diff limits

As explained above, we limit single diff files and the size of the whole diff. There are scenarios where we collapse the diff file,
and cases where the diff file is not presented at all, and the user is guided to the Blob view.

### Diff collection limits

These limits apply to the whole collection of diff files. The number of files, the number of lines, and the total size in bytes are considered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_files] = 100
```

File diffs are collapsed (but are expandable) if 100 files have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_lines] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
```

File diffs are collapsed (but are expandable) if 5000 lines have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:safe_max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] * 5.kilobytes = 500.kilobytes
```

File diffs are collapsed (but are expandable) if 500 kilobytes have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_files] = Commit::DIFF_HARD_LIMIT_FILES = 1000
```

No more files are rendered at all if 1000 files have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_lines] = Commit::DIFF_HARD_LIMIT_LINES = 50000
```

No more files are rendered at all if 50,000 lines have already been rendered.

```ruby
Gitlab::Git::DiffCollection.collection_limits[:max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:max_files] * 5.kilobytes = 5000.kilobytes
```

No more files are rendered at all if 5 megabytes have already been rendered.

All collection limit parameters are sent and applied on Gitaly. That is, after the limit is surpassed,
Gitaly only returns the safe amount of data to be persisted on `merge_request_diff_files`.

### Individual diff file limits

These limits apply to each diff file of a collection. The number of lines and the size of each file are considered.

#### Expandable patches (collapsed)

Diff patches are collapsed when surpassing 10% of the value set in `ApplicationSettings#diff_max_patch_bytes`.
For example, that is 10 KB if the maximum allowed value is 100 KB.
The diff is persisted and expandable if the patch size doesn't
surpass `ApplicationSettings#diff_max_patch_bytes`.
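
The per-file thresholds can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not GitLab's implementation: `classify_patch` and the symbols it returns are hypothetical names, and `max_patch_bytes` stands in for the value of `ApplicationSettings#diff_max_patch_bytes`.

```ruby
# Hypothetical sketch: classify_patch does not exist in GitLab.
# max_patch_bytes stands in for ApplicationSettings#diff_max_patch_bytes.
def classify_patch(patch_bytes, max_patch_bytes)
  # Patches are collapsed once they surpass 10% of the configured maximum.
  collapse_threshold = max_patch_bytes / 10

  if patch_bytes > max_patch_bytes
    :too_large # larger than the maximum: not expandable
  elsif patch_bytes > collapse_threshold
    :collapsed # persisted, shown collapsed, expandable
  else
    :expanded  # small enough to be shown as-is
  end
end

max = 100 * 1024 # assume a 100 KB maximum for the example
puts classify_patch(5 * 1024, max)   # a 5 KB patch
puts classify_patch(50 * 1024, max)  # a 50 KB patch
puts classify_patch(200 * 1024, max) # a 200 KB patch
```

Integer division keeps the threshold exact, so a patch of exactly 10% of the maximum is still treated as not surpassing it.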

Although this nomenclature (Collapsing) is also used on Gitaly, this limit is only used on GitLab (hardcoded - not sent to Gitaly).
Gitaly only returns `Diff.Collapsed` (RPC) when surpassing collection limits.

#### Not expandable patches (too large)

The patch is not rendered if it's larger than `ApplicationSettings#diff_max_patch_bytes`.
Users see a `Changes are too large to be shown.` message and a button to view only that file in that commit.

```ruby
Commit::DIFF_SAFE_LINES = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
```

File diff is suppressed (technically different from collapsed, but behaves the same, and is expandable) if it has more than 5000 lines.

This limit is hardcoded and only applied on GitLab.

## Viewers

Diff Viewers, which can be found in `models/diff_viewer/*`, are classes used to map metadata about each type of diff file. Each class holds information
such as whether the file is binary, which partial should be used to render it, and which file extensions the class accounts for.

`DiffViewer::Base` validates the content, extension, and file type of the _blobs_ (old and new versions) to check whether the file can be rendered.

## Merge request diffs against the `HEAD` of the target branch

Historically, merge request diffs have been calculated by `git diff target...source`, which compares the
`HEAD` of the source branch with the merge base (or a common ancestor) of the target and source branches.
This solution works well until the target branch starts containing some of the
changes introduced by the source branch. Consider the following case, in which the source branch
is `feature_a` and the target is `main`:

1. Check out a new branch `feature_a` from `main` and remove `file_a` and `file_b` in it.
1. Add a commit that removes `file_a` to `main`.

The merge request diff still contains the `file_a` removal while the actual diff compared to
`main`'s `HEAD` has only the `file_b` removal. The diff with such redundant
changes is harder to review.

In order to display an up-to-date diff, in GitLab 12.9 we
[introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/27008) merge request
diffs compared against `HEAD` of the target branch: the
target branch is artificially merged into the source branch, then the resulting
merge ref is compared to the source branch in order to calculate an accurate
diff.

Until we complete the epics ["use merge refs for diffs"](https://gitlab.com/groups/gitlab-org/-/epics/854)
and ["merge conflicts in diffs"](https://gitlab.com/groups/gitlab-org/-/epics/4893),
both options `main (base)` and `main (HEAD)` are available to be displayed in merge requests:

![Merge ref head options](img/merge_ref_head_options_v13_6.png)

The `main (HEAD)` option is meant to replace `main (base)` in the future.

In order to support comments for both options, diff note positions are stored for
both `main (base)` and `main (HEAD)` versions ([introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/198457) in 12.10).
The position for the `main (base)` version is stored in the `Note#position` and
`Note#original_position` columns. For the `main (HEAD)` version, `DiffNotePosition`
has been introduced.

One of the key challenges to deal with when working on merge ref diffs is merge
conflicts. If the target and source branches contain a merge conflict, the branches
cannot be automatically merged. The
<i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=GFXIFA4ZuZw&feature=youtu.be&ab_channel=GitLabUnfiltered)
is a quick introduction to the problem and the motivation behind the [epic](https://gitlab.com/groups/gitlab-org/-/epics/854).

In 13.5, a solution for both-modified merge
conflicts was
[introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/232484). However,
there are more classes of merge conflicts that are to be
[addressed](https://gitlab.com/groups/gitlab-org/-/epics/4893) in the future.
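
The difference between the `main (base)` and `main (HEAD)` comparisons can be illustrated with the `feature_a` example from this section. The sketch below is a toy model, not how GitLab computes diffs: each branch is represented as a set of file names, only file removals are modeled, and `removed_files` and `merge_removals` are hypothetical helpers. It assumes the `HEAD` comparison shows what merging would change relative to `main`'s `HEAD`.

```ruby
require 'set'

# Toy model: a branch is the set of files it contains. All names are hypothetical.
merge_base = Set['file_a', 'file_b', 'file_c'] # main when feature_a was created
feature_a  = Set['file_c']                     # feature_a removed file_a and file_b
main_head  = Set['file_b', 'file_c']           # main later removed file_a

# Files present in `from` but missing in `to`.
def removed_files(from, to)
  from - to
end

# With removals only, a merge keeps the files that both sides still have.
def merge_removals(source, target)
  source & target
end

base_diff = removed_files(merge_base, feature_a)
merge_ref = merge_removals(feature_a, main_head)
head_diff = removed_files(main_head, merge_ref)

puts "main (base): #{base_diff.to_a.sort.inspect}"
puts "main (HEAD): #{head_diff.to_a.sort.inspect}"
```

In this model the `main (base)` comparison reports both removals, while the `main (HEAD)` comparison reports only the `file_b` removal, because `main` no longer contains `file_a`.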