# Tips for reading Git source code

Although Git has good documentation, sometimes you just need to read the
code to understand how it works. This document collects some tips on how
to approach [Git's source code](https://gitlab.com/gitlab-org/git).

## Audience

This is written for Gitaly developers and GitLab troubleshooters (SREs, support engineers).

## Look at the right version

If you want to understand Git's behavior by reading the source, make
sure you are reading the right source. Find out the Git version of the
system you're investigating and select or check out the matching tag
in the Git repository.

## Use a viewer with code search

Online code search is usually weaker than code search in an offline
text editor, or on the terminal with `git grep`.

## Look at the tests

If you want to know something that is not clear from the documentation,
sometimes the answer is in the tests. These can be found in the
[`t/` subdirectory](https://gitlab.com/gitlab-org/git/tree/master/t).

In [`t/helper`](https://gitlab.com/gitlab-org/git/tree/master/t/helper)
you can find C executables that expose some Git internal functions that
you normally cannot call directly.

The tests themselves are written in shell script. Instructions for
running them are in
[`t/README`](https://gitlab.com/gitlab-org/git/blob/master/t/README).
However, often you don't have to run a test in order to understand what
it does.

If you're interested in the workings of a particular Git command, try
searching the `t/` directory for it.

## Look at the technical documentation

There is a lot of [technical
documentation](https://gitlab.com/gitlab-org/git/tree/master/Documentation/technical)
in the Git source. If you want to know more about file formats, internal Git APIs, or
network protocols, this is a good place to start.

## Code organization

Most of the Git sub-commands we use to interact with Git are found
in the `builtin/`
[directory](https://gitlab.com/gitlab-org/git/tree/master/builtin). For
example, `git log` is
[`builtin/log.c`](https://gitlab.com/gitlab-org/git/blob/master/builtin/log.c).

The `.c` files at the top level of the Git repository contain code that
is shared across sub-commands. For example,
[`config.c`](https://gitlab.com/gitlab-org/git/blob/master/config.c)
contains code related to getting and setting Git configuration values.
Contrast this with `builtin/config.c`, which is the sub-command code for
`git config`.

When doing a code search for an error message, you sometimes get false
matches in the `po/` directory, which contains localizations. You may
want to ignore those or filter them out of your search.

If you are trying to make sense of what some internal Git function does,
you can read its definition in one of the `*.c` files at the top level.
There may also be some extra explanation in the corresponding `*.h` (header)
file; the header files define the API of the corresponding `*.c` file.

## Sub-command source files

### Not all sub-commands are written in C

At the top level of the repository, you will find `*.sh` and `*.perl`
files that implement some of Git's sub-commands. For example,
[`git-bisect.sh`](https://gitlab.com/gitlab-org/git/blob/v2.22.0/git-bisect.sh).

### Main function

If you're used to reading Ruby or Go, the `builtin/*.c` files could be a
little disorienting. This is because functions are ordered with leaf
functions at the top and the main entry point at the bottom. C expects
a function to be declared before it is used, so this ordering lets the
Git source code have fewer (or no) forward declarations of functions.

So if you want to do a top-down walk of a Git sub-command, expect to
find the main entry point at the bottom of the corresponding
`builtin/*.c` file. For example, the entry point for `git blame` is
[`cmd_blame` in
`builtin/blame.c`](https://gitlab.com/gitlab-org/git/blob/v2.22.0/builtin/blame.c#L778).
Hyphens are not allowed in C function names, so the entry
point for `git upload-pack` is `cmd_upload_pack`.

Some functions are not where you expect them. For example,
`cmd_format_patch` is in `builtin/log.c`. Use code search!

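As a rough sketch of the layout described in this section, the file below
defines its leaf helpers first and its `cmd_` entry point last, so no
forward declarations are needed. The sub-command `git count-dashes` and
all function names are hypothetical; the parameter list is modeled on
builtins such as `cmd_blame` at v2.22.0 and may differ in other Git
versions.

```c
#include <stdio.h>

/* Leaf helper: defined first, so its callers need no forward declaration. */
static int count_dashes(const char *arg)
{
	int n = 0;

	for (; *arg; arg++)
		if (*arg == '-')
			n++;
	return n;
}

/* Mid-level helper: calls the leaf function defined above it. */
static int total_dashes(int argc, const char **argv)
{
	int i, total = 0;

	for (i = 0; i < argc; i++)
		total += count_dashes(argv[i]);
	return total;
}

/*
 * Entry point, last in the file. The hyphen in the hypothetical
 * "git count-dashes" becomes an underscore in the function name.
 */
int cmd_count_dashes(int argc, const char **argv, const char *prefix)
{
	(void)prefix; /* unused in this sketch */
	printf("%d\n", total_dashes(argc, argv));
	return 0;
}
```

Reading this file top-down means starting at `cmd_count_dashes` at the
bottom and working upwards through the helpers it calls.
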
### Global state

The way we write Ruby and Go at GitLab, it is common to bundle and hide
state in classes (Ruby) or structs (Go). Global state is rare.

Things are different in Git. Builtin commands often use `static`
(i.e. file-scoped) global state. This reduces the number of arguments
that have to be passed to functions, just like having state in a Ruby
class does.

You usually find the global variables at the top of the file.

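A minimal sketch of this pattern follows. It is not taken from Git: the
`--total` option, the variable names, and `cmd_example` are all
hypothetical. The point is that the helpers read and write the
file-scoped variables directly instead of receiving them as arguments.

```c
#include <stdio.h>
#include <string.h>

/* File-scoped state: shared by every function in this file, hidden from other files. */
static int show_total;	/* set by the hypothetical --total option */
static int line_count;

/* Updates the shared counter; no state is passed in. */
static void handle_line(const char *line)
{
	(void)line; /* the content is irrelevant to this sketch */
	line_count++;
}

/* Records options in the file-scoped variables. */
static void parse_args_sketch(int argc, const char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (!strcmp(argv[i], "--total"))
			show_total = 1;
}

int cmd_example(int argc, const char **argv, const char *prefix)
{
	(void)prefix; /* unused in this sketch */
	parse_args_sketch(argc, argv);
	handle_line("first");
	handle_line("second");
	if (show_total)
		printf("%d\n", line_count);
	return 0;
}
```

Because `show_total` and `line_count` are `static`, a variable with the
same name in another `builtin/*.c` file would not clash with them.
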
## C trivia

If you don't use C every day, some things about it might be surprising.

### Implicit use of "zero means false"

In Ruby, you will never write `if some_number` because every number,
including `0`, is truthy, so that `if` is equivalent to `if true`.
In Go, you are not allowed by the compiler to write `if someNumber {`.

However, in C, it is valid to write `if (some_number)`: this is equivalent
to `if (some_number != 0)`. Whether to use it is a matter of style, and
in Git, you will see that `if (some_number)` is common.

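The equivalence can be sketched with a few small functions. The names
below are hypothetical and only illustrate the idiom and its negated
form.

```c
#include <stddef.h>

/* `if (flags)` means `if (flags != 0)`. */
static int has_flags(unsigned flags)
{
	if (flags)
		return 1;
	return 0;
}

/* The negated form is just as common: `!len` means `len == 0`. */
static int is_empty(size_t len)
{
	return !len;
}

/* The same test works on pointers: `!p` means `p == NULL`. */
static int is_missing(const void *p)
{
	return !p;
}
```
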
A variation of this has to do with zero-terminated data structures such
as classic C strings and linked lists. The loop below visits each
character in the string. The test condition of the loop, `*s`, is `0`
at the terminating NUL byte, so the loop stops at the end of the string.

```C
for (s = "some string"; *s; s++)
```

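For a complete version of the same loop, here is a sketch of a
character counter that behaves like `strlen()`. The function name
`count_chars` is made up for this example.

```c
#include <stddef.h>

/* Count the characters in a NUL-terminated string. */
static size_t count_chars(const char *s)
{
	size_t n = 0;

	for (; *s; s++)	/* stops when *s is the terminating '\0' */
		n++;
	return n;
}
```
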
You will see the same pattern with linked lists, where the test
condition is the pointer to the current element, which is `NULL` (zero)
once the loop runs past the last element.

```C
for (x = my_list; x; x = x->next)
```

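A self-contained sketch of the linked-list form, assuming a
hypothetical `struct node` with an `int` payload:

```c
#include <stddef.h>

/* Hypothetical list element; the last element has next == NULL. */
struct node {
	int value;
	struct node *next;
};

/* Add up every value; the loop ends when x becomes NULL. */
static int sum_list(const struct node *my_list)
{
	const struct node *x;
	int sum = 0;

	for (x = my_list; x; x = x->next)
		sum += x->value;
	return sum;
}
```
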
This becomes even more cryptic if you are dealing with a `while` loop.

```C
while (x)
```

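In the `while` form, the initialization and the step move out of the
loop header, so only the bare pointer test remains. A sketch, again
with a hypothetical element type (`struct item`):

```c
#include <stddef.h>

/* Hypothetical list element; the last element has next == NULL. */
struct item {
	struct item *next;
};

/* Count the elements; `while (x)` means `while (x != NULL)`. */
static int list_length(const struct item *x)
{
	int n = 0;

	while (x) {
		n++;
		x = x->next;	/* the step now lives in the loop body */
	}
	return n;
}
```

When you see a bare `while (x)`, look inside the loop body for the
statement that advances `x`.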