# Tips for reading Git source code

Although Git has good documentation, sometimes you just need to read the
code to understand how it works. This document collects some tips on how
to approach [Git's source code](https://gitlab.com/gitlab-org/git).

## Audience

This is written for Gitaly developers and GitLab troubleshooters (SREs, support engineers).

## Look at the right version

If you want to understand Git's behavior by reading the source, make
sure you are reading the right source. Find out the Git version of the
system you're investigating (for example with `git --version`) and
select or check out the appropriate tag in Git.

## Use a viewer with code search

Online code search is usually less capable than code search in an
offline text editor, or on the terminal with `git grep`.

## Look at the tests

If you want to know something that is not clear from the documentation,
sometimes the answer is in the tests. These can be found in the
[`t/` subdirectory](https://gitlab.com/gitlab-org/git/tree/master/t).

In [`t/helper`](https://gitlab.com/gitlab-org/git/tree/master/t/helper)
you can find C executables that expose some Git internal functions that
you normally cannot call directly.

The tests themselves are written in shell script. Instructions for
running them are in
[`t/README`](https://gitlab.com/gitlab-org/git/blob/master/t/README).
However, often you don't have to run a test in order to understand what
it does.

If you're interested in the workings of a particular Git command, try
searching the `t/` directory for it.

## Look at the technical documentation

There is a lot of [technical
documentation](https://gitlab.com/gitlab-org/git/tree/master/Documentation/technical)
in the Git source. If you want to know more about file formats, internal Git APIs or
network protocols, this is a good place to start.

## Code organization

The Git sub-commands we use to interact with Git are mostly found
in the `builtin/`
[directory](https://gitlab.com/gitlab-org/git/tree/master/builtin). For
example, `git log` is
[`builtin/log.c`](https://gitlab.com/gitlab-org/git/blob/master/builtin/log.c).

The `.c` files at the top level of the Git repository contain code that
is shared across sub-commands. For example,
[`config.c`](https://gitlab.com/gitlab-org/git/blob/master/config.c)
contains code related to getting and setting Git configuration values.
Contrast this with `builtin/config.c`, which is the sub-command code for
`git config`.

When doing a code search for an error message, you sometimes get false
matches in the `po/` directory, which contains localizations. You may
want to ignore those or filter them out of your search.

If you are trying to make sense of what some internal Git function does,
you can read its definition somewhere in a `*.c` file in the root. There
may also be some extra explanation in the corresponding `*.h` (header)
file; the header files define the API of the corresponding `*.c` file.

## Sub-command source files

### Not all sub-commands are written in C

At the top level of the repository, you will find `*.sh` and `*.perl`
files that implement some of Git's sub-commands. For example,
[`git-bisect.sh`](https://gitlab.com/gitlab-org/git/blob/v2.22.0/git-bisect.sh).

### Main function

If you're used to reading Ruby or Go, the `builtin/*.c` files can be a
little disorienting. This is because the function call graph is ordered
with leaf functions at the top and the main entry point at the
bottom. This allows the Git source code to have fewer (or no) forward
declarations of functions.

So if you want to do a top-down walk of a Git sub-command, expect to
find the main entry point at the bottom of the corresponding
`builtin/*.c` file. The entry point for e.g.
`git blame` is called
[`cmd_blame` in
`builtin/blame.c`](https://gitlab.com/gitlab-org/git/blob/v2.22.0/builtin/blame.c#L778).
Recall that hyphens are not allowed in C function names, so the entry
point for `git upload-pack` is `cmd_upload_pack`.

Some functions are not where you expect them. For example,
`cmd_format_patch` is in `builtin/log.c`. Use code search!

### Global state

The way we write Ruby and Go at GitLab, it is common to bundle and hide
state in classes (Ruby) or structs (Go). Global state is rare.

Things are different in Git. Builtin commands often use `static`
(i.e. file-scoped) global state. This reduces the number of arguments
that have to be passed to functions, just like having state in a Ruby
class does.

You usually find the global variables at the top of the file.

## C trivia

If you don't use C every day, some things about it might be surprising.

### Implicit use of "zero means false"

In Ruby, you will never write `if some_number` because if `some_number`
is a variable containing a number, that `if` is equivalent to `if true`.
In Go, you are not allowed by the compiler to write `if someNumber {`.

However, in C, it is OK to write `if (some_number)`: this is equivalent
to `if (some_number != 0)`. Whether that is OK is a matter of style, and
in Git, you will see that `if (some_number)` is common.

A variation of this has to do with zero-terminated data structures such
as classic C strings and linked lists. The loop below will visit each
character in the string. Note that the test condition of the loop, `*s`,
will be `0` at the end of the string, and the loop will break.

```C
for (s = "some string"; *s; s++)
```

You will see the same pattern with linked lists, where the test
condition is the pointer to the current element.

```C
for (x = my_list; x; x = x->next)
```

This becomes even more cryptic if you are dealing with a `while` loop.

```C
while (x)
```

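To make the three loop shapes above concrete, here is a minimal sketch with
complete loop bodies. The `struct node` type and the function names
`count_chars`, `sum_for` and `sum_while` are made up for this illustration;
they are not taken from the Git source.

```C
#include <stddef.h>

/* Hypothetical list node, for illustration only; not a Git type. */
struct node {
	int value;
	struct node *next;
};

/* Walk a C string: the loop ends when *s is the terminating 0 byte. */
size_t count_chars(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		n++;
	return n;
}

/* Walk a linked list with `for`: the loop ends when x is NULL. */
int sum_for(const struct node *x)
{
	int sum = 0;

	for (; x; x = x->next)
		sum += x->value;
	return sum;
}

/* The same walk written as a `while` loop on the bare pointer. */
int sum_while(const struct node *x)
{
	int sum = 0;

	while (x) {
		sum += x->value;
		x = x->next;
	}
	return sum;
}
```

With these definitions, `count_chars("some string")` returns 11, and both
sum functions return 0 when given an empty (`NULL`) list, because the test
condition is already false before the first iteration.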