1==============================
2Moving LLVM Projects to GitHub
3==============================
4
5Current Status
6==============
7
8We are planning to complete the transition to GitHub by Oct 21, 2019.  See
9the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
10for the latest updates and instructions for how to migrate your workflows.
11
12.. contents:: Table of Contents
13  :depth: 4
14  :local:
15
16Introduction
17============
18
19This is a proposal to move our current revision control system from our own
20hosted Subversion to GitHub. Below are the financial and technical arguments as
21to why we are proposing such a move and how people (and validation
22infrastructure) will continue to work with a Git-based LLVM.
23
24What This Proposal is *Not* About
25=================================
26
27Changing the development policy.
28
29This proposal relates only to moving the hosting of our source-code repository
30from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
31using GitHub's issue tracker, pull-requests, or code-review.
32
Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of an SVN
username/password-hash.
36
37Why Git, and Why GitHub?
38========================
39
40Why Move At All?
41----------------
42
43This discussion began because we currently host our own Subversion server
44and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
45provides limited support, but there is only so much it can do.
46
47Volunteers are not sysadmins themselves, but compiler engineers that happen
48to know a thing or two about hosting servers. We also don't have 24/7 support,
49and we sometimes wake up to see that continuous integration is broken because
50the SVN server is either down or unresponsive.
51
52We should take advantage of one of the services out there (GitHub, GitLab,
53and BitBucket, among others) that offer better service (24/7 stability, disk
54space, Git server, code browsing, forking facilities, etc) for free.
55
56Why Git?
57--------
58
59Many new coders nowadays start with Git, and a lot of people have never used
60SVN, CVS, or anything else. Websites like GitHub have changed the landscape
61of open source contributions, reducing the cost of first contribution and
62fostering collaboration.
63
Git is also the version control system many LLVM developers use. Despite the
sources being stored on an SVN server, these developers are already using Git
through the Git-SVN integration.
67
68Git allows you to:
69
70* Commit, squash, merge, and fork locally without touching the remote server.
71* Maintain local branches, enabling multiple threads of development.
72* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
73* Inspect the repository history (blame, log, bisect) without Internet access.
74* Maintain remote forks and branches on Git hosting services and
75  integrate back to the main repository.
76
77In addition, because Git seems to be replacing many OSS projects' version
78control systems, there are many tools that are built over Git.
79Future tooling may support Git first (if not only).
80
81Why GitHub?
82-----------
83
84GitHub, like GitLab and BitBucket, provides free code hosting for open source
85projects. Any of these could replace the code-hosting infrastructure that we
86have today.
87
88These services also have a dedicated team to monitor, migrate, improve and
89distribute the contents of the repositories depending on region and load.
90
91GitHub has one important advantage over GitLab and
92BitBucket: it offers read-write **SVN** access to the repository
93(https://github.com/blog/626-announcing-svn-support).
94This would enable people to continue working post-migration as though our code
95were still canonically in an SVN repository.
96
97In addition, there are already multiple LLVM mirrors on GitHub, indicating that
98part of our community has already settled there.
99
100On Managing Revision Numbers with Git
101-------------------------------------
102
103The current SVN repository hosts all the LLVM sub-projects alongside each other.
104A single revision number (e.g. r123456) thus identifies a consistent version of
105all LLVM sub-projects.
106
Git does not use a sequential integer revision number but instead uses a hash to
identify each commit.
109
110The loss of a sequential integer revision number has been a sticking point in
111past discussions about Git:
112
113- "The 'branch' I most care about is mainline, and losing the ability to say
114  'fixed in r1234' (with some sort of monotonically increasing number) would
115  be a tragic loss." [LattnerRevNum]_
116- "I like those results sorted by time and the chronology should be obvious, but
117  timestamps are incredibly cumbersome and make it difficult to verify that a
118  given checkout matches a given set of results." [TrickRevNum]_
119- "There is still the major regression with unreadable version numbers.
120  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
121  non-trivial issue." [JSonnRevNum]_
122- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
123
124However, Git can emulate this increasing revision number:
125``git rev-list --count <commit-hash>``. This identifier is unique only
126within a single branch, but this means the tuple `(num, branch-name)` uniquely
127identifies a commit.
128
129We can thus use this revision number to ensure that e.g. `clang -v` reports a
130user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
131the objections raised above with respect to this aspect of Git.
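
As a minimal sketch of how such an identifier could be derived (the function
name ``revision_id`` and the ``<branch-name>-<count>`` output format are
illustrative assumptions, not part of any existing LLVM tooling)::

  #!/bin/sh
  # Hypothetical helper: print "<branch-name>-<count>" for a commit.
  # usage: revision_id <branch-name> [<commit>]
  revision_id() {
    branch=$1
    commit=${2:-HEAD}
    # Count the commits reachable from <commit>, including <commit> itself.
    count=$(git rev-list --count "$commit") || return 1
    printf '%s-%s\n' "$branch" "$count"
  }

Because the count covers every commit reachable from the given commit, it
increases by exactly one per commit only on a linear history.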
132
133What About Branches and Merges?
134-------------------------------
135
In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to forbid merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server-side hooks to enforce such a
policy.  We must rely on the community to avoid pushing merge commits.
142
GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be explicitly allowed before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history before allowing the commit to be pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and would prevent SVN access to the
repository.
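
For illustration, the history check that such a client-side hook might run
could look like the sketch below. It is an assumption about how the check could
be written, not an existing LLVM script, and the function names
``is_zero_sha`` and ``check_no_merges`` are hypothetical. Git passes a
``pre-push`` hook one line per pushed ref on standard input, in the form
``<local ref> <local sha> <remote ref> <remote sha>``::

  #!/bin/sh
  # Hypothetical body for .git/hooks/pre-push: refuse to push merge commits.

  # Succeeds if the argument is an all-zero object name, which Git uses
  # for a ref that does not exist (one being created or deleted).
  is_zero_sha() {
    case "$1" in
      *[!0]*) return 1 ;;
      *) return 0 ;;
    esac
  }

  # Reads the pre-push ref lines from stdin; fails on the first merge commit.
  check_no_merges() {
    while read -r local_ref local_sha remote_ref remote_sha; do
      # Deleting a remote ref pushes no commits.
      if is_zero_sha "$local_sha"; then
        continue
      fi
      if is_zero_sha "$remote_sha"; then
        range=$local_sha                  # new ref: check all reachable history
      else
        range="$remote_sha..$local_sha"   # existing ref: check only new commits
      fi
      merges=$(git rev-list --merges --count "$range") || return 1
      if [ "$merges" -gt 0 ]; then
        echo "error: $local_ref contains $merges merge commit(s)" >&2
        return 1
      fi
    done
    return 0
  }

A hook file built from this sketch would end by calling ``check_no_merges``,
so that a non-zero exit status aborts the push.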
150
151What About Commit Emails?
152-------------------------
153
154We will need a new bot to send emails for each commit. This proposal leaves the
155email format unchanged besides the commit URL.
156
157Straw Man Migration Plan
158========================
159
160Step #1 : Before The Move
161-------------------------
162
1631. Update docs to mention the move, so people are aware of what is going on.
1642. Set up a read-only version of the GitHub project, mirroring our current SVN
165   repository.
1663. Add the required bots to implement the commit emails, as well as the
167   umbrella repository update (if the multirepo is selected) or the read-only
168   Git views for the sub-projects (if the monorepo is selected).
169
170Step #2 : Git Move
171------------------
172
1734. Update the buildbots to pick up updates and commits from the GitHub
174   repository. Not all bots have to migrate at this point, but it'll help
175   provide infrastructure testing.
1765. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
1797. Instruct downstream integrators to pick up commits from the GitHub
180   repository.
1818. Review and prepare an update for the LLVM documentation.
182
Until this point nothing has changed for developers; the work falls mostly
on buildbot and other infrastructure owners.
186
187The migration will pause here until all dependencies have cleared, and all
188problems have been solved.
189
190Step #3: Write Access Move
191--------------------------
192
1939. Collect developers' GitHub account information, and add them to the project.
19410. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
19511. Update the documentation.
19612. Mirror Git to SVN.
197
198Step #4 : Post Move
199-------------------
200
20113. Archive the SVN repository.
20214. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
203    point to GitHub instead.
204
205GitHub Repository Description
206=============================
207
208Monorepo
209----------------
210
211The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
212sub-projects in a single source tree.  It is often referred to as a monorepo and
213mimics an export of the current SVN repository, with each sub-project having its
214own top-level directory. Not all sub-projects are used for building toolchains.
215For example, www/ and test-suite/ are not part of the monorepo.
216
217Putting all sub-projects in a single checkout makes cross-project refactoring
218naturally simple:
219
220 * New sub-projects can be trivially split out for better reuse and/or layering
221   (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
222   dependency on LLVM).
223 * Changing an API in LLVM and upgrading the sub-projects will always be done in
224   a single commit, designing away a common source of temporary build breakage.
 * Moving code across sub-projects (during refactoring for instance) in a single
   commit enables accurate `git blame` when tracking code change history.
 * Tooling based on `git grep` works natively across sub-projects, making it
   easier to find refactoring opportunities across projects (for example reusing a
   data structure initially in LLDB by moving it into libSupport).
 * Having all the sources present encourages maintaining the other sub-projects
   when changing an API.
232
233Finally, the monorepo maintains the property of the existing SVN repository that
234the sub-projects move synchronously, and a single revision number (or commit
235hash) identifies the state of the development across all projects.
236
237.. _build_single_project:
238
239Building a single sub-project
240^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
241
242Even though there is a single source tree, you are not required to build
243all sub-projects together.  It is trivial to configure builds for a single
244sub-project.
245
246For example::
247
248  mkdir build && cd build
249  # Configure only LLVM (default)
250  cmake path/to/monorepo
251  # Configure LLVM and lld
252  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
253  # Configure LLVM and clang
254  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
255
256.. _git-svn-mirror:
257
258Outstanding Questions
259---------------------
260
261Read-only sub-project mirrors
262^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
263
264With the Monorepo, it is undecided whether the existing single-subproject
265mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
266be maintained.
267
268Read/write SVN bridge
269^^^^^^^^^^^^^^^^^^^^^
270
271GitHub supports a read/write SVN bridge for its repositories.  However,
272there have been issues with this bridge working correctly in the past,
273so it's not clear if this is something that will be supported going forward.
274
275Monorepo Drawbacks
276------------------
277
278 * Using the monolithic repository may add overhead for those contributing to a
279   standalone sub-project, particularly on runtimes like libcxx and compiler-rt
280   that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
281   1GB for the monorepo), and the commit rate of LLVM may cause more frequent
282   `git push` collisions when upstreaming. Affected contributors may be able to
283   use the SVN bridge or the single-subproject Git mirrors. However, it's
284   undecided if these projects will continue to be maintained.
285 * Using the monolithic repository may add overhead for those *integrating* a
286   standalone sub-project, even if they aren't contributing to it, due to the
   same disk space concern as the point above. The availability of the
   sub-project Git mirrors would address this.
289 * Preservation of the existing read/write SVN-based workflows relies on the
290   GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
291   into GitHub and could restrict future workflow changes.
292
293Workflows
294^^^^^^^^^
295
 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
297 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
298 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
299 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
300 * :ref:`Bisecting <workflow-mono-bisecting>`.
301
302Workflow Before/After
303=====================
304
305This section goes through a few examples of workflows, intended to illustrate
306how end-users or developers would interact with the repository for
307various use-cases.
308
309.. _workflow-checkout-commit:
310
311Checkout/Clone a Single Project, with Commit Access
312---------------------------------------------------
313
314Currently
315^^^^^^^^^
316
317::
318
319  # direct SVN checkout
320  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
321  # or using the read-only Git view, with git-svn
322  git clone https://llvm.org/git/llvm.git
323  cd llvm
324  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
325  git config svn-remote.svn.fetch :refs/remotes/origin/master
326  git svn rebase -l  # -l avoids fetching ahead of the git mirror.
327
328Commits are performed using `svn commit` or with the sequence `git commit` and
329`git svn dcommit`.
330
331.. _workflow-multicheckout-nocommit:
332
333Monorepo Variant
334^^^^^^^^^^^^^^^^
335
With the monorepo variant, there are a few options, depending on your
constraints. First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-project.git
340
341At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
342:ref:`doesn't imply you have to build all of them <build_single_project>`. You
343can still build only compiler-rt for instance. In this way it's not different
344from someone who would check out all the projects with SVN today.
345
346If you want to avoid checking out all the sources, you can hide the other
347directories using a Git sparse checkout::
348
349  git config core.sparseCheckout true
350  echo /compiler-rt > .git/info/sparse-checkout
351  git read-tree -mu HEAD
352
353The data for all sub-projects is still in your `.git` directory, but in your
354checkout, you only see `compiler-rt`.
355Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
356usual.
357
358Note that when you fetch you'll likely pull in changes to sub-projects you don't
359care about. If you are using sparse checkout, the files from other projects
360won't appear on your disk. The only effect is that your commit hash changes.
361
362You can check whether the changes in the last fetch are relevant to your commit
363by running::
364
365  git log origin/master@{1}..origin/master -- libcxx
366
367This command can be hidden in a script so that `git llvmpush` would perform all
368these steps, fail only if such a dependent change exists, and show immediately
369the change that prevented the push. An immediate repeat of the command would
370(almost) certainly result in a successful push.
371Note that today with SVN or git-svn, this step is not possible since the
372"rebase" implicitly happens while committing (unless a conflict occurs).
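
A minimal sketch of what such a script might look like is shown below. It is
only an illustration under stated assumptions: ``git llvmpush`` does not exist
as a tool, the function names are hypothetical, and the remote is assumed to be
called ``origin`` with a ``master`` branch. Instead of relying on the reflog
entry ``origin/master@{1}``, the sketch records the tip of ``origin/master``
before fetching::

  #!/bin/sh
  # Hypothetical sketch of a "git llvmpush" helper; not an existing LLVM tool.

  # List the commits in <old>..<new> that touch <path>.
  changes_touching() {
    git log --oneline "$1..$2" -- "$3"
  }

  # usage: llvmpush <path>    (for example: llvmpush libcxx)
  llvmpush() {
    path=$1
    old=$(git rev-parse origin/master) || return 1
    git fetch origin || return 1
    new=$(git rev-parse origin/master) || return 1
    relevant=$(changes_touching "$old" "$new" "$path") || return 1
    if [ -n "$relevant" ]; then
      # Show the upstream changes that blocked the push and stop here.
      echo "Upstream changed $path since the last fetch:" >&2
      echo "$relevant" >&2
      return 1
    fi
    git rebase origin/master && git push origin HEAD:master
  }

When the helper stops because of a relevant upstream change, the fetch has
already happened, so running it again right away finds no new changes in that
path and proceeds to rebase and push.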
373
374Checkout/Clone Multiple Projects, with Commit Access
375----------------------------------------------------
376
Let's look at how to assemble llvm+clang+libcxx at a given revision.
378
379Currently
380^^^^^^^^^
381
382::
383
384  svn co https://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
385  cd llvm/tools
386  svn co https://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
387  cd ../projects
388  svn co https://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
389
390Or using git-svn::
391
392  git clone https://llvm.org/git/llvm.git
393  cd llvm/
394  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
395  git config svn-remote.svn.fetch :refs/remotes/origin/master
396  git svn rebase -l
397  git checkout `git svn find-rev -B r258109`
398  cd tools
399  git clone https://llvm.org/git/clang.git
400  cd clang/
401  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
402  git config svn-remote.svn.fetch :refs/remotes/origin/master
403  git svn rebase -l
404  git checkout `git svn find-rev -B r258109`
405  cd ../../projects/
406  git clone https://llvm.org/git/libcxx.git
407  cd libcxx
408  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
409  git config svn-remote.svn.fetch :refs/remotes/origin/master
410  git svn rebase -l
411  git checkout `git svn find-rev -B r258109`
412
413Note that the list would be longer with more sub-projects.
414
415.. _workflow-monocheckout-multicommit:
416
417Monorepo Variant
418^^^^^^^^^^^^^^^^
419
The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-project.git
  cd llvm-project
  git checkout $REVISION
426
427As before, at this point clang, llvm, and libcxx are stored in directories
428alongside each other.
429
430.. _workflow-cross-repo-commit:
431
432Commit an API Change in LLVM and Update the Sub-projects
433--------------------------------------------------------
434
435Today this is possible, even though not common (at least not documented) for
436subversion users and for git-svn users. For example, few Git users try to update
437LLD or Clang in the same commit as they change an LLVM API.
438
439The multirepo variant does not address this: one would have to commit and push
440separately in every individual repository. It would be possible to establish a
441protocol whereby users add a special token to their commit messages that causes
442the umbrella repo's updater bot to group all of them into a single revision.
443
444The monorepo variant handles this natively.
445
446Branching/Stashing/Updating for Local Development or Experiments
447----------------------------------------------------------------
448
449Currently
450^^^^^^^^^
451
SVN does not allow this use case, but developers that are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.
455
456To update the repository to tip of trunk::
457
458  git pull
459  cd tools/clang
460  git pull
461  cd ../../projects/libcxx
462  git pull
463
464To create a new branch::
465
466  git checkout -b MyBranch
467  cd tools/clang
468  git checkout -b MyBranch
469  cd ../../projects/libcxx
470  git checkout -b MyBranch
471
472To switch branches::
473
474  git checkout AnotherBranch
475  cd tools/clang
476  git checkout AnotherBranch
477  cd ../../projects/libcxx
478  git checkout AnotherBranch
479
480.. _workflow-mono-branching:
481
482Monorepo Variant
483^^^^^^^^^^^^^^^^
484
485Regular Git commands are sufficient, because everything is in a single
486repository:
487
488To update the repository to tip of trunk::
489
490  git pull
491
492To create a new branch::
493
494  git checkout -b MyBranch
495
496To switch branches::
497
498  git checkout AnotherBranch
499
500Bisecting
501---------
502
Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
504
505Currently
506^^^^^^^^^
507
SVN does not have built-in bisection support, but the single revision across
sub-projects makes it possible to script one.
510
511Using the existing Git read-only view of the repositories, it is possible to use
512the native Git bisection script over the llvm repository, and use some scripting
513to synchronize the clang repository to match the llvm revision.
514
515.. _workflow-mono-bisecting:
516
517Monorepo Variant
518^^^^^^^^^^^^^^^^
519
520Bisecting on the monorepo is straightforward, and very similar to the above,
521except that the bisection script does not need to include the
522`git submodule update` step.
523
The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::
526
527  git bisect start releases/3.9.x releases/3.8.x
528  git bisect run ./bisect_script.sh
529
530With the `bisect_script.sh` script being::
531
532  #!/bin/sh
533  cd $BUILD_DIR
534
535  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
536                            # to "skip" the current commit
537
538  ./bin/clang some_crash_test.cpp
539
Also, since the monorepo handles commits that update multiple projects, you're
less likely to encounter a build failure where one commit changes an API in LLVM
and a later one "fixes" the build in clang.
543
544Moving Local Branches to the Monorepo
545=====================================
546
547Suppose you have been developing against the existing LLVM git
548mirrors.  You have one or more git branches that you want to migrate
549to the "final monorepo".
550
551The simplest way to migrate such branches is with the
552``migrate-downstream-fork.py`` tool at
553https://github.com/jyknight/llvm-git-migration.
554
555Basic migration
556---------------
557
558Basic instructions for ``migrate-downstream-fork.py`` are in the
559Python script and are expanded on below to a more general recipe::
560
561  # Make a repository which will become your final local mirror of the
562  # monorepo.
563  mkdir my-monorepo
564  git -C my-monorepo init
565
566  # Add a remote to the monorepo.
567  git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
568
569  # Add remotes for each git mirror you use, from upstream as well as
570  # your local mirror.  All projects are listed here but you need only
571  # import those for which you have local branches.
572  my_projects=( clang
573                clang-tools-extra
574                compiler-rt
575                debuginfo-tests
576                libcxx
577                libcxxabi
578                libunwind
579                lld
580                lldb
581                llvm
582                openmp
583                polly )
584  for p in ${my_projects[@]}; do
585    git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
586    git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
587  done
588
589  # Pull in all the commits.
590  git -C my-monorepo fetch --all
591
592  # Run migrate-downstream-fork to rewrite local branches on top of
593  # the upstream monorepo.
594  (
595     cd my-monorepo
596     migrate-downstream-fork.py \
597       refs/remotes/local \
598       refs/tags \
599       --new-repo-prefix=refs/remotes/upstream/monorepo \
600       --old-repo-prefix=refs/remotes/upstream/split \
601       --source-kind=split \
602       --revmap-out=monorepo-map.txt
603  )
604
605  # Octopus-merge the resulting local split histories to unify them.
606
607  # Assumes local work on local split mirrors is on master (and
608  # upstream is presumably represented by some other branch like
609  # upstream/master).
610  my_local_branch="master"
611
  git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
613    $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
614                                    refs/remotes/local/split/llvm/${my_local_branch})
615  git -C my-monorepo checkout local/octopus/${my_local_branch}
616
617  subproject_branches=()
618  for p in ${my_projects[@]}; do
619    subproject_branch=${p}/local/monorepo/${my_local_branch}
620    git -C my-monorepo branch ${subproject_branch} \
621      refs/remotes/local/split/${p}/${my_local_branch}
622    if [[ "${p}" != "llvm" ]]; then
623      subproject_branches+=( ${subproject_branch} )
624    fi
625  done
626
627  git -C my-monorepo merge ${subproject_branches[@]}
628
629  for p in ${my_projects[@]}; do
630    subproject_branch=${p}/local/monorepo/${my_local_branch}
631    git -C my-monorepo branch -d ${subproject_branch}
632  done
633
634  # Create local branches for upstream monorepo branches.
635  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
636                   refs/remotes/upstream/monorepo); do
637    upstream_branch=${ref#refs/remotes/upstream/monorepo/}
638    git -C my-monorepo branch upstream/${upstream_branch} ${ref}
639  done
640
641The above gets you to a state like the following::
642
643  U1 - U2 - U3 <- upstream/master
644    \   \    \
645     \   \    - Llld1 - Llld2 -
646      \   \                    \
647       \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
648        \                      /
649         - Lllvm1 - Lllvm2-----
650
651Each branched component has its branch rewritten on top of the
652monorepo and all components are unified by a giant octopus merge.
653
654If additional active local branches need to be preserved, the above
655operations following the assignment to ``my_local_branch`` should be
656done for each branch.  Ref paths will need to be updated to map the
657local branch to the corresponding upstream branch.  If local branches
658have no corresponding upstream branch, then the creation of
659``local/octopus/<local branch>`` need not use ``git-merge-base`` to
660pinpoint its root commit; it may simply be branched from the
661appropriate component branch (say, ``llvm/local_release_X``).
662
663Zipping local history
664---------------------
665
666The octopus merge is suboptimal for many cases, because walking back
667through the history of one component leaves the other components fixed
668at a history that likely makes things unbuildable.
669
670Some downstream users track the order commits were made to subprojects
671with some kind of "umbrella" project that imports the project git
672mirrors as submodules, similar to the multirepo umbrella proposed
673above.  Such an umbrella repository looks something like this::
674
675   UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
676   |        |             |        |        |        |        |
677  Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2     Lmyproj1
678
679The vertical bars represent submodule updates to a particular local
680commit in the project mirror.  ``UM3`` in this case is a commit of
681some local umbrella repository state that is not a submodule update,
682perhaps a ``README`` or project build script update.  Commit ``UM8``
683updates a submodule of local project ``myproj``.
684
685The tool ``zip-downstream-fork.py`` at
686https://github.com/greened/llvm-git-migration/tree/zip can be used to
687convert the umbrella history into a monorepo-based history with
688commits in the order implied by submodule updates::
689
690  U1 - U2 - U3 <- upstream/master
691   \    \    \
692    \    -----\---------------                                    local/zip--.
693     \         \              \                                               |
694    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
695
696
697The ``U*`` commits represent upstream commits to the monorepo master
698branch.  Each submodule update in the local ``UM*`` commits brought in
699a subproject tree at some local commit.  The trees in the ``L*1``
700commits represent merges from upstream.  These result in edges from
701the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
702The ``L*2`` commits did not do any merges from upstream.
703
704Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
705if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
706commit appearing after the ``Llld1`` commit would actually represent a
707clang tree *earlier* in the upstream clang history.  We want the
708``local/zip`` branch to accurately represent the state of our umbrella
709history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
710clang's tree actually looks like in ``Lclang1``.
711
712Even so, the edge ``U3 -> Llld1`` could be problematic for future
713merges from upstream.  git will think that we've already merged from
714``U3``, and we have, except for the state of the clang tree.  One
715possible mitigation strategy is to manually diff clang between ``U2``
716and ``U3`` and apply those updates to ``local/zip``.  Another,
717possibly simpler strategy is to freeze local work on downstream
718branches and merge all submodules from the latest upstream before
719running ``zip-downstream-fork.py``.  If downstream merged each project
720from upstream in lockstep without any intervening local commits, then
721things should be fine without any special action.  We anticipate this
722to be the common case.
723
724The tree for ``Lclang1`` outside of clang will represent the state of
725things at ``U3`` since all of the upstream projects not participating
726in the umbrella history should be in a state respecting the commit
727``U3``.  The trees for llvm and lld should correctly represent commits
728``Lllvm1`` and ``Llld1``, respectively.
729
730Commit ``UM3`` changed files not related to submodules and we need
731somewhere to put them.  It is not safe in general to put them in the
732monorepo root directory because they may conflict with files in the
733monorepo.  Let's assume we want them in a directory ``local`` in the
734monorepo.
735
736**Example 1: Umbrella looks like the monorepo**
737
738For this example, we'll assume that each subproject appears in its own
top-level directory in the umbrella, just as they do in the monorepo.
740Let's also assume that we want the files in directory ``myproj`` to
741appear in ``local/myproj``.
742
743Given the above run of ``migrate-downstream-fork.py``, a recipe to
744create the zipped history is below::
745
  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in ${subprojects[@]}; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in ${subprojects[@]}; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/umbrella.git
  git -C my-monorepo fetch umbrella
775
776  # Put myproj in local/myproj
777  echo "myproj local/myproj" > my-monorepo/submodule-map.txt
778
779  # Rewrite history
780  (
781    cd my-monorepo
782    zip-downstream-fork.py \
783      refs/remotes/umbrella \
784      --new-repo-prefix=refs/remotes/upstream/monorepo \
785      --old-repo-prefix=refs/remotes/upstream/split \
786      --revmap-in=monorepo-map.txt \
787      --revmap-out=zip-map.txt \
788      --subdir=local \
789      --submodule-map=submodule-map.txt \
790      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
795
796Note that if the umbrella has submodules to non-LLVM repositories,
797``zip-downstream-fork.py`` needs to know about them to be able to
798rewrite commits.  That is why the first step above is to fetch commits
799from such repositories.
800
With ``--update-tags`` the tool migrates annotated tags that point to
submodule commits inlined into the zipped history.  If the umbrella
pulled in an upstream commit that happened to have a tag pointing to
it, that tag is migrated too, which is almost certainly not what is
wanted.  The tag can always be moved back to its original commit after
rewriting.  Alternatively, omit the ``--update-tags`` option and
migrate any local tags manually.
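
If migrated upstream tags are a concern, one option is to record where
every tag points before rewriting and to restore selected tags
afterwards.  The following is a minimal sketch and not part of the
migration tools; the function names ``save_tags`` and ``restore_tag``
are hypothetical::

  # Save "<refname> <object>" for every tag in repository $1 to file $2.
  save_tags() {
    git -C "$1" for-each-ref --format='%(refname) %(objectname)' \
        refs/tags > "$2"
  }

  # Point tag $3 in repository $1 back at the object recorded in file $2.
  restore_tag() {
    local old
    old=$(awk -v t="refs/tags/$3" '$1 == t { print $2 }' "$2")
    [ -n "${old}" ] || { echo "no record for tag $3" >&2; return 1; }
    git -C "$1" update-ref "refs/tags/$3" "${old}"
  }

``save_tags`` would be run on ``my-monorepo`` before
``zip-downstream-fork.py``, and ``restore_tag`` afterwards for each tag
that should keep its original target.  The restore has to happen before
any garbage collection removes the original tag objects.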
808
809**Example 2: Nested sources layout**
810
811The tool handles nested submodules (e.g. llvm is a submodule in
812umbrella and clang is a submodule in llvm).  The file
813``submodule-map.txt`` is a list of pairs, one per line.  The first
814pair item describes the path to a submodule in the umbrella
815repository.  The second pair item describes the path where trees for
816that submodule should be written in the zipped history.
817
818Let's say your umbrella repository is actually the llvm repository and
819it has submodules in the "nested sources" layout (clang in
820tools/clang, etc.).  Let's also say ``projects/myproj`` is a submodule
821pointing to some downstream repository.  The submodule map file should
822look like this (we still want myproj mapped the same way as
823previously)::
824
825  tools/clang clang
826  tools/clang/tools/extra clang-tools-extra
827  projects/compiler-rt compiler-rt
828  projects/debuginfo-tests debuginfo-tests
829  projects/libclc libclc
830  projects/libcxx libcxx
831  projects/libcxxabi libcxxabi
832  projects/libunwind libunwind
833  tools/lld lld
834  tools/lldb lldb
835  projects/openmp openmp
836  tools/polly polly
837  projects/myproj local/myproj
838
If a submodule path does not appear in the map, the tool assumes it
should be placed in the same place in the monorepo.  That means if you
use the "nested sources" layout in your umbrella, you *must* provide
map entries for all of the projects in your umbrella (except llvm).
Otherwise trees from submodule updates will appear underneath llvm in
the zipped history.
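
The lookup rule can be pictured with a small shell function.  This is
only an illustrative sketch of the behavior described above, not code
from ``zip-downstream-fork.py``.  The function name
``map_submodule_path`` is hypothetical, and the sketch ignores the
effect of ``--subdir``::

  # Print the zipped-history path for submodule path $2 using map file $1.
  # Paths without a map entry are printed unchanged.
  map_submodule_path() {
    awk -v p="$2" '$1 == p { print $2; found = 1; exit }
                   END { if (!found) print p }' "$1"
  }

With the map above, looking up ``tools/clang`` prints ``clang``, while
an unlisted path such as ``tools/foo`` is printed as is, which is why
unlisted submodules end up underneath llvm.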
845
Because llvm is itself the umbrella, we use ``--subdir`` to write its
content into ``llvm`` in the zipped history::
848
849  # Import any non-LLVM repositories the umbrella references.
850  git -C my-monorepo remote add localrepo \
851                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo
853
854  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
855                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
856                parallel-libs polly pstl )
857
  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done
873
874  # Import umbrella history.  We want this under a different refspec
875  # so zip-downstream-fork.py knows what it is.
876  git -C my-monorepo remote add umbrella \
877                                 https://my.local.mirror.org/llvm.git
  git -C my-monorepo fetch umbrella
879
880  # Create the submodule map.
881  echo "tools/clang clang" > my-monorepo/submodule-map.txt
882  echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
883  echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
884  echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
885  echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
886  echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
887  echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
888  echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
889  echo "tools/lld lld" >> my-monorepo/submodule-map.txt
890  echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
891  echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
892  echo "tools/polly polly" >> my-monorepo/submodule-map.txt
893  echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt
894
895  # Rewrite history
896  (
897    cd my-monorepo
898    zip-downstream-fork.py \
899      refs/remotes/umbrella \
900      --new-repo-prefix=refs/remotes/upstream/monorepo \
901      --old-repo-prefix=refs/remotes/upstream/split \
902      --revmap-in=monorepo-map.txt \
903      --revmap-out=zip-map.txt \
904      --subdir=llvm \
905      --submodule-map=submodule-map.txt \
906      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
911
912
913Comments at the top of ``zip-downstream-fork.py`` describe in more
914detail how the tool works and various implications of its operation.
915
916Importing local repositories
917----------------------------
918
919You may have additional repositories that integrate with the LLVM
920ecosystem, essentially extending it with new tools.  If such
921repositories are tightly coupled with LLVM, it may make sense to
922import them into your local mirror of the monorepo.
923
924If such repositories participated in the umbrella repository used
925during the zipping process above, they will automatically be added to
926the monorepo.  For downstream repositories that don't participate in
927an umbrella setup, the ``import-downstream-repo.py`` tool at
928https://github.com/greened/llvm-git-migration/tree/import can help with
929getting them into the monorepo.  A recipe follows::
930
931  # Import downstream repo history into the monorepo.
932  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
  git -C my-monorepo fetch myrepo
934
935  my_local_tags=( refs/tags/release
936                  refs/tags/hotfix )
937
938  (
939    cd my-monorepo
940    import-downstream-repo.py \
941      refs/remotes/myrepo \
942      ${my_local_tags[@]} \
943      --new-repo-prefix=refs/remotes/upstream/monorepo \
944      --subdir=myrepo \
945      --tag-prefix="myrepo-"
  )

  # Preserve release branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                 refs/remotes/myrepo/release); do
    branch=${ref#refs/remotes/myrepo/}
    git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
  done

  # Preserve master.
  git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master

  # Merge master.
  git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
  git -C my-monorepo merge myrepo/master
961
962You may want to merge other corresponding branches, for example
963``myrepo`` release branches if they were in lockstep with LLVM project
964releases.
965
966``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
967annotated tags with the given prefix.  Due to limitations with
968``fast_filter_branch.py``, unannotated tags cannot be renamed
969(``fast_filter_branch.py`` considers them branches, not tags).  Since
970the upstream monorepo had its tags rewritten with an "llvmorg-"
971prefix, name conflicts should not be an issue.  ``--tag-prefix`` can
972be used to more clearly indicate which tags correspond to various
973imported repositories.
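
If unannotated tags should carry the same prefix, they can be renamed
by hand with plain Git commands.  The following is a minimal sketch,
assuming it is run on whichever repository holds the unannotated tags;
the function name ``prefix_lightweight_tags`` is hypothetical and the
function is not part of ``import-downstream-repo.py``::

  # Rename every unannotated (lightweight) tag in repository $1 by
  # prepending prefix $2.  Annotated tags are left alone.
  prefix_lightweight_tags() {
    git -C "$1" for-each-ref --format='%(objecttype) %(refname)' refs/tags |
    while read -r type ref; do
      # A lightweight tag points directly at a commit, not a tag object.
      [ "${type}" = commit ] || continue
      name=${ref#refs/tags/}
      git -C "$1" tag "$2${name}" "${ref}" &&
        git -C "$1" tag -d "${name}" > /dev/null
    done
  }

For example, calling it with the prefix ``myrepo-`` would turn a
lightweight tag ``hotfix`` into ``myrepo-hotfix``.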
974
975Given this repository history::
976
977  R1 - R2 - R3 <- master
978       ^
979       |
980    release/1
981
982The above recipe results in a history like this::
983
984  U1 - U2 - U3 <- upstream/master
985   \    \    \
986    \    -----\---------------                                         local/zip--.
987     \         \              \                                                    |
988    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
989                                                                             /
990                                                                 R1 - R2 - R3  <-.
991                                                                      ^           |
992                                                                      |           |
993                                                               myrepo-release/1   |
994                                                                                  |
995                                                                   myrepo/master--'
996
Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
from ``myrepo``.  If you require commits from ``myrepo`` to be
interleaved with commits on local project branches (for example,
interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and ``myrepo``
doesn't appear in an umbrella repository, a new tool will need to be
developed.  Creating such a tool would involve:
1003
10041. Modifying ``fast_filter_branch.py`` to optionally take a
1005   revlist directly rather than generating it itself
1006
10072. Creating a tool to generate an interleaved ordering of local
1008   commits based on some criteria (``zip-downstream-fork.py`` uses the
1009   umbrella history as its criterion)
1010
10113. Generating such an ordering and feeding it to
1012   ``fast_filter_branch.py`` as a revlist
1013
1014Some care will also likely need to be taken to handle merge commits,
1015to ensure the parents of such commits migrate correctly.
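
As a starting point for step 2, Git itself can produce an ordering
interleaved by commit timestamp.  This is a sketch of one possible
criterion only, and ``fast_filter_branch.py`` would still need the
change from step 1 to consume the list.  The function name
``interleaved_revlist`` is hypothetical::

  # Print the commits reachable from the given refs in repository $1,
  # oldest first, ordered by commit timestamp where ancestry allows.
  interleaved_revlist() {
    local repo=$1
    shift
    git -C "${repo}" rev-list --reverse --date-order "$@"
  }

It could be called, for example, as ``interleaved_revlist my-monorepo
local/zip/master myrepo/master``.  ``--date-order`` never lists a
commit before its parents in the reversed output, so each branch keeps
its own commit order.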
1016
1017Scrubbing the Local Monorepo
1018----------------------------
1019
Once all of the migrating, zipping and importing is done, it's time to
clean up.  The Python tools use ``git fast-import``, which leaves a lot
of cruft around, and we want to shrink our new monorepo mirror as much
as possible.  Here is one way to do it::
1024
1025  git -C my-monorepo checkout master
1026
1027  # Delete branches we no longer need.  Do this for any other branches
1028  # you merged above.
1029  git -C my-monorepo branch -D local/zip/master || true
1030  git -C my-monorepo branch -D local/octopus/master || true
1031
1032  # Remove remotes.
1033  git -C my-monorepo remote remove upstream/monorepo
1034
1035  for p in ${my_projects[@]}; do
1036    git -C my-monorepo remote remove upstream/split/${p}
1037    git -C my-monorepo remote remove local/split/${p}
1038  done
1039
1040  git -C my-monorepo remote remove localrepo
1041  git -C my-monorepo remote remove umbrella
1042  git -C my-monorepo remote remove myrepo
1043
1044  # Add anything else here you don't need.  refs/tags/release is
1045  # listed below assuming tags have been rewritten with a local prefix.
1046  # If not, remove it from this list.
1047  refs_to_clean=(
1048    refs/original
1049    refs/remotes
1050    refs/tags/backups
1051    refs/tags/release
1052  )
1053
1054  git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
1055    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d
1056
1057  git -C my-monorepo reflog expire --all --expire=now
1058
1059  # fast_filter_branch.py might have gc running in the background.
1060  while ! git -C my-monorepo \
1061    -c gc.reflogExpire=0 \
1062    -c gc.reflogExpireUnreachable=0 \
1063    -c gc.rerereresolved=0 \
1064    -c gc.rerereunresolved=0 \
1065    -c gc.pruneExpire=now \
1066    gc --prune=now; do
1067    continue
1068  done
1069
1070  # Takes a LOOOONG time!
1071  git -C my-monorepo repack -A -d -f --depth=250 --window=250
1072
1073  git -C my-monorepo prune-packed
1074  git -C my-monorepo prune
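
To see what the cleanup achieved, the packed size can be compared
before and after, and the object database checked for consistency.
The following is a minimal sketch using a hypothetical helper named
``report_repo``; note that ``git fsck`` can take a long time on a
repository of this size::

  # Print pack statistics for repository $1 and check its integrity.
  report_repo() {
    git -C "$1" count-objects -v -H |
      grep -E '^(count|in-pack|size-pack):'
    git -C "$1" fsck --no-dangling
  }

Running ``report_repo my-monorepo`` before and after the steps above
shows how much the pack shrank.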
1075
You should now have a trim monorepo.  Upload it to your Git server, and
happy hacking!
1078
1079References
1080==========
1081
1082.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
1083.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
1084.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
1085.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
1086.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
1087