==============================
Moving LLVM Projects to GitHub
==============================

Current Status
==============

We are planning to complete the transition to GitHub by Oct 21, 2019. See
the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
for the latest updates and instructions for how to migrate your workflows.

.. contents:: Table of Contents
   :depth: 4
   :local:

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we are proposing such a move and how people (and validation
infrastructure) will continue to work with a Git-based LLVM.

What This Proposal is *Not* About
=================================

Changing the development policy.

This proposal relates only to moving the hosting of our source-code repository
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
using GitHub's issue tracker, pull-requests, or code-review.

Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of SVN
username/password-hash.

Why Git, and Why GitHub?
========================

Why Move At All?
----------------

This discussion began because we currently host our own Subversion server
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
provides limited support, but there is only so much it can do.

Volunteers are not sysadmins themselves, but compiler engineers that happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to see that continuous integration is broken because
the SVN server is either down or unresponsive.

We should take advantage of one of the services out there (GitHub, GitLab,
and BitBucket, among others) that offer better service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for free.

Why Git?
--------

Many new coders nowadays start with Git, and a lot of people have never used
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
of open source contributions, reducing the cost of first contribution and
fostering collaboration.

Git is also the version control system many LLVM developers use. Despite the
sources being stored in an SVN server, these developers are already using Git
through the Git-SVN integration.

Git allows you to:

* Commit, squash, merge, and fork locally without touching the remote server.
* Maintain local branches, enabling multiple threads of development.
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
* Inspect the repository history (blame, log, bisect) without Internet access.
* Maintain remote forks and branches on Git hosting services and
  integrate back to the main repository.

In addition, because Git seems to be replacing many OSS projects' version
control systems, there are many tools that are built on top of Git.
Future tooling may support Git first (if not only).

Why GitHub?
-----------

GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
have today.

These services also have a dedicated team to monitor, migrate, improve and
distribute the contents of the repositories depending on region and load.

GitHub has one important advantage over GitLab and
BitBucket: it offers read-write **SVN** access to the repository
(https://github.com/blog/626-announcing-svn-support).
This would enable people to continue working post-migration as though our code
were still canonically in an SVN repository.

In addition, there are already multiple LLVM mirrors on GitHub, indicating that
part of our community has already settled there.

On Managing Revision Numbers with Git
-------------------------------------

The current SVN repository hosts all the LLVM sub-projects alongside each other.
A single revision number (e.g. r123456) thus identifies a consistent version of
all LLVM sub-projects.

Git does not use sequential integer revision numbers but instead uses a hash to
identify each commit.

The loss of a sequential integer revision number has been a sticking point in
past discussions about Git:

- "The 'branch' I most care about is mainline, and losing the ability to say
  'fixed in r1234' (with some sort of monotonically increasing number) would
  be a tragic loss." [LattnerRevNum]_
- "I like those results sorted by time and the chronology should be obvious, but
  timestamps are incredibly cumbersome and make it difficult to verify that a
  given checkout matches a given set of results." [TrickRevNum]_
- "There is still the major regression with unreadable version numbers.
  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
  non-trivial issue." [JSonnRevNum]_
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.

However, Git can emulate this increasing revision number with
``git rev-list --count <commit-hash>``. This identifier is unique only
within a single branch, but this means the tuple `(num, branch-name)` uniquely
identifies a commit.

We can thus use this revision number to ensure that e.g. `clang -v` reports a
user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
the objections raised above with respect to this aspect of Git.

What About Branches and Merges?
-------------------------------

In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to disallow merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server-side hooks to enforce such a
policy. We must rely on the community to avoid pushing merge commits.

GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be explicitly allowed before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history before allowing the commit to be pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and would prevent SVN access to the
repository.

What About Commit Emails?
-------------------------

We will need a new bot to send emails for each commit. This proposal leaves the
email format unchanged besides the commit URL.

Straw Man Migration Plan
========================

Step #1 : Before The Move
-------------------------

1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
3. Add the required bots to implement the commit emails, as well as the
   umbrella repository update (if the multirepo is selected) or the read-only
   Git views for the sub-projects (if the monorepo is selected).

Step #2 : Git Move
------------------

4. Update the buildbots to pick up updates and commits from the GitHub
   repository. Not all bots have to migrate at this point, but it'll help
   provide infrastructure testing.
5. Update Phabricator to pick up commits from the GitHub repository.
6.
   LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
8. Review and prepare an update for the LLVM documentation.

Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.

The migration will pause here until all dependencies have cleared, and all
problems have been solved.

Step #3: Write Access Move
--------------------------

9. Collect developers' GitHub account information, and add them to the project.
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
11. Update the documentation.
12. Mirror Git to SVN.

Step #4 : Post Move
-------------------

13. Archive the SVN repository.
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
    point to GitHub instead.

GitHub Repository Description
=============================

Monorepo
--------

The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
sub-projects in a single source tree. It is often referred to as a monorepo and
mimics an export of the current SVN repository, with each sub-project having its
own top-level directory. Not all sub-projects are used for building toolchains.
For example, www/ and test-suite/ are not part of the monorepo.

Putting all sub-projects in a single checkout makes cross-project refactoring
naturally simple:

 * New sub-projects can be trivially split out for better reuse and/or layering
   (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
   dependency on LLVM).
 * Changing an API in LLVM and upgrading the sub-projects will always be done in
   a single commit, designing away a common source of temporary build breakage.
 * Moving code across sub-projects (during refactoring for instance) in a single
   commit enables accurate `git blame` when tracking code change history.
 * Tooling based on `git grep` works natively across sub-projects, making it
   easier to find refactoring opportunities across projects (for example reusing a
   data structure initially in LLDB by moving it into libSupport).
 * Having all the sources present encourages maintaining the other sub-projects
   when changing an API.

Finally, the monorepo maintains the property of the existing SVN repository that
the sub-projects move synchronously, and a single revision number (or commit
hash) identifies the state of the development across all projects.

.. _build_single_project:

Building a single sub-project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though there is a single source tree, you are not required to build
all sub-projects together. It is trivial to configure builds for a single
sub-project.

For example::

  mkdir build && cd build
  # Configure only LLVM (default); the top-level CMakeLists.txt lives in llvm/
  cmake path/to/monorepo/llvm
  # Configure LLVM and lld
  cmake path/to/monorepo/llvm -DLLVM_ENABLE_PROJECTS=lld
  # Configure LLVM and clang
  cmake path/to/monorepo/llvm -DLLVM_ENABLE_PROJECTS=clang

.. _git-svn-mirror:

Outstanding Questions
---------------------

Read-only sub-project mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the Monorepo, it is undecided whether the existing single-subproject
mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
be maintained.

Read/write SVN bridge
^^^^^^^^^^^^^^^^^^^^^

GitHub supports a read/write SVN bridge for its repositories.
However,
there have been issues with this bridge working correctly in the past,
so it's not clear whether it will be supported going forward.

Monorepo Drawbacks
------------------

 * Using the monolithic repository may add overhead for those contributing to a
   standalone sub-project, particularly on runtimes like libcxx and compiler-rt
   that don't rely on LLVM; currently, a fresh clone of libcxx is only 15 MB (vs.
   1 GB for the monorepo), and the commit rate of LLVM may cause more frequent
   `git push` collisions when upstreaming. Affected contributors may be able to
   use the SVN bridge or the single-subproject Git mirrors. However, it's
   undecided whether these will continue to be maintained.
 * Using the monolithic repository may add overhead for those *integrating* a
   standalone sub-project, even if they aren't contributing to it, due to the
   same disk space concern as the point above. The availability of the
   sub-project Git mirrors would address this.
 * Preservation of the existing read/write SVN-based workflows relies on the
   GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
   into GitHub and could restrict future workflow changes.

Workflows
^^^^^^^^^

 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
 * :ref:`Bisecting <workflow-mono-bisecting>`.

Workflow Before/After
=====================

This section goes through a few examples of workflows, intended to illustrate
how end-users or developers would interact with the repository for
various use-cases.

..
_workflow-checkout-commit:

Checkout/Clone a Single Project, with Commit Access
---------------------------------------------------

Currently
^^^^^^^^^

::

  # direct SVN checkout
  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
  # or using the read-only Git view, with git-svn
  git clone https://llvm.org/git/llvm.git
  cd llvm
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.

.. _workflow-multicheckout-nocommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

With the monorepo variant, there are a few options, depending on your
constraints. First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-project.git

At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
can still build only compiler-rt for instance. In this respect, it is no
different from checking out all the projects with SVN today.

If you want to avoid checking out all the sources, you can hide the other
directories using a Git sparse checkout::

  git config core.sparseCheckout true
  echo /compiler-rt > .git/info/sparse-checkout
  git read-tree -mu HEAD

The data for all sub-projects is still in your `.git` directory, but in your
checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.

Note that when you fetch you'll likely pull in changes to sub-projects you don't
care about. If you are using sparse checkout, the files from other projects
won't appear on your disk.
The only effect is that your commit hash changes.

You can check whether the changes in the last fetch are relevant to your commit
by running::

  git log origin/master@{1}..origin/master -- libcxx

This command can be hidden in a script so that `git llvmpush` would perform all
these steps, fail only if such a dependent change exists, and immediately show
the change that prevented the push. An immediate repeat of the command would
(almost) certainly result in a successful push.
Note that today with SVN or git-svn, this step is not possible since the
"rebase" implicitly happens while committing (unless a conflict occurs).

Checkout/Clone Multiple Projects, with Commit Access
----------------------------------------------------

Let's look at how to assemble llvm+clang+libcxx at a given revision.

Currently
^^^^^^^^^

::

  svn co https://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
  cd llvm/tools
  svn co https://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
  cd ../projects
  svn co https://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION

Or using git-svn::

  git clone https://llvm.org/git/llvm.git
  cd llvm/
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd tools
  git clone https://llvm.org/git/clang.git
  cd clang/
  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd ../../projects/
  git clone https://llvm.org/git/libcxx.git
  cd libcxx
  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git \
    svn rebase -l
  git checkout `git svn find-rev -B r258109`

Note that the list would be longer with more sub-projects.

.. _workflow-monocheckout-multicommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-project.git
  cd llvm-project
  git checkout $REVISION

As before, at this point clang, llvm, and libcxx are stored in directories
alongside each other.

.. _workflow-cross-repo-commit:

Commit an API Change in LLVM and Update the Sub-projects
--------------------------------------------------------

Today this is possible, even though not common (at least not documented) for
Subversion users and for git-svn users. For example, few Git users try to update
LLD or Clang in the same commit as they change an LLVM API.

The multirepo variant does not address this: one would have to commit and push
separately in every individual repository. It would be possible to establish a
protocol whereby users add a special token to their commit messages that causes
the umbrella repo's updater bot to group all of them into a single revision.

The monorepo variant handles this natively.

Branching/Stashing/Updating for Local Development or Experiments
----------------------------------------------------------------

Currently
^^^^^^^^^

SVN does not allow this use case, but developers that are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.
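
Because every nested checkout is its own Git repository, each operation has to
be repeated once per checkout. As a minimal sketch, and not part of any existing
LLVM tooling, a small shell helper could apply one Git command to all of them.
The function name ``git_each`` is hypothetical, and the directory list assumes
the nested layout used in this section (clang in ``tools/clang``, libcxx in
``projects/libcxx``):

```shell
# Hypothetical helper: run one git command in the llvm checkout and in each
# nested sub-project checkout. Call it from the root of the llvm checkout.
git_each() {
  for dir in . tools/clang projects/libcxx; do
    # Skip sub-projects that are not checked out.
    [ -e "$dir/.git" ] || continue
    echo "== $dir"
    # Stop at the first checkout where the command fails.
    git -C "$dir" "$@" || return 1
  done
}
```

For example, ``git_each pull`` would update every checkout, and
``git_each checkout -b MyBranch`` would create the same branch in each of them.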

To update the repository to the tip of trunk::

  git pull
  cd tools/clang
  git pull
  cd ../../projects/libcxx
  git pull

To create a new branch::

  git checkout -b MyBranch
  cd tools/clang
  git checkout -b MyBranch
  cd ../../projects/libcxx
  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch
  cd tools/clang
  git checkout AnotherBranch
  cd ../../projects/libcxx
  git checkout AnotherBranch

.. _workflow-mono-branching:

Monorepo Variant
^^^^^^^^^^^^^^^^

Regular Git commands are sufficient, because everything is in a single
repository.

To update the repository to the tip of trunk::

  git pull

To create a new branch::

  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch

Bisecting
---------

Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

Currently
^^^^^^^^^

SVN does not have built-in bisection support, but the single revision across
sub-projects makes it possible to script one.

Using the existing Git read-only view of the repositories, it is possible to use
the native Git bisection script over the llvm repository, and use some scripting
to synchronize the clang repository to match the llvm revision.

.. _workflow-mono-bisecting:

Monorepo Variant
^^^^^^^^^^^^^^^^

Bisecting on the monorepo is straightforward, and very similar to the above,
except that the bisection script does not need to include the
`git submodule update` step.

The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::

  git bisect start releases/3.9.x releases/3.8.x
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

Also, since the monorepo handles commits that update multiple projects, you're
less likely to encounter a build failure where a commit changes an API in LLVM
and another later one "fixes" the build in clang.

Moving Local Branches to the Monorepo
=====================================

Suppose you have been developing against the existing LLVM git
mirrors.  You have one or more git branches that you want to migrate
to the "final monorepo".

The simplest way to migrate such branches is with the
``migrate-downstream-fork.py`` tool at
https://github.com/jyknight/llvm-git-migration.

Basic migration
---------------

Basic instructions for ``migrate-downstream-fork.py`` are in the
Python script and are expanded on below to a more general recipe::

  # Make a repository which will become your final local mirror of the
  # monorepo.
  mkdir my-monorepo
  git -C my-monorepo init

  # Add a remote to the monorepo.
  git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git

  # Add remotes for each git mirror you use, from upstream as well as
  # your local mirror.  All projects are listed here but you need only
  # import those for which you have local branches.
  my_projects=( clang
                clang-tools-extra
                compiler-rt
                debuginfo-tests
                libcxx
                libcxxabi
                libunwind
                lld
                lldb
                llvm
                openmp
                polly )
  for p in "${my_projects[@]}"; do
    git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
    git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
  done

  # Pull in all the commits.
  git -C my-monorepo fetch --all

  # Run migrate-downstream-fork to rewrite local branches on top of
  # the upstream monorepo.
  (
     cd my-monorepo
     migrate-downstream-fork.py \
       refs/remotes/local \
       refs/tags \
       --new-repo-prefix=refs/remotes/upstream/monorepo \
       --old-repo-prefix=refs/remotes/upstream/split \
       --source-kind=split \
       --revmap-out=monorepo-map.txt
  )

  # Octopus-merge the resulting local split histories to unify them.

  # Assumes local work on local split mirrors is on master (and
  # upstream is presumably represented by some other branch like
  # upstream/master).
  my_local_branch="master"

  git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
    $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
                                    refs/remotes/local/split/llvm/${my_local_branch})
  git -C my-monorepo checkout local/octopus/${my_local_branch}

  subproject_branches=()
  for p in "${my_projects[@]}"; do
    subproject_branch=${p}/local/monorepo/${my_local_branch}
    git -C my-monorepo branch ${subproject_branch} \
      refs/remotes/local/split/${p}/${my_local_branch}
    if [[ "${p}" != "llvm" ]]; then
      subproject_branches+=( ${subproject_branch} )
    fi
  done

  git -C my-monorepo merge "${subproject_branches[@]}"

  for p in "${my_projects[@]}"; do
    subproject_branch=${p}/local/monorepo/${my_local_branch}
    git -C my-monorepo branch -d ${subproject_branch}
  done

  # Create local branches for upstream monorepo branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                 refs/remotes/upstream/monorepo); do
    upstream_branch=${ref#refs/remotes/upstream/monorepo/}
    git -C my-monorepo branch upstream/${upstream_branch} ${ref}
  done

The above gets you to a state like the following::

  U1 - U2 - U3 <- upstream/master
    \   \    \
     \   \    - Llld1 - Llld2 -
      \   \                    \
       \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
        \                      /
         - Lllvm1 - Lllvm2-----

Each branched component has its branch rewritten on top of the
monorepo and all components are unified by a giant octopus merge.

If additional active local branches need to be preserved, the above
operations following the assignment to ``my_local_branch`` should be
done for each branch.  Ref paths will need to be updated to map the
local branch to the corresponding upstream branch.  If local branches
have no corresponding upstream branch, then the creation of
``local/octopus/<local branch>`` need not use ``git merge-base`` to
pinpoint its root commit; it may simply be branched from the
appropriate component branch (say, ``llvm/local_release_X``).

Zipping local history
---------------------

The octopus merge is suboptimal for many cases, because walking back
through the history of one component leaves the other components fixed
at a history that likely makes things unbuildable.

Some downstream users track the order in which commits were made to
subprojects with some kind of "umbrella" project that imports the project
git mirrors as submodules, similar to the multirepo umbrella proposed
above.  Such an umbrella repository looks something like this::

  UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
  |        |             |        |        |        |        |
 Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2    Lmyproj1

The vertical bars represent submodule updates to a particular local
commit in the project mirror.
``UM3`` in this case is a commit of
some local umbrella repository state that is not a submodule update,
perhaps a ``README`` or project build script update.  Commit ``UM8``
updates a submodule of local project ``myproj``.

The tool ``zip-downstream-fork.py`` at
https://github.com/greened/llvm-git-migration/tree/zip can be used to
convert the umbrella history into a monorepo-based history with
commits in the order implied by submodule updates::

  U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------                                       local/zip--.
     \         \              \                                                 |
      - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'


The ``U*`` commits represent upstream commits to the monorepo master
branch.  Each submodule update in the local ``UM*`` commits brought in
a subproject tree at some local commit.  The trees in the ``L*1``
commits represent merges from upstream.  These result in edges from
the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
The ``L*2`` commits did not do any merges from upstream.

Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
commit appearing after the ``Llld1`` commit would actually represent a
clang tree *earlier* in the upstream clang history.  We want the
``local/zip`` branch to accurately represent the state of our umbrella
history, and so the edge ``U2 -> Lclang1`` is a visual reminder of what
clang's tree actually looks like in ``Lclang1``.

Even so, the edge ``U3 -> Llld1`` could be problematic for future
merges from upstream.  Git will think that we've already merged from
``U3``, and we have, except for the state of the clang tree.  One
possible mitigation strategy is to manually diff clang between ``U2``
and ``U3`` and apply those updates to ``local/zip``.
Another,
possibly simpler, strategy is to freeze local work on downstream
branches and merge all submodules from the latest upstream before
running ``zip-downstream-fork.py``.  If downstream merged each project
from upstream in lockstep without any intervening local commits, then
things should be fine without any special action.  We anticipate this
to be the common case.

The tree for ``Lclang1`` outside of clang will represent the state of
things at ``U3`` since all of the upstream projects not participating
in the umbrella history should be in a state respecting the commit
``U3``.  The trees for llvm and lld should correctly represent commits
``Lllvm1`` and ``Llld1``, respectively.

Commit ``UM3`` changed files not related to submodules and we need
somewhere to put them.  It is not safe in general to put them in the
monorepo root directory because they may conflict with files in the
monorepo.  Let's assume we want them in a directory ``local`` in the
monorepo.

**Example 1: Umbrella looks like the monorepo**

For this example, we'll assume that each subproject appears in its own
top-level directory in the umbrella, just as they do in the monorepo.
Let's also assume that we want the files in directory ``myproj`` to
appear in ``local/myproj``.

Given the above run of ``migrate-downstream-fork.py``, a recipe to
create the zipped history is below::

  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/umbrella.git
  git -C my-monorepo fetch umbrella

  # Put myproj in local/myproj
  echo "myproj local/myproj" > my-monorepo/submodule-map.txt

  # Rewrite history
  (
    cd my-monorepo
    zip-downstream-fork.py \
      refs/remotes/umbrella \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --old-repo-prefix=refs/remotes/upstream/split \
      --revmap-in=monorepo-map.txt \
      --revmap-out=zip-map.txt \
      --subdir=local \
      --submodule-map=submodule-map.txt \
      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master

Note that if the umbrella has submodules to non-LLVM repositories,
``zip-downstream-fork.py`` needs to know about them to be able to
rewrite commits.  That is why the first step above is to fetch commits
from such repositories.

With ``--update-tags`` the tool will migrate annotated tags pointing
to submodule commits that were inlined into the zipped history.  If
the umbrella pulled in an upstream commit that happened to have a tag
pointing to it, that tag will be migrated, which is almost certainly
not what is wanted.
The tag can always be moved back to its original
commit after rewriting, or the ``--update-tags`` option may be
omitted and any local tags migrated manually.

**Example 2: Nested sources layout**

The tool handles nested submodules (e.g. llvm is a submodule in
umbrella and clang is a submodule in llvm).  The file
``submodule-map.txt`` is a list of pairs, one per line.  The first
pair item describes the path to a submodule in the umbrella
repository.  The second pair item describes the path where trees for
that submodule should be written in the zipped history.

Let's say your umbrella repository is actually the llvm repository and
it has submodules in the "nested sources" layout (clang in
tools/clang, etc.).  Let's also say ``projects/myproj`` is a submodule
pointing to some downstream repository.  The submodule map file should
look like this (we still want myproj mapped the same way as
previously)::

  tools/clang clang
  tools/clang/tools/extra clang-tools-extra
  projects/compiler-rt compiler-rt
  projects/debuginfo-tests debuginfo-tests
  projects/libclc libclc
  projects/libcxx libcxx
  projects/libcxxabi libcxxabi
  projects/libunwind libunwind
  tools/lld lld
  tools/lldb lldb
  projects/openmp openmp
  tools/polly polly
  projects/myproj local/myproj

If a submodule path does not appear in the map, the tool assumes it
should be placed in the same place in the monorepo.  That means if you
use the "nested sources" layout in your umbrella, you *must* provide
map entries for all of the projects in your umbrella (except llvm).
Otherwise trees from submodule updates will appear underneath llvm in
the zipped history.

Because llvm is itself the umbrella, we use ``--subdir`` to write its
content into ``llvm`` in the zipped history::

  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.  We want this under a different refspec
  # so zip-downstream-fork.py knows what it is.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/llvm.git
  git -C my-monorepo fetch umbrella

  # Create the submodule map.
  echo "tools/clang clang" > my-monorepo/submodule-map.txt
  echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
  echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
  echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
  echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
  echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
  echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
  echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
  echo "tools/lld lld" >> my-monorepo/submodule-map.txt
  echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
  echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
  echo "tools/polly polly" >> my-monorepo/submodule-map.txt
  echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt

  # Rewrite history
  (
    cd my-monorepo
    zip-downstream-fork.py \
      refs/remotes/umbrella \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --old-repo-prefix=refs/remotes/upstream/split \
      --revmap-in=monorepo-map.txt \
      --revmap-out=zip-map.txt \
      --subdir=llvm \
      --submodule-map=submodule-map.txt \
      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master


Comments at the top of ``zip-downstream-fork.py`` describe in more
detail how the tool works and various implications of its operation.

Importing local repositories
----------------------------

You may have additional repositories that integrate with the LLVM
ecosystem, essentially extending it with new tools.  If such
repositories are tightly coupled with LLVM, it may make sense to
import them into your local mirror of the monorepo.
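
Whether a repository needs a separate import depends on whether it is
already a submodule of the umbrella that was zipped.  A minimal sketch
for checking that, assuming a local clone of the umbrella is available
(the helper name ``list_submodule_paths`` and the clone path passed to
it are hypothetical and not part of the migration tools):

```shell
# Sketch only: print the submodule paths recorded in a clone's
# .gitmodules file, one per line.  The function name and its argument
# are illustrative assumptions.
list_submodule_paths() {
  local repo=$1
  # "git config --file" reads .gitmodules directly, so the submodules
  # do not need to be initialized.  Assumes submodule names contain no
  # spaces, since the key and value are split on the first space.
  git -C "${repo}" config --file .gitmodules \
      --get-regexp '^submodule\..*\.path$' | cut -d' ' -f2-
}
```

For example, ``list_submodule_paths my-umbrella`` would print paths such
as ``tools/clang``.  A repository whose path is not listed did not
participate in the umbrella history.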

If such repositories participated in the umbrella repository used
during the zipping process above, they will automatically be added to
the monorepo.  For downstream repositories that don't participate in
an umbrella setup, the ``import-downstream-repo.py`` tool at
https://github.com/greened/llvm-git-migration/tree/import can help with
getting them into the monorepo.  A recipe follows::

  # Import downstream repo history into the monorepo.
  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
  git -C my-monorepo fetch myrepo

  my_local_tags=( refs/tags/release
                  refs/tags/hotfix )

  (
    cd my-monorepo
    import-downstream-repo.py \
      refs/remotes/myrepo \
      "${my_local_tags[@]}" \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --subdir=myrepo \
      --tag-prefix="myrepo-"
  )

  # Preserve release branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                 refs/remotes/myrepo/release); do
    branch=${ref#refs/remotes/myrepo/}
    git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
  done

  # Preserve master.
  git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master

  # Merge master.
  git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
  git -C my-monorepo merge myrepo/master

You may want to merge other corresponding branches, for example
``myrepo`` release branches if they were in lockstep with LLVM project
releases.

``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
annotated tags with the given prefix.  Due to limitations with
``fast_filter_branch.py``, unannotated tags cannot be renamed
(``fast_filter_branch.py`` considers them branches, not tags).  Since
the upstream monorepo had its tags rewritten with an "llvmorg-"
prefix, name conflicts should not be an issue.
``--tag-prefix`` can
be used to more clearly indicate which tags correspond to various
imported repositories.

Given this repository history::

  R1 - R2 - R3 <- master
       ^
       |
    release/1

The above recipe results in a history like this::

  U1 - U2 - U3 <- upstream/master
    \   \    \
     \   -----\---------------                                          local/zip--.
      \        \              \                                                    |
    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
                                                                             /
                                                                R1 - R2 - R3  <-.
                                                                     ^          |
                                                                     |          |
                                                              myrepo-release/1  |
                                                                                |
                                                                 myrepo/master--'

Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
from ``myrepo``.  If you require commits from ``myrepo`` to be
interleaved with commits on local project branches (for example,
interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and ``myrepo``
doesn't appear in an umbrella repository, a new tool will need to be
developed.  Creating such a tool would involve:

1. Modifying ``fast_filter_branch.py`` to optionally take a
   revlist directly rather than generating it itself

2. Creating a tool to generate an interleaved ordering of local
   commits based on some criteria (``zip-downstream-fork.py`` uses the
   umbrella history as its criterion)

3. Generating such an ordering and feeding it to
   ``fast_filter_branch.py`` as a revlist

Some care will also likely need to be taken to handle merge commits,
to ensure the parents of such commits migrate correctly.

Scrubbing the Local Monorepo
----------------------------

Once all of the migrating, zipping and importing is done, it's time to
clean up.  The Python tools use ``git-fast-import``, which leaves a lot
of cruft around, and we want to shrink our new monorepo mirror as much
as possible.  Here is one way to do it::

  git -C my-monorepo checkout master

  # Delete branches we no longer need.  Do this for any other branches
  # you merged above.
  git -C my-monorepo branch -D local/zip/master || true
  git -C my-monorepo branch -D local/octopus/master || true

  # Remove remotes.
  git -C my-monorepo remote remove upstream/monorepo

  for p in "${my_projects[@]}"; do
    git -C my-monorepo remote remove upstream/split/${p}
    git -C my-monorepo remote remove local/split/${p}
  done

  git -C my-monorepo remote remove localrepo
  git -C my-monorepo remote remove umbrella
  git -C my-monorepo remote remove myrepo

  # Add anything else here you don't need.  refs/tags/release is
  # listed below assuming tags have been rewritten with a local prefix.
  # If not, remove it from this list.
  refs_to_clean=(
    refs/original
    refs/remotes
    refs/tags/backups
    refs/tags/release
  )

  git -C my-monorepo for-each-ref --format="%(refname)" "${refs_to_clean[@]}" |
    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d

  git -C my-monorepo reflog expire --all --expire=now

  # fast_filter_branch.py might have gc running in the background.
  while ! git -C my-monorepo \
    -c gc.reflogExpire=0 \
    -c gc.reflogExpireUnreachable=0 \
    -c gc.rerereresolved=0 \
    -c gc.rerereunresolved=0 \
    -c gc.pruneExpire=now \
    gc --prune=now; do
    continue
  done

  # Takes a LOOOONG time!
  git -C my-monorepo repack -A -d -f --depth=250 --window=250

  git -C my-monorepo prune-packed
  git -C my-monorepo prune

You should now have a trim monorepo.  Upload it to your git server and
happy hacking!

References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/