===================================================================
How To Add Your Build Configuration To LLVM Buildbot Infrastructure
===================================================================

Introduction
============

This document describes how to add a build configuration and a
buildbot-worker to the LLVM Buildbot Infrastructure.

Buildmasters
============

There are two buildmasters running.

* The main buildmaster at `<https://lab.llvm.org/buildbot>`_. All builders
  attached to this machine will notify commit authors every time they break
  the build.
* The staging buildmaster at `<https://lab.llvm.org/staging>`_. All builders
  attached to this machine will be completely silent by default when the build
  is broken.

In order to remain connected to the main buildmaster (and thus notify
developers of failures), a buildbot must:

* Be building a supported configuration.  Builders for experimental backends
  should generally be attached to the staging buildmaster.
* Be able to keep up with new commits to the main branch, or at a minimum
  recover to tip of tree within a couple of days of falling behind.

Additionally, we encourage all bot owners to point their bots towards the
staging master during maintenance windows, instability troubleshooting, and
such.

Roles & Expectations
====================

Each buildbot has an owner who is the responsible party for addressing problems
which arise with said buildbot.  We generally expect the bot owner to be
reasonably responsive.

For some bots, the ownership responsibility is split between a "resource owner"
who provides the underlying machine resource, and a "configuration owner" who
maintains the build configuration.  Generally, operational responsibility lies
with the "config owner".  We do expect "resource owners" - who are generally
the contact listed in a worker's attributes - to proxy requests to the relevant
"config owner" in a timely manner.

Most issues with a buildbot should be addressed directly with a bot owner
via email.  Please CC `Galina Kistanova <mailto:gkistanova@gmail.com>`_.

Steps To Add Builder To LLVM Buildbot
=====================================

Volunteers can provide their build machines to serve as build workers for
the public LLVM Buildbot.

Here are the steps you can follow to do so:

#. Check the existing build configurations to make sure the one you are
   interested in is not covered yet, or that it builds much faster on your
   computer than on the existing one. We prefer faster builds so that
   developers get feedback sooner after changes are committed.

#. The computer you register with the LLVM buildbot infrastructure should
   have all dependencies installed, and you should verify that it can
   actually build your configuration successfully. Check which degree of
   parallelism (the ``-j`` parameter) gives the fastest build.  You can build
   multiple configurations on one computer.

#. Install buildbot-worker (currently we are using buildbot version 2.8.5).
   Depending on the platform, buildbot-worker could be available to download and
   install with your package manager, or you can download it directly from
   `<http://trac.buildbot.net>`_ and install it manually.

#. Create a designated user account for your buildbot-worker to run under,
   and set appropriate permissions.

#. Choose the buildbot-worker root directory (all builds will be placed under
   it), and the buildbot-worker access name and password that the buildmaster
   will use to authenticate your buildbot-worker.

#. Create a buildbot-worker under that designated user account. Point it
   to the **lab.llvm.org** port **9994** (see `Buildbot documentation,
   Creating a worker
   <http://docs.buildbot.net/current/tutorial/firstrun.html#creating-a-worker>`_
   for more details) by running the following command:

   .. code-block:: bash

      $ buildbot-worker create-worker <buildbot-worker-root-directory> \
                   lab.llvm.org:9994 \
                   <buildbot-worker-access-name> \
                   <buildbot-worker-access-password>

   This will cause your new worker to connect to the staging buildmaster,
   which is silent by default.  A worker should be pointed at the main
   buildmaster only once it is stable and approval from Galina has been
   received (see the last step).

#. Fill in the buildbot-worker description and admin name/e-mail.  Here is an
   example of the buildbot-worker description::

       Windows 7 x64
       Core i7 (2.66GHz), 16GB of RAM

       g++.exe (TDM-1 mingw32) 4.4.0
       GNU Binutils 2.19.1
       cmake version 2.8.4
       Microsoft(R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86

#. Make sure you can actually start the buildbot-worker successfully. Then set
   up your buildbot-worker to start automatically at boot time.  See the
   buildbot documentation for help.  You may want to restart your computer
   to see if it works.

#. Send a patch which adds your build worker and your builder to
   `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM
   `workflow <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.

   * workers are added to ``buildbot/osuosl/master/config/workers.py``
   * builders are added to ``buildbot/osuosl/master/config/builders.py``

   Please make sure your builder name and its builddir are unique throughout
   the file.

   All new builders should default to using the ``'collapseRequests': False``
   configuration.  This causes the builder to build each commit individually
   and not merge build requests.  To maximize quality of feedback to developers,
   we *strongly prefer* builders to be configured not to collapse requests.
   This flag should be removed only after all reasonable efforts have been
   exhausted to improve build times such that the builder can keep up with
   commit flow.

   It is possible to allow email addresses to unconditionally receive
   notifications on build failure; for this you'll need to add an
   ``InformativeMailNotifier`` to ``buildbot/osuosl/master/config/status.py``.
   This is particularly useful for the staging buildmaster which is silent
   otherwise.

#. Send the buildbot-worker access name and the access password directly to
   `Galina Kistanova <mailto:gkistanova@gmail.com>`_, and wait until she
   lets you know that your changes have been applied and the buildmaster has
   been reconfigured.

#. Check the status of your buildbot-worker on the `Waterfall Display (Staging)
   <http://lab.llvm.org/staging/#/waterfall>`_ to make sure it is
   connected, and the `Workers Display (Staging)
   <http://lab.llvm.org/staging/#/workers>`_ to see if administrator
   contact and worker information are correct.

#. At this point, you have a working builder connected to the staging
   buildmaster.  You can now make sure it is reliably green and keeps
   up with the build queue.  No notifications will be sent, so you can
   keep an unstable builder connected to staging indefinitely.

#. (Optional) Once the builder is stable on the staging buildmaster with
   several days of green history, you can choose to move it to the production
   buildmaster to enable developer notifications.  Please email `Galina
   Kistanova <mailto:gkistanova@gmail.com>`_ for review and approval.

   To move a worker to production (once approved), stop your worker, edit the
   ``buildbot.tac`` file to change the port number from 9994 to 9990, and
   start it again.
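
   As a minimal sketch of the port change (not an official tool), the
   snippet below rewrites the port in the text of a ``buildbot.tac`` file.
   It assumes the file assigns the port on a line of the form
   ``port = 9994``; the ``switch_port`` helper and the sample file contents
   are hypothetical and only illustrate the edit.

   .. code-block:: python

      import re

      STAGING_PORT = 9994     # staging buildmaster port, as used above
      PRODUCTION_PORT = 9990  # main buildmaster port, as used above


      def switch_port(tac_text, old=STAGING_PORT, new=PRODUCTION_PORT):
          """Return tac_text with ``port = <old>`` rewritten to ``port = <new>``.

          Raises ValueError if no matching assignment is found, so that a
          file with an unexpected layout is never left silently unchanged.
          """
          pattern = re.compile(r"^(\s*port\s*=\s*)%d\b" % old, re.MULTILINE)
          updated, count = pattern.subn(lambda m: m.group(1) + str(new), tac_text)
          if count == 0:
              raise ValueError("no 'port = %d' assignment found" % old)
          return updated


      # Hypothetical file contents, for illustration only.
      sample = (
          "buildmaster_host = 'lab.llvm.org'\n"
          "port = 9994\n"
          "workername = 'example-worker'\n"
      )
      print(switch_port(sample))

   Stop the worker before applying the change to the real file, and start it
   again afterwards.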

Best Practices for Configuring a Fast Builder
=============================================

As mentioned above, we generally have a strong preference for
builders which can build every commit as it comes in.  This section
includes best practices and some recommendations as to how to achieve
that end.

The goal
  In 2020, the monorepo had just under 35 thousand commits.  This works
  out to an average of 4 commits per hour.  Already, we can see that a
  builder must cycle in less than 15 minutes to have a hope of being
  useful.  However, those commits are not uniformly distributed.  They
  tend to cluster strongly during US working hours.  Looking at a couple
  of recent (Nov 2021) working days, we routinely see ~10 commits per
  hour during peak times, with occasional spikes as high as ~15 commits
  per hour.  Thus, as a rule of thumb, we should plan for our builder to
  complete ~10-15 builds an hour.
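
  The arithmetic behind these figures can be reproduced with a short
  sketch.  The function names are illustrative, and the yearly total is
  rounded up to 35,000 for simplicity.

  .. code-block:: python

     HOURS_PER_YEAR = 365 * 24


     def commits_per_hour(commits_per_year):
         """Average commit rate if commits were spread evenly over the year."""
         return commits_per_year / HOURS_PER_YEAR


     def max_cycle_minutes(rate_per_hour):
         """Longest build cycle that still keeps up with the given rate."""
         return 60 / rate_per_hour


     average = commits_per_hour(35_000)
     print(round(average, 1))                  # roughly 4 commits per hour
     print(round(max_cycle_minutes(average)))  # roughly 15 minutes per build
     print(max_cycle_minutes(10), max_cycle_minutes(15))  # peak-hour budgets

  At the peak rates of 10 and 15 commits per hour, the per-build budget
  drops to 6 and 4 minutes respectively.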

Resource Appropriately
  At 10-15 builds per hour, we need to complete a new build on average every
  4 to 6 minutes.  For anything except the fastest of hardware/build configs,
  this is going to be well beyond the ability of a single machine.  In buildbot
  terms, we are likely going to need multiple workers to build requests in
  parallel under a single builder configuration.  For some rough
  back-of-the-envelope numbers, if your build config takes e.g. 30 minutes,
  you will need something on the order of 5-8 workers.  If your build config
  takes ~2 hours, you'll need something on the order of 20-30 workers.  The
  rest of this section focuses on how to reduce cycle times.
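
  The worker counts above follow from dividing the build time by the
  interval between incoming requests.  A minimal sketch, assuming every
  worker builds one request at a time and all builds take equally long
  (``workers_needed`` is an illustrative name):

  .. code-block:: python

     import math


     def workers_needed(build_minutes, builds_per_hour):
         """Workers required so builds finish as fast as requests arrive."""
         return math.ceil(build_minutes * builds_per_hour / 60)


     for build_minutes in (30, 120):
         low = workers_needed(build_minutes, 10)   # ~10 commits per hour
         high = workers_needed(build_minutes, 15)  # ~15 commits per hour
         print(f"{build_minutes} min build: {low}-{high} workers")

  For a 30 minute build this gives 5-8 workers, and for a 2 hour build
  20-30 workers, matching the estimates above.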

Restrict what you build and test
  Think hard about why you're setting up a bot, and restrict your build
  configuration as much as you can.  Basic functionality is probably
  already covered by other bots, and you don't need to duplicate that
  testing.  You only need to be building and testing the *unique* parts
  of the configuration.  (For example, for a multi-stage clang builder, you
  probably don't need to be enabling every target or building all the various
  utilities.)

  It can sometimes be worthwhile splitting a single builder into two or more,
  if you have multiple distinct purposes for the same builder.  As an example,
  if you want to both a) confirm that all of LLVM builds with your host
  compiler, and b) do a multi-stage clang build on your target, you
  may be better off with two separate bots.  Splitting increases resource
  consumption, but makes it easy for each bot to keep up with commit flow.
  Additionally, splitting bots may assist in triage by narrowing attention to
  relevant parts of the failing configuration.

  In general, we recommend Release build types with Assertions enabled.  This
  generally provides a good balance between build times and bug detection for
  most buildbots.  There may be room for including some debug info (e.g. with
  ``-gmlt``), but in general the balance between debug info quality and build
  times is a delicate one.
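
  As a sketch of what such a restricted configuration might look like, the
  helper below assembles CMake arguments for a Release build with assertions
  and a limited set of targets and projects.  The helper name and the
  default target and project lists are placeholders, not a recommendation
  for any particular bot.

  .. code-block:: python

     def release_asserts_args(targets=("X86",), projects=("clang",)):
         """CMake arguments for a Release+Asserts build of a reduced tree."""
         return [
             "-DCMAKE_BUILD_TYPE=Release",
             "-DLLVM_ENABLE_ASSERTIONS=ON",
             # Build only the backends this bot actually needs to cover.
             "-DLLVM_TARGETS_TO_BUILD=" + ";".join(targets),
             # Build only the projects this bot actually needs to cover.
             "-DLLVM_ENABLE_PROJECTS=" + ";".join(projects),
         ]


     print(" ".join(release_asserts_args()))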

Use Ninja & LLD
  Ninja really does help build times over Make, particularly for highly
  parallel builds.  LLD helps to reduce both link times and memory usage
  during linking significantly.  With a build machine with sufficient
  parallelism, link times tend to dominate the critical path of the build,
  and are thus worth optimizing.
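
  A minimal sketch of the corresponding CMake arguments follows.  The helper
  name is illustrative, and the optional cap on concurrent link jobs is an
  addition not discussed above that may help on machines with limited
  memory.

  .. code-block:: python

     def ninja_lld_args(link_jobs=None):
         """CMake arguments selecting the Ninja generator and the LLD linker."""
         args = ["-G", "Ninja", "-DLLVM_USE_LINKER=lld"]
         if link_jobs is not None:
             # Optional: limit how many link steps Ninja runs at once.
             args.append("-DLLVM_PARALLEL_LINK_JOBS=%d" % link_jobs)
         return args


     print(" ".join(ninja_lld_args(link_jobs=4)))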

Use CCache and NOT incremental builds
  Using ccache materially improves average build times.  Incremental builds
  can be slightly faster, but introduce the risk of build corruption due to
  e.g. state changes.  At this point, the recommendation is not to
  use incremental builds and instead use ccache as the latter captures the
  majority of the benefit with less risk of false positives.

  One of the non-obvious benefits of using ccache is that it makes the
  builder less sensitive to which projects are being monitored vs built.
  If a change triggers a build request, but doesn't change the build output
  (e.g. doc changes, python utility changes, etc.), the build will entirely
  hit in cache and the build request will complete in just the testing time.

  With multiple workers, it is tempting to try to configure a shared cache
  between the workers.  Experience to date indicates this is difficult to
  do well, and that having local per-worker caches gets most of the benefit
  anyways.  We don't currently recommend shared caches.

  CCache does depend on the builder hardware having sufficient IO to access
  the cache with reasonable access times - i.e. a fast disk, or enough memory
  for a RAM cache, etc.  For builders without such hardware, incremental
  builds may be your best option, but are likely to require higher ongoing
  involvement from the sponsor.
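
  One way to get a local per-worker cache is to point ccache at a directory
  under each worker's root.  The sketch below builds the matching CMake
  arguments.  The helper name, the ``ccache`` subdirectory, and the size
  limit are assumptions chosen for illustration.

  .. code-block:: python

     def ccache_args(worker_root, max_size="20G"):
         """CMake arguments enabling ccache with a cache local to one worker."""
         cache_dir = worker_root.rstrip("/") + "/ccache"
         return [
             "-DLLVM_CCACHE_BUILD=ON",
             # Keep the cache under this worker's own root, not shared.
             "-DLLVM_CCACHE_DIR=" + cache_dir,
             "-DLLVM_CCACHE_MAXSIZE=" + max_size,
         ]


     # Hypothetical worker root directory.
     print(" ".join(ccache_args("/home/buildbot/worker")))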

Enable batch builds
  As a last resort, you can configure your builder to batch build requests.
  This makes the build failure notifications markedly less actionable, and
  should only be done once all other reasonable measures have been taken.
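
  In zorg terms, a batching builder is one that no longer sets
  ``'collapseRequests': False``.  The sketch below lists such builders,
  assuming builder entries are plain dictionaries with a ``name`` key.  The
  entries shown are made up and ``batching_builders`` is not part of zorg.

  .. code-block:: python

     def batching_builders(builders):
         """Names of builders that do not set 'collapseRequests' to False."""
         return [
             b["name"] for b in builders
             if b.get("collapseRequests") is not False
         ]


     # Hypothetical builder entries, for illustration only.
     builders = [
         {"name": "example-fast", "builddir": "example-fast",
          "collapseRequests": False},
         {"name": "example-slow", "builddir": "example-slow"},
     ]
     print(batching_builders(builders))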

Leave it on the staging buildmaster
  While most of this section has been biased towards builders intended for
  the main buildmaster, it is worth highlighting that builders can run
  indefinitely on the staging buildmaster.  Such a builder may still be
  useful for the sponsoring organization, without concern of negatively
  impacting the broader community.  The sponsoring organization simply
  has to take on the responsibility of all bisection and triage.