.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

Packaging and Testing with Crossbow
===================================

The contents of the ``arrow/dev/tasks`` directory automate Arrow packaging and
integration testing.

Packages:
  - C++ and Python `conda-forge packages`_ for Linux, Mac and Windows
  - Python `Wheels`_ for Linux, Mac and Windows
  - C++ and GLib `Linux packages`_ for multiple distributions
  - Java for Gandiva

Integration tests:
  - Various docker tests
  - Pandas
  - Dask
  - Turbodbc
  - HDFS
  - Spark

Architecture
------------

Executors
~~~~~~~~~

Individual jobs are executed on public CI services, currently:

- Linux: TravisCI, CircleCI, Azure Pipelines
- Mac: TravisCI, Azure Pipelines
- Windows: AppVeyor, Azure Pipelines

Queue
~~~~~

Because of how the CI services work, jobs are scheduled through an additional
git repository, which acts as a job queue for the tasks. Anyone can host a
``queue`` repository, which is usually called ``crossbow``.

A job is a git commit on a particular git branch, containing only the required
configuration file to run the requested build (like ``.travis.yml``,
``appveyor.yml`` or ``azure-pipelines.yml``).

Scheduler
~~~~~~~~~

`Crossbow.py`_ handles version generation, task rendering and
submission. The tasks are defined in ``tasks.yml``.

Install
-------

   The following guide depends on GitHub, but theoretically any git
   server can be used.

1. `Create the queue repository`_

2. Enable `TravisCI`_, `Appveyor`_, `Azure Pipelines`_ and `CircleCI`_
   integrations for the newly created queue repository.

   -  Turn off Travis’ `auto cancellation`_ feature on branches

3. Clone the newly created repository next to the arrow repository:

   By default the script looks for ``crossbow`` next to the arrow repository,
   but this can be configured through command line arguments.

   .. code:: bash

      git clone https://github.com/<user>/crossbow crossbow

   **Important note:** Crossbow only supports GitHub token based
   authentication. Although it overwrites repository URLs that use the SSH
   protocol, it's advisable to use the HTTPS repository URLs.

4. `Create a Personal Access Token`_ with ``repo`` permissions (other
   permissions are not needed)

5. Locally export the token as an environment variable:

   .. code:: bash

      export CROSSBOW_GITHUB_TOKEN=<token>

   ..

      or pass it to the CLI script with the ``--github-token`` argument

6. Export the previously created GitHub token on each CI service:

   Use the ``CROSSBOW_GITHUB_TOKEN`` encrypted environment variable. You can
   set it at the following URLs, where ``ghuser`` is the GitHub
   username and ``ghrepo`` is the GitHub repository name (typically
   ``crossbow``):

   -  TravisCI: ``https://travis-ci.org/<ghuser>/<ghrepo>/settings``
   -  Appveyor:
      ``https://ci.appveyor.com/project/<ghuser>/<ghrepo>/settings/environment``
   -  CircleCI:
      ``https://circleci.com/gh/<ghuser>/<ghrepo>/edit#env-vars``

   On Appveyor, check the ``skip branches without appveyor.yml`` checkbox
   on the web UI under the crossbow repository’s settings.

7. Install Python (minimum supported version is 3.6):

   Miniconda is preferred, see installation instructions:
   https://conda.io/docs/user-guide/install/index.html

8. Install the Python dependencies for the script:

   .. code:: bash

      conda install -c conda-forge -y --file arrow/ci/conda_env_crossbow.txt

   .. code:: bash

      # pygit2 requires libgit2: http://www.pygit2.org/install.html
      pip install \
          jinja2 \
          pygit2 \
          click \
          ruamel.yaml \
          setuptools_scm \
          github3.py \
          toolz \
          jira

9. Try running it:

   .. code:: bash

      $ python crossbow.py --help

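Steps 3 and 5 leave two local prerequisites that are easy to get wrong: a
``crossbow`` clone next to the arrow checkout, and an exported
``CROSSBOW_GITHUB_TOKEN``. The following pre-flight check is a minimal sketch
and not part of crossbow itself. The ``preflight`` helper and its messages are
hypothetical, and it assumes the default layout with the queue clone in a
sibling directory named ``crossbow``.

.. code:: python

   import os
   from pathlib import Path


   def preflight(arrow_dir, environ=None):
       """Return a list of problems; an empty list means the setup looks usable."""
       environ = os.environ if environ is None else environ
       problems = []

       # Assumption: the queue repository is cloned next to the arrow checkout.
       queue_dir = Path(arrow_dir).resolve().parent / "crossbow"
       if not (queue_dir / ".git").exists():
           problems.append(f"no git clone found at {queue_dir}")

       # The token can also be passed with --github-token, so a missing
       # environment variable is only a hint, not necessarily an error.
       if not environ.get("CROSSBOW_GITHUB_TOKEN"):
           problems.append("CROSSBOW_GITHUB_TOKEN is not set")

       return problems


   if __name__ == "__main__":
       for problem in preflight("arrow"):
           print("warning:", problem)

If the token is passed with ``--github-token`` instead of the environment
variable, the second warning is a false alarm, so the result is best treated
as a hint rather than a hard failure.
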
Usage
-----

The script does the following:

1. Detects the current repository, thus supports forks. The following
   snippet will build kszucs’s fork instead of the upstream apache/arrow
   repository.

   .. code:: bash

      $ git clone https://github.com/kszucs/arrow
      $ git clone https://github.com/kszucs/crossbow

      $ cd arrow/dev/tasks
      $ python crossbow.py submit --help  # show the available options
      $ python crossbow.py submit conda-win conda-linux conda-osx

2. Gets the HEAD commit of the currently checked out branch and
   generates the version number based on `setuptools_scm`_. To build
   a particular branch, check it out before running the script:

   .. code:: bash

      git checkout ARROW-<ticket number>
      python dev/tasks/crossbow.py submit --dry-run conda-linux conda-osx

   ..

      Note that the arrow branch must be pushed beforehand, because the
      script will clone the selected branch.

3. Reads and renders the required build configurations with the
   parameters substituted.

4. Creates a branch per task, prefixed with the job id. For example, to
   build conda recipes on Linux it will create a new branch:
   ``crossbow@build-<id>-conda-linux``.

5. Pushes the modified branches to GitHub, which triggers the builds. For
   authentication it uses the GitHub OAuth tokens described in the install
   section.

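Steps 3 and 4 above can be sketched as follows. This is an illustration under
stated assumptions, not crossbow's implementation: ``plan_job``, the template
text and its variables are hypothetical, the branch name simply joins the job
id and the task name, and the standard library's ``string.Template`` stands in
for whatever template engine the real task definitions use.

.. code:: python

   from string import Template

   # Hypothetical CI configuration template; the real ones live in
   # arrow/dev/tasks and are selected through tasks.yml.
   CI_TEMPLATE = Template(
       "env:\n"
       "  - ARROW_REPO=$repo\n"
       "  - ARROW_SHA=$sha\n"
       "script:\n"
       "  - ./build.sh $task\n"
   )


   def plan_job(job_id, repo, sha, tasks):
       """Map one branch name per task to its rendered CI configuration."""
       plan = {}
       for task in tasks:
           # Assumption: the branch is the task name prefixed with the job id.
           branch = f"{job_id}-{task}"
           plan[branch] = CI_TEMPLATE.substitute(repo=repo, sha=sha, task=task)
       return plan


   plan = plan_job(
       "build-1",
       "https://github.com/<user>/arrow",
       "810a718836bb3a8cefc053055600bdcc440e6702",
       ["conda-linux", "conda-osx"],
   )
   for branch in plan:
       print(branch)

The loop prints ``build-1-conda-linux`` and ``build-1-conda-osx``. In the real
workflow each of these branches would carry a single commit holding the
rendered file, and pushing them to the queue repository is what triggers the
builds.
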
Query the build status
~~~~~~~~~~~~~~~~~~~~~~

The build id (which has a corresponding branch in the queue repository) is
returned by the ``submit`` command.

.. code:: bash

   python crossbow.py status <build id / branch name>

Download the build artifacts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   python crossbow.py artifacts <build id / branch name>

Examples
~~~~~~~~

The ``submit`` command accepts a list of task names and/or a list of
task-group names to select which tasks to build.

Run multiple builds:

.. code:: bash

   $ python crossbow.py submit debian-stretch conda-linux-gcc-py37
   Repository: https://github.com/kszucs/arrow@tasks
   Commit SHA: 810a718836bb3a8cefc053055600bdcc440e6702
   Version: 0.9.1.dev48+g810a7188.d20180414
   Pushed branches:
    - debian-stretch
    - conda-linux-gcc-py37

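The ``Version`` line in that output matches the default setuptools_scm layout
for a development build: the next release number, ``.dev`` plus the number of
commits since the last tag, ``+g`` plus the abbreviated git commit hash, and
an optional ``.d<date>`` suffix that appears when the working tree has
uncommitted changes. The following sketch splits such a string into its
parts. ``parse_dev_version`` is a hypothetical helper, and it assumes the
default version scheme is in use.

.. code:: python

   import re

   # <base>.dev<distance>+g<node>[.d<YYYYMMDD>]
   VERSION_RE = re.compile(
       r"^(?P<base>\d+(?:\.\d+)*)"
       r"\.dev(?P<distance>\d+)"
       r"\+g(?P<node>[0-9a-f]+)"
       r"(?:\.d(?P<date>\d{8}))?$"
   )


   def parse_dev_version(version):
       """Split a setuptools_scm style development version into its parts."""
       match = VERSION_RE.match(version)
       if match is None:
           raise ValueError(f"not a development version: {version!r}")
       parts = match.groupdict()
       parts["distance"] = int(parts["distance"])
       return parts


   info = parse_dev_version("0.9.1.dev48+g810a7188.d20180414")
   print(info["base"], info["distance"], info["node"], info["date"])

For the version shown above, the ``node`` part is the prefix of the reported
commit SHA, and ``date`` is ``None`` when the build came from a clean
checkout.
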
Just render without applying or committing the changes:

.. code:: bash

   $ python crossbow.py submit --dry-run task_name

Run only the ``conda`` package builds and one Linux build:

.. code:: bash

   $ python crossbow.py submit --group conda centos-7

Run ``wheel`` builds:

.. code:: bash

   $ python crossbow.py submit --group wheel

There are multiple task groups in ``tasks.yml``, such as docker, integration
and cpp-python, for running docker based tests.

``python crossbow.py submit`` supports multiple options and arguments. For
more, see its help page:

.. code:: bash

   $ python crossbow.py submit --help


.. _conda-forge packages: conda-recipes
.. _Wheels: python-wheels
.. _Linux packages: linux-packages
.. _Crossbow.py: crossbow.py
.. _Create the queue repository: https://help.github.com/articles/creating-a-new-repository
.. _TravisCI: https://travis-ci.org/getting_started
.. _Appveyor: https://www.appveyor.com/docs/
.. _CircleCI: https://circleci.com/docs/2.0/getting-started/
.. _Azure Pipelines: https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-sign-up
.. _auto cancellation: https://docs.travis-ci.com/user/customizing-the-build/#Building-only-the-latest-commit
.. _Create a Personal Access Token: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
.. _setuptools_scm: https://pypi.python.org/pypi/setuptools_scm