Packaging and Testing with Crossbow#
The arrow/dev/tasks directory contains the tooling that automates Arrow
packaging and integration testing.
- Packages:
  - C++ and Python conda-forge packages for Linux, macOS and Windows
  - Python Wheels for Linux, macOS and Windows
  - C++ and GLib Linux packages for multiple distributions
  - Java for Gandiva
- Integration tests:
  - Various docker tests
  - Pandas
  - Dask
  - Turbodbc
  - HDFS
  - Spark
Architecture#
Executors#
Individual jobs are executed on public CI services, currently:
- Linux: GitHub Actions, Travis CI, Azure Pipelines
- macOS: GitHub Actions, Azure Pipelines
- Windows: GitHub Actions, Azure Pipelines
Queue#
Because the CI services start builds in response to pushes, jobs are
scheduled through an additional git repository, which acts as a job
queue for the tasks. Anyone can host a queue repository (usually
named <ghuser>/crossbow).
A job is a git commit on a particular git branch, containing the required
configuration files to run the requested builds (like .travis.yml,
azure-pipelines.yml, or crossbow.yml for GitHub Actions).
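As a rough illustration of the paragraph above, the relationship between a CI service and the configuration file a job commit carries can be written as a simple lookup. This is only a sketch: the dictionary keys and the function name are hypothetical and are not part of crossbow's API; only the file names come from the text.

```python
# Hypothetical sketch: maps a CI service label to the configuration file
# named in the text. The keys ("travis", "azure", "github") are made up
# for this example and are not crossbow identifiers.
CONFIG_FILES = {
    "travis": ".travis.yml",
    "azure": "azure-pipelines.yml",
    "github": "crossbow.yml",
}


def config_file_for(ci_service: str) -> str:
    """Return the CI configuration file a job commit would contain."""
    try:
        return CONFIG_FILES[ci_service]
    except KeyError:
        raise ValueError(f"unknown CI service: {ci_service!r}") from None
```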
Scheduler#
Crossbow handles version generation, task rendering and
submission. The tasks are defined in tasks.yml.
Install#
The following guide depends on GitHub, but theoretically any git server can be used.
If you are not using the ursacomputing/crossbow repository, complete the first two steps; otherwise proceed to step 3:

1. Create the queue repository.
2. Enable Travis CI and Azure Pipelines integrations for the newly created queue repository.
3. Clone either ursacomputing/crossbow, if you are using that, or the newly created repository next to the arrow repository. By default the scripts look for a crossbow clone next to the arrow directory, but this can be configured through command line arguments.

    git clone https://github.com/<user>/crossbow crossbow

   Important note: Crossbow only supports GitHub token based authentication. Although it overwrites repository URLs provided with the SSH protocol, it’s advisable to use the HTTPS repository URLs.
4. Create a Personal Access Token with repo and workflow permissions (other permissions are not needed).
5. Locally export the token as an environment variable:

    export CROSSBOW_GITHUB_TOKEN=<token>

   or pass it to the CLI script with the --github-token argument.
6. Add the previously created GitHub token to Travis CI. Use the CROSSBOW_GITHUB_TOKEN encrypted environment variable. You can set it at the following URL, where ghuser is the GitHub username and ghrepo is the GitHub repository name (typically crossbow):

    https://travis-ci.com/<ghuser>/<ghrepo>/settings

   Confirm the auto cancellation feature is turned off for branch builds. This should be the default setting.
7. Install Python (minimum supported version is 3.9). Miniconda is preferred; see its installation instructions.
8. Install the archery toolset containing crossbow itself:

    $ pip install -e "arrow/dev/archery[crossbow]"

9. Try running it:

    $ archery crossbow --help
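The token can be supplied either through the CROSSBOW_GITHUB_TOKEN environment variable or through the --github-token argument. A wrapper script that needs the same behaviour could resolve it as sketched below. The function name is hypothetical, and the precedence (explicit argument first, environment second) is an assumption, not something the guide states.

```python
import os


def resolve_github_token(cli_token=None, environ=None):
    """Pick a GitHub token from an explicit value or the environment.

    Assumption: an explicitly passed token wins over the
    CROSSBOW_GITHUB_TOKEN environment variable.
    """
    environ = os.environ if environ is None else environ
    token = cli_token or environ.get("CROSSBOW_GITHUB_TOKEN")
    if not token:
        raise RuntimeError(
            "no GitHub token: set CROSSBOW_GITHUB_TOKEN or pass --github-token"
        )
    return token
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.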
Usage#
The script does the following:
1. Detects the current repository, and thus supports forks. The following snippet will build kszucs’s fork instead of the upstream apache/arrow repository.

    $ git clone https://github.com/kszucs/arrow
    $ git clone https://github.com/kszucs/crossbow
    $ cd arrow/dev/tasks
    $ archery crossbow submit --help  # show the available options
    $ archery crossbow submit conda-win conda-linux conda-osx

2. Gets the HEAD commit of the currently checked out branch and generates the version number based on setuptools_scm. To build a particular branch, check it out before running the script:

    $ git checkout ARROW-<ticket number>
    $ archery crossbow submit --dry-run conda-linux conda-osx

   Note that the arrow branch must be pushed beforehand, because the script will clone the selected branch.
3. Reads and renders the required build configurations with the parameters substituted.
4. Creates a branch per task, prefixed with the job id. For example, to build conda recipes on Linux, it will create a new branch:

    crossbow@build-<id>-conda-linux

5. Pushes the modified branches to GitHub, which triggers the builds. For authentication it uses the GitHub OAuth tokens described in the Install section.
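The branch-per-task naming described above can be sketched in a few lines. This assumes that, in the example crossbow@build-<id>-conda-linux, "crossbow" names the queue repository and "build-<id>" is the job id prefix; the function name and the sample id are hypothetical.

```python
def task_branches(job_id, task_names):
    """Return one branch name per task, each prefixed with the job id.

    Assumption: the branch layout is "build-<id>-<task>", read off the
    example in the text; crossbow's real implementation may differ.
    """
    return [f"build-{job_id}-{task}" for task in task_names]


# Hypothetical job id "42" with two of the conda tasks used earlier.
branches = task_branches("42", ["conda-linux", "conda-osx"])
```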
Query the build status#
The submit command returns the build id, which has a corresponding branch
in the queue repository.
$ archery crossbow status <build id / branch name>
Download the build artifacts#
$ archery crossbow artifacts <build id / branch name>
Examples#
The submit command accepts a list of task names and/or task-group names to select which tasks to build.
Run multiple builds:
$ archery crossbow submit debian-stretch conda-linux-gcc-py37-r40
Repository: https://github.com/kszucs/arrow@tasks
Commit SHA: 810a718836bb3a8cefc053055600bdcc440e6702
Version: 0.9.1.dev48+g810a7188.d20180414
Pushed branches:
- debian-stretch
- conda-linux-gcc-py37-r40
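The Version line in the output above follows the development-version layout that setuptools_scm produces by default. The sketch below splits such a string into its parts. The field interpretation (commits since the last tag, abbreviated commit hash, date suffix for a working tree with uncommitted changes) reflects my understanding of setuptools_scm's default scheme and should be treated as an assumption; the function name is hypothetical.

```python
import re

# base version, ".dev<distance>", "+g<commit>", optional ".d<YYYYMMDD>"
VERSION_RE = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)"
    r"\.dev(?P<distance>\d+)"
    r"\+g(?P<commit>[0-9a-f]+)"
    r"(?:\.d(?P<date>\d{8}))?$"
)


def parse_dev_version(version):
    """Split a setuptools_scm style development version into its fields."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a development version: {version!r}")
    parts = match.groupdict()
    parts["distance"] = int(parts["distance"])
    return parts


parts = parse_dev_version("0.9.1.dev48+g810a7188.d20180414")
```

Note that the commit field is the leading part of the Commit SHA printed in the same output.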
Just render without applying or committing the changes:
$ archery crossbow submit --dry-run task_name
Run only conda package builds and a Linux one:
$ archery crossbow submit --group conda centos-7
Run wheel builds:
$ archery crossbow submit --group wheel
tasks.yml defines multiple task groups, such as docker, integration
and cpp-python, for running docker based tests.
archery crossbow submit supports multiple options and arguments; for more,
see its help page:
$ archery crossbow submit --help
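When submissions are scripted, the command lines shown in this section can be assembled programmatically. The helper below is a minimal sketch that only builds the argument list and uses just the options that appear above (--dry-run and --group). The function name is hypothetical, and repeating --group once per group is an assumption; check the help page before relying on it.

```python
def submit_command(tasks=(), groups=(), dry_run=False):
    """Build an argv list for `archery crossbow submit` without running it.

    Assumption: each task group is passed as its own `--group <name>` pair.
    """
    if not tasks and not groups:
        raise ValueError("select at least one task or task group")
    argv = ["archery", "crossbow", "submit"]
    if dry_run:
        argv.append("--dry-run")
    for group in groups:
        argv += ["--group", group]
    argv += list(tasks)
    return argv


# Mirrors "archery crossbow submit --group conda centos-7" from above.
argv = submit_command(tasks=["centos-7"], groups=["conda"])
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from a directory where archery is installed.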