Set up

Install and set up Git

The Arrow project is developed using Git for version control, which is readily available for all common operating systems.

You can install Git by following the quickstart instructions on GitHub, where the Arrow repository is hosted.

When Git is set up, do not forget to configure your name and email:

$ git config --global user.name "Your Name"
$ git config --global user.email your.email@example.com

Then authenticate with GitHub, which allows you to interact with GitHub without typing a username and password each time you execute a git command.
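
To confirm that the identity was stored, you can read the values back. The following is a minimal sketch: the function name show_git_identity is hypothetical (it is not part of Git or Arrow), and it only inspects the global configuration written by the commands above.

```shell
# Hypothetical helper: print the globally configured Git identity.
# `git config --global --get KEY` exits non-zero when KEY is unset,
# in which case a placeholder is shown instead.
show_git_identity() {
    name=$(git config --global --get user.name) || name="(not set)"
    email=$(git config --global --get user.email) || email="(not set)"
    echo "user.name:  $name"
    echo "user.email: $email"
}

show_git_identity
```

If either line shows "(not set)", rerun the corresponding git config command.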

Note

This guide assumes you are comfortable working from the command line. Some IDEs allow you to manage a Git repository, but may implicitly run unwanted operations when doing so (such as creating project files).

For example, cloning the repository in RStudio assumes the whole repository is an RStudio project and creates a .Rproj file in the root directory. For this reason, it is highly recommended to clone the repository using the command line or a Git client.

Get the source code

Fork the repository

The Arrow GitHub repository contains the Arrow C++ library as well as libraries for other languages such as Go, Java, MATLAB, Python, and R. The first step to contributing is to create a fork of the repository in your own GitHub account.

  1. Go to https://github.com/apache/arrow.

  2. Press Fork in the top right corner.

    Figure: The icon to fork the Apache Arrow repository on GitHub.

  3. Choose to fork the repository to your username so the fork will be created at https://github.com/<your username>/arrow.

Clone the repository

Next, clone your fork:

$ git clone https://github.com/<your username>/arrow.git

Then add the Apache Arrow repository as a remote called upstream:

$ cd arrow
$ git remote add upstream https://github.com/apache/arrow
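
Note that git remote add fails if a remote named upstream already exists, for example when this step is repeated. A minimal sketch of a repeatable variant follows. The function name ensure_upstream is hypothetical, and the sketch assumes it is run from inside the cloned arrow directory.

```shell
# Hypothetical helper: make sure a remote called "upstream" exists and
# points at the Apache Arrow repository, whether or not it was added before.
ensure_upstream() {
    upstream_url="https://github.com/apache/arrow"
    if git remote get-url upstream >/dev/null 2>&1; then
        # The remote already exists: only correct its URL.
        git remote set-url upstream "$upstream_url"
    else
        git remote add upstream "$upstream_url"
    fi
}

# From inside the arrow directory, run:
#   ensure_upstream
```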

Verify your upstream

Your upstream remote should point at the Arrow GitHub repository.

Run the following in the shell:

$ git remote -v

It should give you a result similar to this:

origin       https://github.com/<your username>/arrow.git (fetch)
origin       https://github.com/<your username>/arrow.git (push)
upstream     https://github.com/apache/arrow (fetch)
upstream     https://github.com/apache/arrow (push)

If you did everything correctly, you should now have a copy of the code in the arrow directory and two remotes that refer to your own GitHub fork (origin) and the official Arrow repository (upstream).
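
The same check can be scripted rather than read by eye. The following is a sketch under stated assumptions: check_remote is a hypothetical helper name, and it relies only on git remote get-url to read the URL configured for a remote.

```shell
# Hypothetical helper: succeed only if the remote NAME exists and its URL
# ends in OWNER/REPO (an optional trailing ".git" is ignored).
check_remote() {
    remote_name="$1"
    expected_path="$2"
    url=$(git remote get-url "$remote_name" 2>/dev/null) || {
        echo "remote '$remote_name' is not configured" >&2
        return 1
    }
    case "${url%.git}" in
        *[:/]"$expected_path")
            echo "$remote_name OK: $url" ;;
        *)
            echo "$remote_name has unexpected URL: $url" >&2
            return 1 ;;
    esac
}

# From inside the arrow directory:
#   check_remote upstream apache/arrow
#   check_remote origin "<your username>/arrow"
```

Matching on the end of the URL lets the check accept both HTTPS and SSH remote URLs for the same repository.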