New Contributor’s Guide

This guide is a resource for new contributors to Apache Arrow.

No matter what your current skill level is, you can make your first contribution to Arrow.

Starting to contribute to a project like Apache Arrow can be intimidating. Taking small steps will make this task easier.

Why contribute to Arrow?

There are various reasons why you might want to contribute to Arrow:

  • You find the project interesting and would like to try making a contribution to learn more about the library and grow your skills.

  • You use Arrow in a project you are working on and you would like to implement a new feature or fix a bug you encountered.

Read more about the project in the Architectural Overview section.

Note

Contributors to Apache Arrow follow the ASF’s Code of Conduct.

Quick Reference

Here are the basic steps needed to get set up and contribute to Arrow. This is meant both as a checklist and also to provide an overall picture of the process.

For complete instructions, follow Steps in making your first PR (a step-by-step guide), or see the R and Python Tutorials for an example of adding a basic feature.

  1. Install and set up Git, and fork the Arrow repository

    See detailed instructions on how to Set up Git and fork the Arrow repository.

  2. Build Arrow

    Arrow libraries include a wide range of functionality and may require the installation of third-party packages, depending on which build options and components you enable. The C++ build guide has suggestions for commonly encountered issues. Anytime you are stuck, feel free to reach out via the appropriate Communication channel.

    See a short description of the build process for PyArrow or the R package, or go straight to the detailed instructions on how to build one of the Arrow libraries in the documentation.

  3. Run the tests

    Run the tests to check that everything is working correctly. For example, for Python, run the tests from a terminal:

    $ pytest pyarrow
    

    or, for R, run them from the R console:

    devtools::test()
    

    See also the section on Testing 🧪.

  4. Find an issue (if needed), create a new branch and work on the problem

    Finding an issue

    You might already have a bug to fix in mind, or a new feature that you want to implement. If not, read through the Finding good first issues 🔎 section to get some ideas for an issue to work on.

    Finding your way through the project

    The first step when starting on a new project is the hardest, so we’ve written some guides to help you with this.

    You can start by reading through the Working on the Arrow codebase 🧐 section.

    Communication

    Communication is very important. You may need help solving problems you encounter along the way (this happens to developers all the time). Also, if there is a JIRA issue you want to solve, it is advisable to let the team know that you are working on it and may need some help.

    See possible channels of Communication.

  5. Once you have implemented the planned fix or feature, write and run tests for it

    See detailed instructions on how to test. Also run the linter to make sure the code is styled correctly before proceeding to the next step!

  6. Push the branch to your fork and create a pull request

    See detailed instructions on Creating a pull request.

If you are ready, you can start by building Arrow or follow one of the Tutorials on writing an R binding or a Python feature.

You can also take a look at the Helping with documentation or Additional information and resources sections.

We want to encourage everyone to contribute to Arrow!
