Right now the primary audience for Apache Arrow is developers of data systems; most people will use Apache Arrow indirectly, through systems that use it for internal data handling and for interoperating with other Arrow-enabled systems.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we’d be happy to have you involved:
Follow our activity on GitHub
Learn the Format / Specification
PyArrow is for the most part a wrapper around the functionality that the Arrow C++ implementation provides. The library tries to take what is available in C++ and expose it through a user experience that is more Pythonic and less complex to use. So while in some cases it is easy to map what is in C++ to what is in Python, in many cases the C++ classes and methods serve as foundations on which easier-to-use entities are built.
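As a rough illustration of that "more Pythonic" surface, the sketch below builds an Arrow array from a plain Python list and inspects it with ordinary Python idioms. The helper name summarize is hypothetical and exists only for this example; the import guard is there so the snippet still runs where pyarrow is not installed.

```python
try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None


def summarize(values):
    """Build an Arrow array from a Python list and inspect it (hypothetical helper)."""
    if pa is None:
        return None
    arr = pa.array(values)          # the Arrow type is inferred from the values
    return {
        "type": str(arr.type),
        "length": len(arr),         # standard Python len() protocol
        "nulls": arr.null_count,    # None becomes an Arrow null
        "values": arr.to_pylist(),  # back to plain Python objects
    }


print(summarize([1, 2, None]))
```

The C++ side does the actual array construction; the Python caller only sees a list going in and familiar Python objects coming out.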
*.py files in the pyarrow package are usually where the entities exposed to the user are declared. In some cases, those files directly import entities from the inner implementation when they are to be exposed as-is, without modification.
lib.pyx file is where the majority of the core C++ libarrow capabilities are exposed to Python. Most of the implementation of this module relies on included *.pxi files, where the specific pieces are built. Although it is exposed to Python as pyarrow.lib, its content should be considered internal. The public classes are then directly exposed in other modules (like pyarrow itself) by virtue of importing them from that module.
_*.pyx files are where the glue code is usually created: they put together the C++ capabilities and turn them into Python classes and methods. They can be considered the internal implementation of the capabilities exposed by the user-facing modules.
includes/*.pxd files are where the raw C++ library APIs are declared for use in Cython. Here the C++ classes and methods are declared as they are, so that the other .pyx files can use them to implement Python classes, functions and helpers.
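The layering described in the list above can be observed from Python. The following is a minimal sketch, assuming an installed pyarrow; the helper name reexported_from_lib and the chosen class names are illustrative only. It checks whether a few public classes on the pyarrow package are the very same objects that live in pyarrow.lib.

```python
try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None


def reexported_from_lib(names=("Array", "Table", "Schema")):
    """Report, per name, whether pyarrow exposes the object defined in pyarrow.lib."""
    if pa is None:
        return {}
    # Identity (not equality) shows the public name is a plain re-export.
    return {name: getattr(pa, name) is getattr(pa.lib, name) for name in names}


print(reexported_from_lib())
```

Even though the classes can be reached through pyarrow.lib, user code should import them from the public modules, since the internal module's layout is not meant to be relied upon.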
Apart from the Arrow C++ library, whose role as a dependency is mentioned above, PyArrow also builds on PyArrow C++: dedicated pieces of code that live in the python/pyarrow/src/arrow/python directory and provide the low-level code for capabilities such as converting to and from NumPy or pandas, as well as the classes that allow Python objects and callbacks to be used in C++.
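From the user's side, the NumPy conversion support mentioned above surfaces as ordinary PyArrow calls. Here is a small sketch, assuming both pyarrow and numpy are installed; the helper name roundtrip is hypothetical, and the example only shows the public entry points, not the internal code paths.

```python
try:
    import numpy as np
    import pyarrow as pa
except ImportError:  # both libraries are optional for this sketch
    np = pa = None


def roundtrip(values):
    """Convert NumPy -> Arrow -> NumPy and report the Arrow type (hypothetical helper)."""
    if pa is None:
        return None
    source = np.asarray(values, dtype="int64")
    arrow_array = pa.array(source)      # NumPy ndarray to Arrow array
    restored = arrow_array.to_numpy()   # Arrow array back to NumPy ndarray
    return str(arrow_array.type), restored.tolist()


print(roundtrip([1, 2, 3]))
```

The same pattern applies to pandas through methods such as Table.from_pandas and Table.to_pandas.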