Organizations creating products and projects for use with Apache Arrow, along
with associated marketing materials, should take care to respect the trademark
in “Apache Arrow” and its logo. Please refer to the ASF Trademarks Guidance
and the associated FAQ for comprehensive and authoritative guidance on proper
usage of ASF trademarks.
Names that do not include “Apache Arrow” at all have no potential trademark
issue with the Apache Arrow project. This is recommended.
Names like “Apache Arrow BigCoProduct” are not OK, nor are names including
“Apache Arrow” in general. The above links, however, describe some exceptions,
such as “BigCoProduct, powered by Apache Arrow” or
“BigCoProduct for Apache Arrow”.
It is common practice to create software identifiers (Maven coordinates, module
names, etc.) like “arrow-foo”. These are permitted. Nominative use of trademarks
in descriptions is also always allowed, as in “BigCoProduct is a widget for
To add yourself to the list, please open a pull request adding your
organization name, URL, a list of which Arrow components you are using, and a
short description of your use case. See the following for some examples.
Apache Parquet: A columnar storage format available to any project
in the Hadoop ecosystem, regardless of the choice of data processing
framework, data model or programming language. The C++ and Java
implementations provide vectorized reads and writes to/from Arrow data
Apache Spark: Apache Spark™ is a fast and general engine for
large-scale data processing. Spark uses Apache Arrow to
improve the performance of conversion between Spark DataFrames and pandas
DataFrames, and to enable a set of vectorized user-defined functions
(pandas_udf) in PySpark.
Dask: Python library for parallel and distributed execution of
dynamic task graphs. Dask supports using pyarrow for accessing Parquet
Dremio: A self-service data platform. Dremio makes it easy for
users to discover, curate, accelerate, and share data from any source.
It includes a distributed SQL execution engine based on Apache Arrow.
Dremio reads data from any source (RDBMS, HDFS, S3, NoSQL) into Arrow
buffers, and provides fast SQL access via ODBC, JDBC, and REST for BI,
Python, R, and more (all backed by Apache Arrow).
Fletcher: Fletcher is an FPGA acceleration framework that can
convert an Arrow schema into an easy-to-use hardware interface. The
accelerator can request data from Arrow tables by supplying row indices.
In turn, the interface provides streams of data of the types defined
through the schema. Furthermore, Arrow alleviates serialization bottlenecks.
GeoMesa: A suite of tools that enables large-scale geospatial query
and analytics on distributed computing systems. GeoMesa supports query
results in the Arrow IPC format, which can then be used for in-browser
visualizations and/or further analytics.
GOAI: Open GPU-Accelerated Analytics Initiative for Arrow-powered
analytics across GPU tools and vendors.
Graphistry: Supercharged Visual Investigation Platform used by
teams for security, anti-fraud, and related investigations. The Graphistry
team uses Arrow in its NodeJS GPU backend and client libraries, and is an
early contributing member to GOAI and Arrow[JS] focused on bringing these
technologies to the enterprise.
libgdf: A C library of CUDA-based analytics functions and GPU IPC
support for structured data. Uses the Arrow IPC format and targets the Arrow
memory layout in its analytic functions. This work is part of the GPU Open
MapD: An in-memory columnar SQL engine designed to run on GPUs. MapD
supports Arrow for data ingest and data interchange via CUDA IPC
handles. This work is part of the GPU Open Analytics Initiative.
pandas: data analysis toolkit for Python programmers. pandas
supports reading and writing Parquet files using pyarrow. Several pandas
core developers are also contributors to Apache Arrow.
Quilt Data: Quilt is a data package manager, designed to make
managing data as easy as managing code. It supports Parquet format via
pyarrow for data access.
Ray: A flexible, high-performance distributed execution framework
with a focus on machine learning and AI applications. Uses Arrow to
efficiently store Python data structures containing large arrays of numerical
data. Multiple processes can access the data with zero copies using the
Plasma shared-memory object store, which originated in Ray and is now part
of Arrow.
Red Data Tools: A project that provides data processing
tools for Ruby. Its core library is Red Arrow, the Ruby bindings
for Apache Arrow, which are based on Apache Arrow GLib. The project
also provides many Ruby libraries, all built on Red Arrow, that integrate
existing Ruby libraries with Apache Arrow.
SciDB: Paradigm4’s SciDB is a scalable, scientific
database management system that helps researchers integrate and
analyze diverse, multi-dimensional, high-resolution data (such as
genomic, clinical, image, sensor, environmental, and IoT data)
all in one analytical platform. SciDB streaming is
powered by Apache Arrow.
Turbodbc: Python module to access relational databases via the Open
Database Connectivity (ODBC) interface. It provides the ability to return
Arrow Tables and RecordBatches in addition to the Python Database API