Apache Arrow nanoarrow 0.2 Release
Published 22 Jun 2023
By The Apache Arrow PMC (pmc)
The Apache Arrow team is pleased to announce the 0.2.0 release of Apache Arrow nanoarrow. This release covers 19 resolved issues from 6 contributors.
Release Highlights
- Addition of the Arrow IPC stream reader extension
- Addition of the Getting Started with nanoarrow tutorial
- Improvements in reliability and platform test coverage of the C library
- Improvements in reliability and type support in the R bindings
See the Changelog for a detailed list of contributions to this release.
IPC stream support
This release includes support for reading schemas and record batches serialized using the Arrow IPC format. Based on the flatcc flatbuffers implementation, the nanoarrow IPC read support is implemented as an optional extension to the core nanoarrow library. The easiest way to get started is with the ArrowArrayStream provider using one of the built-in ArrowIpcInputStream implementations:
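#include <stdbool.h>
#include <stdio.h>

#include "nanoarrow_ipc.h"

int main(int argc, char* argv[]) {
  // Reopen stdin in binary mode so the IPC bytes are read unmodified
  FILE* file_ptr = freopen(NULL, "rb", stdin);

  // Wrap the FILE* in an input stream (false: do not close it on release)
  struct ArrowIpcInputStream input;
  NANOARROW_RETURN_NOT_OK(ArrowIpcInputStreamInitFile(&input, file_ptr, false));

  // Expose the IPC stream through the Arrow C stream interface
  struct ArrowArrayStream stream;
  NANOARROW_RETURN_NOT_OK(ArrowIpcArrayStreamReaderInit(&stream, &input, NULL));

  struct ArrowSchema schema;
  NANOARROW_RETURN_NOT_OK(stream.get_schema(&stream, &schema));

  struct ArrowArray array;
  while (true) {
    NANOARROW_RETURN_NOT_OK(stream.get_next(&stream, &array));
    // An array with a NULL release callback marks the end of the stream
    if (array.release == NULL) {
      break;
    }

    // ... use the record batch here, then release it before the next read
    array.release(&array);
  }

  // Release the schema and the stream once all batches are consumed
  schema.release(&schema);
  stream.release(&stream);
  return 0;
}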
Facilities for advanced usage are also provided via the low-level ArrowIpcDecoder, which takes care of the details of deserializing the flatbuffer headers and assembling buffers into an ArrowArray. The current implementation can read schema and record batch messages that contain any Arrow type that is supported by the C data interface. The initial version can read both big and little endian streams originating from platforms of either endianness. Dictionary encoding and compression are not currently supported.
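To give a sense of what "deserializing the headers" involves before any flatbuffer parsing starts, here is a minimal sketch of reading the 8-byte prefix of an encapsulated message as described in the Arrow IPC format specification: a 32-bit continuation marker (0xFFFFFFFF) followed by a 32-bit little-endian metadata size, where a size of zero marks the end of the stream. This is not nanoarrow's implementation; the names PrefixStatus, read_uint32_le, and peek_message_prefix are hypothetical and exist only for illustration, and the sketch does not handle the legacy framing that omits the continuation marker.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of inspecting the 8-byte prefix of an encapsulated IPC message.
 * Hypothetical names for illustration; not part of nanoarrow. */
enum PrefixStatus {
  PREFIX_OK = 0,            /* a message with metadata follows */
  PREFIX_END_OF_STREAM = 1, /* continuation marker followed by size 0 */
  PREFIX_INVALID = 2        /* too short, bad marker, or negative size */
};

/* Assemble a 32-bit little-endian value byte by byte so the result does not
 * depend on the endianness of the machine running this code. */
static uint32_t read_uint32_le(const uint8_t* data) {
  return (uint32_t)data[0] | ((uint32_t)data[1] << 8) |
         ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
}

/* Parse the message prefix and report how many metadata bytes follow it.
 * metadata_size_out is only written when PREFIX_OK is returned. */
static enum PrefixStatus peek_message_prefix(const uint8_t* data, size_t size,
                                             int32_t* metadata_size_out) {
  if (size < 8) {
    return PREFIX_INVALID;
  }

  /* First 4 bytes: continuation marker */
  if (read_uint32_le(data) != 0xFFFFFFFFu) {
    return PREFIX_INVALID;
  }

  /* Next 4 bytes: metadata size as a little-endian signed 32-bit integer */
  uint32_t raw_size = read_uint32_le(data + 4);
  if (raw_size > 0x7FFFFFFFu) {
    return PREFIX_INVALID; /* would be negative as an int32 */
  }
  if (raw_size == 0) {
    return PREFIX_END_OF_STREAM;
  }

  *metadata_size_out = (int32_t)raw_size;
  return PREFIX_OK;
}
```

A caller would pass the first bytes read from the stream to peek_message_prefix() and, on PREFIX_OK, read that many further bytes as the flatbuffer-encoded message header. Reading the size byte by byte is one simple way to stay independent of the host's byte order.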
Getting started with nanoarrow
Early users of the nanoarrow C library respectfully noted that it was difficult to know where to begin. This release improves the reference documentation and adds a long-form tutorial for those just getting started with or considering adopting the library. You can find the tutorial at the nanoarrow documentation site.
C library
The nanoarrow 0.2.0 release also includes a number of bugfixes and improvements to the core C library, many of which were identified as a result of usage connected with development of the IPC extension.
- Helpers for extracting/appending Decimal128 and Decimal256 elements from/to arrays were added.
- The C library can now perform “full” validation to validate untrusted input (e.g., serialized IPC from the wire).
- The C library can now perform “minimal” validation that performs all checks that do not access any buffer data. This feature was added to facilitate future support for the ArrowDeviceArray that was recently added as the Arrow C device interface.
- Release verification on Ubuntu (x86_64, arm64), Fedora (x86_64), Arch Linux (x86_64), CentOS 7 (x86_64, arm64), Alpine (x86_64, arm64, s390x), Windows (x86_64), and macOS (x86_64) was added to the continuous integration system. Linux verification is implemented using docker compose to facilitate local checks when developing features that may affect a specific platform.
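The difference between the “minimal” and “full” validation levels listed above can be illustrated with a small sketch for a string array (int32 offsets plus a data buffer). This is not nanoarrow's API or implementation: StringArrayView, validate_minimal, and validate_full are hypothetical names, and the checks shown are a simplified subset of what a real validator would do. The minimal check only inspects lengths and buffer sizes, so it never dereferences a buffer pointer (which matters when buffers may live in device memory); the full check additionally reads the offsets, which is what untrusted input calls for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a string array. Not a nanoarrow type. */
struct StringArrayView {
  int64_t length;             /* number of elements */
  const int32_t* offsets;     /* length + 1 entries when length > 0 */
  int64_t offsets_size_bytes; /* size of the offsets buffer */
  const char* data;           /* concatenated string bytes */
  int64_t data_size_bytes;    /* size of the data buffer */
};

/* "Minimal"-style check: uses only lengths and buffer sizes and never reads
 * buffer contents. Returns 0 if the checks pass, -1 otherwise. */
static int validate_minimal(const struct StringArrayView* view) {
  if (view->length < 0) return -1;
  if (view->offsets_size_bytes < 0 || view->data_size_bytes < 0) return -1;

  /* A non-empty array needs at least length + 1 offsets */
  int64_t offsets_count = view->offsets_size_bytes / (int64_t)sizeof(int32_t);
  if (view->length > 0 && offsets_count <= view->length) return -1;

  return 0;
}

/* "Full"-style check: everything above plus checks that read the offsets
 * buffer. Returns 0 if the checks pass, -1 otherwise. */
static int validate_full(const struct StringArrayView* view) {
  if (validate_minimal(view) != 0) return -1;
  if (view->length == 0) return 0;

  /* Offsets must start at a non-negative value and never decrease */
  if (view->offsets[0] < 0) return -1;
  for (int64_t i = 0; i < view->length; i++) {
    if (view->offsets[i + 1] < view->offsets[i]) return -1;
  }

  /* The last offset must not point past the end of the data buffer */
  if (view->offsets[view->length] > view->data_size_bytes) return -1;

  return 0;
}
```

With offsets {0, 4, 2, 5} over a 5-byte data buffer, validate_minimal() passes because the buffer sizes are consistent, while validate_full() fails because the offsets decrease. That gap is the reason a reader of untrusted bytes would want the more expensive level.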
R bindings
The nanoarrow R bindings are distributed as the nanoarrow package on CRAN. The 0.2.0 release of the R bindings includes improvements in type support, improvements in stability, and features required for the forthcoming release of Arrow Database Connectivity (ADBC) R bindings. Notably:
- Support for conversion of union arrays to R objects was added to facilitate support for an ADBC function that returns such an array.
- Support for adding an R-level finalizer to an ArrowArrayStream was added to facilitate safely wrapping a stream resulting from an ADBC call at the R level.
Python bindings?
The nanoarrow 0.2.0 release does not include Python bindings, but improvements to the unreleased draft bindings were added to facilitate discussion among Python developers regarding the useful scope of a potential future nanoarrow Python package. If one of those developers is you, feel free to open an issue or send a post to the developer mailing list to engage in the discussion.
Contributors
This release consists of contributions from 6 contributors in addition to the invaluable advice and support of the Apache Arrow developer mailing list. Special thanks to David Li for reviewing nearly every PR in this release!
$ git shortlog -sn d91c35b33c7b6ff94f5f929384879352e241ed71..apache-arrow-nanoarrow-0.2.0 | grep -v "GitHub Actions"
61 Dewey Dunnington
2 Dirk Eddelbuettel
2 Joris Van den Bossche
2 Kirill Müller
2 William Ayd
1 David Li