Arrays#

class ArrayData#

Mutable container for generic Arrow array data.

This data structure is a self-contained representation of the memory and metadata inside an Arrow array data structure (called vectors in Java). The classes arrow::Array and its subclasses provide strongly-typed accessors with support for the visitor pattern and other affordances.

This class is designed for easy internal data manipulation, analytical data processing, and data transport to and from IPC messages. For example, we could cast from int64 to float64 like so:

Int64Array arr = GetMyData();
auto new_data = arr.data()->Copy();
new_data->type = arrow::float64();
DoubleArray double_arr(new_data);

This object is also useful in an analytics setting where memory may be reused. For example, if we had a group of operations all returning doubles, say:

Log(Sqrt(Expr(arr)))

Then the low-level implementations of each of these functions could have the signatures

void Log(const ArrayData& values, ArrayData* out);

As another example a function may consume one or more memory buffers in an input array and replace them with newly-allocated data, changing the output data type as well.

Public Functions

std::shared_ptr<ArrayData> Slice(int64_t offset, int64_t length) const#

Construct a zero-copy slice of the data with the given offset and length.

Result<std::shared_ptr<ArrayData>> SliceSafe(int64_t offset, int64_t length) const#

Input-checking variant of Slice.

An Invalid Status is returned if the requested slice falls out of bounds. Note that unlike Slice, length isn’t clamped to the available buffer size.
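The difference between the two variants can be modeled without Arrow. The sketch below is a stand-in, not the library's implementation: `SliceBounds`, `ClampedSlice` and `CheckedSlice` are hypothetical names, and `std::nullopt` stands in for an Invalid Status.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the (offset, length) window of a slice.
struct SliceBounds {
  int64_t offset;
  int64_t length;
};

// Models Slice: the caller must pass a valid offset; length is clamped
// to the number of elements remaining after offset.
inline SliceBounds ClampedSlice(int64_t array_length, int64_t offset,
                                int64_t length) {
  assert(offset >= 0 && offset <= array_length);
  return {offset, std::min(length, array_length - offset)};
}

// Models SliceSafe: an out-of-bounds request is reported (here as nullopt)
// instead of being clamped.
inline std::optional<SliceBounds> CheckedSlice(int64_t array_length,
                                               int64_t offset,
                                               int64_t length) {
  if (offset < 0 || length < 0) return std::nullopt;
  if (offset > array_length || length > array_length - offset) {
    return std::nullopt;
  }
  return SliceBounds{offset, length};
}
```

With a 10-element array, requesting 5 elements at offset 8 yields a 2-element window from the clamping variant and an error from the checked one.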

int64_t GetNullCount() const#

Return physical null count, or compute and set it if it’s not known.

inline bool MayHaveNulls() const#

Return true if the data has a validity bitmap and the physical null count is known to be non-zero or not yet known.

Note that this is not the same as MayHaveLogicalNulls, which also checks for the presence of nulls in child data for types like unions and run-end encoded types.

inline bool HasValidityBitmap() const#

Return true if the data has a validity bitmap.

inline bool MayHaveLogicalNulls() const#

Return true if the validity bitmap may have 0’s in it, or if the child arrays (in the case of types without a validity bitmap) may have nulls.

This is not a drop-in replacement for MayHaveNulls, as historically MayHaveNulls() has been used to check for the presence of a validity bitmap that needs to be checked.

Code that previously used MayHaveNulls() and then dealt with the validity bitmap directly can be fixed to handle all types correctly without performance degradation when handling most types by adopting HasValidityBitmap and MayHaveLogicalNulls.

Before:

uint8_t* validity = array.MayHaveNulls() ? array.buffers[0].data : NULLPTR;
for (int64_t i = 0; i < array.length; ++i) {
  if (validity && !bit_util::GetBit(validity, i)) {
    continue;  // skip a NULL
  }
  ...
}

After:

bool all_valid = !array.MayHaveLogicalNulls();
uint8_t* validity = array.HasValidityBitmap() ? array.buffers[0].data : NULLPTR;
for (int64_t i = 0; i < array.length; ++i) {
  bool is_valid = all_valid ||
                  (validity && bit_util::GetBit(validity, i)) ||
                  array.IsValid(i);
  if (!is_valid) {
    continue;  // skip a NULL
  }
  ...
}
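For readers unfamiliar with the bitmap test used in both loops, here is a minimal stdlib-only sketch of it. `GetBit` below is a local stand-in for `bit_util::GetBit`, and `CountNulls` is a hypothetical helper; neither is the Arrow implementation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for bit_util::GetBit: Arrow bitmaps use least-significant-bit
// numbering, so element i lives in byte i / 8 at bit position i % 8.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Count unset (null) bits in the window [offset, offset + length).
// A null pointer means "no validity bitmap", i.e. no physical nulls.
inline int64_t CountNulls(const uint8_t* validity, int64_t offset,
                          int64_t length) {
  assert(offset >= 0 && length >= 0);
  if (validity == nullptr) return 0;
  int64_t nulls = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (!GetBit(validity, offset + i)) ++nulls;
  }
  return nulls;
}
```

The `offset` parameter matters because the bitmap buffer does not account for a slice offset; the caller has to add it.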

int64_t ComputeLogicalNullCount() const#

Computes the logical null count for arrays of all types including those that do not have a validity bitmap like union and run-end encoded arrays.

If the array has a validity bitmap, this function behaves the same as GetNullCount. For types that have no validity bitmap, this function will recompute the null count every time it is called.
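To illustrate why the count has to be recomputed for types without a validity bitmap, the sketch below models a run-end encoded array with plain vectors. `RunEndEncodedModel` and `LogicalNullCount` are hypothetical names for illustration only; the sketch assumes 32-bit run ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a run-end encoded array: run i covers logical
// positions [run_ends[i - 1], run_ends[i]) and repeats one value.
struct RunEndEncodedModel {
  std::vector<int32_t> run_ends;   // strictly increasing
  std::vector<bool> values_valid;  // validity of each run's value
};

// The parent has no validity bitmap, so logical nulls are found by
// walking the runs and summing the lengths of runs whose value is null.
inline int64_t LogicalNullCount(const RunEndEncodedModel& ree) {
  assert(ree.run_ends.size() == ree.values_valid.size());
  int64_t nulls = 0;
  int32_t run_start = 0;
  for (std::size_t i = 0; i < ree.run_ends.size(); ++i) {
    if (!ree.values_valid[i]) nulls += ree.run_ends[i] - run_start;
    run_start = ree.run_ends[i];
  }
  return nulls;
}
```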

See also

GetNullCount

class Array#

Array base type.

Immutable data array with some logical type and some length.

Any memory is owned by the respective Buffer instance (or its parents).

The base class is only required to have a null bitmap buffer if the null count is greater than 0.

If known, the null count can be provided in the base Array constructor. If the null count is not known, pass -1 to indicate that the null count is to be computed on the first call to null_count().

Subclassed by arrow::BaseListArray<LargeListType>, arrow::BaseListArray<ListType>, arrow::BaseListArray<TYPE>, arrow::DictionaryArray, arrow::ExtensionArray, arrow::FixedSizeListArray, arrow::FlatArray, arrow::RunEndEncodedArray, arrow::StructArray, arrow::UnionArray

Public Functions

inline bool IsNull(int64_t i) const#

Return true if value at index is null.

Does not perform bounds checking.

inline bool IsValid(int64_t i) const#

Return true if value at index is valid (not null).

Does not perform bounds checking.

Result<std::shared_ptr<Scalar>> GetScalar(int64_t i) const#

Return a Scalar containing the value of this array at i.

inline int64_t length() const#

Size in the number of elements this array contains.

inline int64_t offset() const#

A relative position into another array’s data, to enable zero-copy slicing.

This value defaults to zero.

int64_t null_count() const#

The number of null entries in the array.

If the null count was not known at construction time (and was set to a negative value), it is computed and cached on the first invocation of this function.

int64_t ComputeLogicalNullCount() const#

Computes the logical null count for arrays of all types including those that do not have a validity bitmap like union and run-end encoded arrays.

If the array has a validity bitmap, this function behaves the same as null_count(). For types that have no validity bitmap, this function will recompute the null count every time it is called.

See also

GetNullCount

inline const std::shared_ptr<Buffer> &null_bitmap() const#

Buffer for the validity (null) bitmap, if any.

Note that Union types never have a null bitmap.

Note that for null_count == 0 or for null type, this will be null. This buffer does not account for any slice offset.

inline const uint8_t *null_bitmap_data() const#

Raw pointer to the null bitmap.

Note that for null_count == 0 or for null type, this will be null. This pointer does not account for any slice offset.

bool Equals(const Array &arr, const EqualOptions& = EqualOptions::Defaults()) const#

Equality comparison with another array.

std::string Diff(const Array &other) const#

Return the formatted unified diff of arrow::Diff between this Array and another Array.

bool ApproxEquals(const std::shared_ptr<Array> &arr, const EqualOptions& = EqualOptions::Defaults()) const#

Approximate equality comparison with another array.

epsilon is only used if this is a FloatArray or DoubleArray.

bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, const Array &other, const EqualOptions& = EqualOptions::Defaults()) const#

Compare if the range of slots specified are equal for the given array and this array.

end_idx is exclusive. This method does not perform bounds checking.

Status Accept(ArrayVisitor *visitor) const#

Apply the ArrayVisitor::Visit() method specialized to the array type.

Result<std::shared_ptr<Array>> View(const std::shared_ptr<DataType> &type) const#

Construct a zero-copy view of this array with the given type.

This method checks if the types are layout-compatible. Nested types are traversed in depth-first order. Data buffers must have the same item sizes, even though the logical types may be different. An error is returned if the types are not layout-compatible.

std::shared_ptr<Array> Slice(int64_t offset, int64_t length) const#

Construct a zero-copy slice of the array with the indicated offset and length.

Parameters:
  • offset[in] the position of the first element in the constructed slice

  • length[in] the length of the slice. If there are not enough elements in the array, the length will be adjusted accordingly

Returns:

a new object wrapped in std::shared_ptr<Array>

std::shared_ptr<Array> Slice(int64_t offset) const#

Slice from offset until end of the array.

Result<std::shared_ptr<Array>> SliceSafe(int64_t offset, int64_t length) const#

Input-checking variant of Array::Slice.

Result<std::shared_ptr<Array>> SliceSafe(int64_t offset) const#

Input-checking variant of Array::Slice.

std::string ToString() const#
Returns:

PrettyPrint representation of array suitable for debugging

Status Validate() const#

Perform cheap validation checks to determine obvious inconsistencies within the array’s internal data.

This is O(k) where k is the number of descendants.

Returns:

Status

Status ValidateFull() const#

Perform extensive validation checks to determine inconsistencies within the array’s internal data.

This is potentially O(k*n) where k is the number of descendants and n is the array length.

Returns:

Status

Concrete array subclasses#

Primitive and temporal#

class NullArray : public arrow::FlatArray#

Degenerate null type Array.

class BooleanArray : public arrow::PrimitiveArray#

Concrete Array class for boolean data.

Public Functions

int64_t false_count() const#

Return the number of false (0) values among the valid values.

Result is not cached.

int64_t true_count() const#

Return the number of true (1) values among the valid values.

Result is not cached.

using DecimalArray = Decimal128Array#
class Decimal128Array : public arrow::FixedSizeBinaryArray#
#include <arrow/array/array_decimal.h>

Concrete Array class for 128-bit decimal data.

Public Functions

explicit Decimal128Array(const std::shared_ptr<ArrayData> &data)#

Construct Decimal128Array from ArrayData instance.

class Decimal256Array : public arrow::FixedSizeBinaryArray#
#include <arrow/array/array_decimal.h>

Concrete Array class for 256-bit decimal data.

Public Functions

explicit Decimal256Array(const std::shared_ptr<ArrayData> &data)#

Construct Decimal256Array from ArrayData instance.

template<typename TYPE>
class NumericArray : public arrow::PrimitiveArray#
#include <arrow/array/array_primitive.h>

Concrete Array class for numeric data with a corresponding C type.

This class is templated on the corresponding DataType subclass for the given data, for example NumericArray<Int8Type> or NumericArray<Date32Type>.

Note that convenience aliases are available for all accepted types (for example Int8Array for NumericArray<Int8Type>).

class DayTimeIntervalArray : public arrow::PrimitiveArray#
#include <arrow/array/array_primitive.h>

Array of Day and Millisecond values.

class MonthDayNanoIntervalArray : public arrow::PrimitiveArray#
#include <arrow/array/array_primitive.h>

Array of Month, Day and nanosecond values.

Binary-like#

template<typename TYPE>
class BaseBinaryArray : public arrow::FlatArray#
#include <arrow/array/array_binary.h>

Base class for variable-sized binary arrays, regardless of offset size and logical interpretation.

Public Functions

inline const uint8_t *GetValue(int64_t i, offset_type *out_length) const#

Return the pointer to the given element's bytes.

inline std::string_view GetView(int64_t i) const#

Get binary value as a string_view.

Parameters:

i – the value index

Returns:

the view over the selected value

inline std::string_view Value(int64_t i) const#

Get binary value as a string_view.

Provided for consistency with other arrays.

Parameters:

i – the value index

Returns:

the view over the selected value

inline std::string GetString(int64_t i) const#

Get binary value as a std::string.

Parameters:

i – the value index

Returns:

the value copied into a std::string

inline std::shared_ptr<Buffer> value_offsets() const#

Note that this buffer does not account for any slice offset.

inline std::shared_ptr<Buffer> value_data() const#

Note that this buffer does not account for any slice offset.

inline offset_type value_offset(int64_t i) const#

Return the data buffer absolute offset of the data for the value at the passed index.

Does not perform bounds checking.

inline offset_type value_length(int64_t i) const#

Return the length of the data for the value at the passed index.

Does not perform bounds checking.

inline offset_type total_values_length() const#

Return the total length of the memory in the data buffer referenced by this array.

If the array has been sliced then this may be less than the size of the data buffer (data_->buffers[2]).
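The relationship between the offsets buffer, the data buffer and a slice window can be sketched with standard containers. `BinaryModel` is a hypothetical stand-in assuming 32-bit offsets; it mirrors the accessor names above but is not the Arrow class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of a variable-size binary layout: n + 1 offsets into
// one contiguous data buffer, plus a slice window over the elements.
struct BinaryModel {
  std::vector<int32_t> offsets;  // n + 1 entries, non-decreasing
  std::string data;              // concatenated value bytes
  int64_t offset = 0;            // slice offset, in elements
  int64_t length = 0;            // slice length, in elements

  // Absolute position of value i in the data buffer.
  int32_t value_offset(int64_t i) const { return offsets[offset + i]; }

  int32_t value_length(int64_t i) const {
    return offsets[offset + i + 1] - offsets[offset + i];
  }

  std::string_view GetView(int64_t i) const {
    // Debug-only check for this sketch; the documented accessors do not
    // perform bounds checking.
    assert(i >= 0 && i < length);
    return std::string_view(data).substr(
        static_cast<std::size_t>(value_offset(i)),
        static_cast<std::size_t>(value_length(i)));
  }

  // Bytes referenced by the slice, which may be less than data.size().
  int32_t total_values_length() const {
    return length > 0 ? offsets[offset + length] - offsets[offset] : 0;
  }
};
```

For the values "foo", "", "bar", "baz" sliced to the last three elements, `value_offset(0)` is still 3 (an absolute position), while `total_values_length()` is 6 rather than the full 9 bytes.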

class BinaryArray : public arrow::BaseBinaryArray<BinaryType>#
#include <arrow/array/array_binary.h>

Concrete Array class for variable-size binary data.

Subclassed by arrow::StringArray

class StringArray : public arrow::BinaryArray#
#include <arrow/array/array_binary.h>

Concrete Array class for variable-size string (UTF-8) data.

Public Functions

Status ValidateUTF8() const#

Validate that this array contains only valid UTF-8 entries.

This check is also implied by ValidateFull().

class LargeBinaryArray : public arrow::BaseBinaryArray<LargeBinaryType>#
#include <arrow/array/array_binary.h>

Concrete Array class for large variable-size binary data.

Subclassed by arrow::LargeStringArray

class LargeStringArray : public arrow::LargeBinaryArray#
#include <arrow/array/array_binary.h>

Concrete Array class for large variable-size string (UTF-8) data.

Public Functions

Status ValidateUTF8() const#

Validate that this array contains only valid UTF-8 entries.

This check is also implied by ValidateFull().

class FixedSizeBinaryArray : public arrow::PrimitiveArray#
#include <arrow/array/array_binary.h>

Concrete Array class for fixed-size binary data.

Subclassed by arrow::Decimal128Array, arrow::Decimal256Array

Nested#

template<typename TYPE>
class BaseListArray : public arrow::Array#
#include <arrow/array/array_nested.h>

Base class for variable-sized list arrays, regardless of offset size.

Public Functions

inline const std::shared_ptr<Array> &values() const#

Return array object containing the list’s values.

Note that this array does not account for any slice offset or length.

inline const std::shared_ptr<Buffer> &value_offsets() const#

Note that this buffer does not account for any slice offset or length.

inline const offset_type *raw_value_offsets() const#

Return pointer to raw value offsets accounting for any slice offset.

class ListArray : public arrow::BaseListArray<ListType>#
#include <arrow/array/array_nested.h>

Concrete Array class for list data.

Subclassed by arrow::MapArray

Public Functions

Result<std::shared_ptr<Array>> Flatten(MemoryPool *memory_pool = default_memory_pool()) const#

Return an Array that is a concatenation of the lists in this array.

Note that this differs from values() in that it takes into account this array’s offsets as well as null elements backed by non-empty lists (they are skipped, thus copying may be needed).
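The distinction between the raw child values and the flattened result can be shown with a small model. `ListModel` and this `Flatten` are hypothetical, stdlib-only stand-ins assuming an int32 child; they only illustrate the skipping behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a list<int32> array: list i spans
// values[offsets[i], offsets[i + 1]) and is null when valid[i] is false.
struct ListModel {
  std::vector<int32_t> offsets;  // n + 1 entries
  std::vector<bool> valid;       // n entries
  std::vector<int32_t> values;   // child values
};

// Concatenate only the child ranges of non-null lists. Child values that
// sit behind a null list are left out, which is why the result can differ
// from the raw child array and may require a copy.
inline std::vector<int32_t> Flatten(const ListModel& list) {
  assert(list.offsets.size() == list.valid.size() + 1);
  std::vector<int32_t> out;
  for (std::size_t i = 0; i < list.valid.size(); ++i) {
    if (!list.valid[i]) continue;
    out.insert(out.end(), list.values.begin() + list.offsets[i],
               list.values.begin() + list.offsets[i + 1]);
  }
  return out;
}
```

In the model, a null list whose offsets still span two child values causes those two values to be dropped from the flattened output.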

std::shared_ptr<Array> offsets() const#

Return list offsets as an Int32Array.

The returned array will not have a validity bitmap, so you cannot expect to pass it to ListArray::FromArrays() and get back the same list array if the original one has nulls.

Public Static Functions

static Result<std::shared_ptr<ListArray>> FromArrays(const Array &offsets, const Array &values, MemoryPool *pool = default_memory_pool(), std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount)#

Construct ListArray from array of offsets and child value array.

This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.

Either the offsets array may carry a null bitmap or an explicit null_bitmap may be passed, but not both.

Parameters:
  • offsets[in] Array containing n + 1 offsets encoding length and size. Must be of int32 type

  • values[in] Array containing list values

  • pool[in] MemoryPool in case new offsets array needs to be allocated because of null values

  • null_bitmap[in] Optional validity bitmap

  • null_count[in] Optional null count in null_bitmap

class LargeListArray : public arrow::BaseListArray<LargeListType>#
#include <arrow/array/array_nested.h>

Concrete Array class for large list data (with 64-bit offsets).

Public Functions

Result<std::shared_ptr<Array>> Flatten(MemoryPool *memory_pool = default_memory_pool()) const#

Return an Array that is a concatenation of the lists in this array.

Note that this differs from values() in that it takes into account this array’s offsets as well as null elements backed by non-empty lists (they are skipped, thus copying may be needed).

std::shared_ptr<Array> offsets() const#

Return list offsets as an Int64Array.

Public Static Functions

static Result<std::shared_ptr<LargeListArray>> FromArrays(const Array &offsets, const Array &values, MemoryPool *pool = default_memory_pool(), std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount)#

Construct LargeListArray from array of offsets and child value array.

This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.

Parameters:
  • offsets[in] Array containing n + 1 offsets encoding length and size. Must be of int64 type

  • values[in] Array containing list values

  • pool[in] MemoryPool in case new offsets array needs to be allocated because of null values

  • null_bitmap[in] Optional validity bitmap

  • null_count[in] Optional null count in null_bitmap

class MapArray : public arrow::ListArray#
#include <arrow/array/array_nested.h>

Concrete Array class for map data.

NB: “value” in this context refers to a pair of a key and the corresponding item.

Public Functions

inline const std::shared_ptr<Array> &keys() const#

Return array object containing all map keys.

inline const std::shared_ptr<Array> &items() const#

Return array object containing all mapped items.

Public Static Functions

static Result<std::shared_ptr<Array>> FromArrays(const std::shared_ptr<Array> &offsets, const std::shared_ptr<Array> &keys, const std::shared_ptr<Array> &items, MemoryPool *pool = default_memory_pool())#

Construct MapArray from array of offsets and child key, item arrays.

This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.

Parameters:
  • offsets[in] Array containing n + 1 offsets encoding length and size. Must be of int32 type

  • keys[in] Array containing key values

  • items[in] Array containing item values

  • pool[in] MemoryPool in case new offsets array needs to be allocated because of null values

static Status ValidateChildData(const std::vector<std::shared_ptr<ArrayData>> &child_data)#

Validate child data before constructing the actual MapArray.

class FixedSizeListArray : public arrow::Array#
#include <arrow/array/array_nested.h>

Concrete Array class for fixed size list data.

Public Functions

const std::shared_ptr<Array> &values() const#

Return array object containing the list’s values.

Result<std::shared_ptr<Array>> Flatten(MemoryPool *memory_pool = default_memory_pool()) const#

Return an Array that is a concatenation of the lists in this array.

Note that it’s different from values() in that it takes into consideration null elements (they are skipped, thus copying may be needed).

Public Static Functions

static Result<std::shared_ptr<Array>> FromArrays(const std::shared_ptr<Array> &values, int32_t list_size)#

Construct FixedSizeListArray from child value array and list_size.

Parameters:
  • values[in] Array containing list values

  • list_size[in] The fixed length of each list

Returns:

Will have length equal to values.length() / list_size

static Result<std::shared_ptr<Array>> FromArrays(const std::shared_ptr<Array> &values, std::shared_ptr<DataType> type)#

Construct FixedSizeListArray from child value array and type.

Parameters:
  • values[in] Array containing list values

  • type[in] The fixed sized list type

Returns:

Will have length equal to values.length() / type.list_size()

class StructArray : public arrow::Array#
#include <arrow/array/array_nested.h>

Concrete Array class for struct data.

Public Functions

std::shared_ptr<Array> GetFieldByName(const std::string &name) const#

Returns null if name not found.

Status CanReferenceFieldByName(const std::string &name) const#

Indicate if field named name can be found unambiguously in the struct.

Status CanReferenceFieldsByNames(const std::vector<std::string> &names) const#

Indicate if fields named names can be found unambiguously in the struct.

Result<ArrayVector> Flatten(MemoryPool *pool = default_memory_pool()) const#

Flatten this array as a vector of arrays, one for each field.

Parameters:

pool[in] The pool to allocate null bitmaps from, if necessary

Result<std::shared_ptr<Array>> GetFlattenedField(int index, MemoryPool *pool = default_memory_pool()) const#

Get one of the child arrays, combining its null bitmap with the parent struct array’s bitmap.

Parameters:
  • index[in] Which child array to get

  • pool[in] The pool to allocate null bitmaps from, if necessary
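Combining the two bitmaps amounts to a per-slot logical AND. The sketch below shows that idea with `std::vector<bool>` in place of packed bitmaps; `CombineValidity` is a hypothetical helper, not an Arrow function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A slot of the flattened field is valid only if both the struct slot and
// the child slot are valid.
inline std::vector<bool> CombineValidity(const std::vector<bool>& parent,
                                         const std::vector<bool>& child) {
  assert(parent.size() == child.size());
  std::vector<bool> out(parent.size());
  for (std::size_t i = 0; i < parent.size(); ++i) {
    out[i] = parent[i] && child[i];
  }
  return out;
}
```

This is also why a memory pool is needed: when the parent has nulls, a new bitmap has to be allocated for the result.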

Public Static Functions

static Result<std::shared_ptr<StructArray>> Make(const ArrayVector &children, const std::vector<std::string> &field_names, std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount, int64_t offset = 0)#

Return a StructArray from child arrays and field names.

The length and data type are automatically inferred from the arguments. There should be at least one child array.

static Result<std::shared_ptr<StructArray>> Make(const ArrayVector &children, const FieldVector &fields, std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount, int64_t offset = 0)#

Return a StructArray from child arrays and fields.

The length is automatically inferred from the arguments. There should be at least one child array. This method does not check that field types and child array types are consistent.

class UnionArray : public arrow::Array#
#include <arrow/array/array_nested.h>

Base class for SparseUnionArray and DenseUnionArray.

Subclassed by arrow::DenseUnionArray, arrow::SparseUnionArray

Public Functions

inline const std::shared_ptr<Buffer> &type_codes() const#

Note that this buffer does not account for any slice offset.

inline type_code_t type_code(int64_t i) const#

The logical type code of the value at index.

inline int child_id(int64_t i) const#

The physical child id containing value at index.

std::shared_ptr<Array> field(int pos) const#

Return the given field as an individual array.

For sparse unions, the returned array has its offset, length and null count adjusted.

class SparseUnionArray : public arrow::UnionArray#
#include <arrow/array/array_nested.h>

Concrete Array class for sparse union data.

Public Functions

Result<std::shared_ptr<Array>> GetFlattenedField(int index, MemoryPool *pool = default_memory_pool()) const#

Get one of the child arrays, adjusting its null bitmap where the union array type code does not match.

Parameters:
  • index[in] Which child array to get (i.e. the physical index, not the type code)

  • pool[in] The pool to allocate null bitmaps from, if necessary

Public Static Functions

static inline Result<std::shared_ptr<Array>> Make(const Array &type_ids, ArrayVector children, std::vector<type_code_t> type_codes)#

Construct SparseUnionArray from type_ids and children.

This function does the bare minimum of validation of the input types.

Parameters:
  • type_ids[in] An array of logical type ids for the union type

  • children[in] Vector of children Arrays containing the data for each type.

  • type_codes[in] Vector of type codes.

static Result<std::shared_ptr<Array>> Make(const Array &type_ids, ArrayVector children, std::vector<std::string> field_names = {}, std::vector<type_code_t> type_codes = {})#

Construct SparseUnionArray with custom field names from type_ids and children.

This function does the bare minimum of validation of the input types.

Parameters:
  • type_ids[in] An array of logical type ids for the union type

  • children[in] Vector of children Arrays containing the data for each type.

  • field_names[in] Vector of strings containing the name of each field.

  • type_codes[in] Vector of type codes.

class DenseUnionArray : public arrow::UnionArray#
#include <arrow/array/array_nested.h>

Concrete Array class for dense union data.

Note that union types do not have a validity bitmap.

Public Functions

inline const std::shared_ptr<Buffer> &value_offsets() const#

Note that this buffer does not account for any slice offset.

Public Static Functions

static inline Result<std::shared_ptr<Array>> Make(const Array &type_ids, const Array &value_offsets, ArrayVector children, std::vector<type_code_t> type_codes)#

Construct DenseUnionArray from type_ids, value_offsets, and children.

This function does the bare minimum of validation of the offsets and input types.

Parameters:
  • type_ids[in] An array of logical type ids for the union type

  • value_offsets[in] An array of signed int32 values indicating the relative offset into the respective child array for the type in a given slot. The respective offsets for each child value array must be in order / increasing.

  • children[in] Vector of children Arrays containing the data for each type.

  • type_codes[in] Vector of type codes.

static Result<std::shared_ptr<Array>> Make(const Array &type_ids, const Array &value_offsets, ArrayVector children, std::vector<std::string> field_names = {}, std::vector<type_code_t> type_codes = {})#

Construct DenseUnionArray with custom field names from type_ids, value_offsets, and children.

This function does the bare minimum of validation of the offsets and input types.

Parameters:
  • type_ids[in] An array of logical type ids for the union type

  • value_offsets[in] An array of signed int32 values indicating the relative offset into the respective child array for the type in a given slot. The respective offsets for each child value array must be in order / increasing.

  • children[in] Vector of children Arrays containing the data for each type.

  • field_names[in] Vector of strings containing the name of each field.

  • type_codes[in] Vector of type codes.
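How type_ids, value_offsets and type_codes work together when reading a slot can be sketched as follows. `DenseUnionModel` and `Resolve` are hypothetical names, and the linear search over type_codes is a simplification for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a dense union's three pieces of bookkeeping.
struct DenseUnionModel {
  std::vector<int8_t> type_ids;        // logical type code of each slot
  std::vector<int32_t> value_offsets;  // position within the selected child
  std::vector<int8_t> type_codes;      // type_codes[child_id] == type code
};

// Map the type code of slot i to its physical child id, then pair it with
// the slot's offset into that child.
inline std::pair<int, int32_t> Resolve(const DenseUnionModel& u,
                                       std::size_t i) {
  for (std::size_t child = 0; child < u.type_codes.size(); ++child) {
    if (u.type_codes[child] == u.type_ids[i]) {
      return {static_cast<int>(child), u.value_offsets[i]};
    }
  }
  assert(false && "type id not listed in type_codes");
  return {-1, -1};
}
```

With type codes 5 and 7 mapped to children 0 and 1, the offsets for each child increase independently (0, 1, ... per child), matching the ordering requirement on value_offsets.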

Dictionary-encoded#

class DictionaryArray : public arrow::Array#

Array type for dictionary-encoded data with a data-dependent dictionary.

A dictionary array contains an array of non-negative integers (the “dictionary indices”) along with a data type containing a “dictionary” corresponding to the distinct values represented in the data.

For example, the array

["foo", "bar", "foo", "bar", "foo", "bar"]

with dictionary ["bar", "foo"], would have dictionary array representation

indices: [1, 0, 1, 0, 1, 0]
dictionary: ["bar", "foo"]

The indices in principle may be any integer type.
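Decoding such an array is a lookup of each index in the dictionary. The following stdlib-only sketch (with a hypothetical `Decode` helper and int32 indices assumed) reproduces the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Expand dictionary indices back into the values they stand for.
inline std::vector<std::string> Decode(
    const std::vector<int32_t>& indices,
    const std::vector<std::string>& dictionary) {
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (int32_t idx : indices) {
    // Indices must be non-negative and smaller than the dictionary size.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < dictionary.size());
    out.push_back(dictionary[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```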

Public Functions

Result<std::shared_ptr<Array>> Transpose(const std::shared_ptr<DataType> &type, const std::shared_ptr<Array> &dictionary, const int32_t *transpose_map, MemoryPool *pool = default_memory_pool()) const#

Transpose this DictionaryArray.

This method constructs a new dictionary array with the given dictionary type, transposing indices using the transpose map. The type and the transpose map are typically computed using DictionaryUnifier.

Parameters:
  • type[in] the new type object

  • dictionary[in] the new dictionary

  • transpose_map[in] transposition array of this array’s indices into the target array’s indices

  • pool[in] a pool to allocate the array data from
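The role of the transpose map can be illustrated in isolation: entry k gives the position in the new dictionary of the value stored at position k in the old one. `TransposeIndices` below is a hypothetical helper operating on plain vectors, not the Arrow method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrite each index through the map so that it points at the same value
// in the new dictionary.
inline std::vector<int32_t> TransposeIndices(
    const std::vector<int32_t>& indices,
    const std::vector<int32_t>& transpose_map) {
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int32_t idx : indices) {
    assert(idx >= 0 &&
           static_cast<std::size_t>(idx) < transpose_map.size());
    out.push_back(transpose_map[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

For example, moving from dictionary ["bar", "foo"] to ["foo", "baz", "bar"] uses the map [2, 0], so indices [1, 0, 1] become [0, 2, 0].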

bool CanCompareIndices(const DictionaryArray &other) const#

Determine whether dictionary arrays may be compared without unification.

const std::shared_ptr<Array> &dictionary() const#

Return the dictionary for this array, which is stored as a member of the ArrayData internal structure.

int64_t GetValueIndex(int64_t i) const#

Return the ith value of indices, cast to int64_t.

Not recommended for use in performance-sensitive code. Does not validate whether the value is null or out-of-bounds.

Public Static Functions

static Result<std::shared_ptr<Array>> FromArrays(const std::shared_ptr<DataType> &type, const std::shared_ptr<Array> &indices, const std::shared_ptr<Array> &dictionary)#

Construct DictionaryArray from dictionary and indices array and validate.

This function does the validation of the indices and input type. It checks if all indices are non-negative and smaller than the size of the dictionary.

Parameters:
  • type[in] a dictionary type

  • indices[in] an array of non-negative integers smaller than the size of the dictionary

  • dictionary[in] the dictionary with same value type as the type object

Extension arrays#

class ExtensionArray : public arrow::Array#

Base array class for user-defined extension types.

Subclassed by arrow::extension::FixedShapeTensorArray

Public Functions

explicit ExtensionArray(const std::shared_ptr<ArrayData> &data)#

Construct an ExtensionArray from an ArrayData.

The ArrayData must have the right ExtensionType.

ExtensionArray(const std::shared_ptr<DataType> &type, const std::shared_ptr<Array> &storage)#

Construct an ExtensionArray from a type and the underlying storage.

inline const std::shared_ptr<Array> &storage() const#

The physical storage for the extension array.

Chunked Arrays#

class ChunkedArray#

A data structure managing a list of primitive Arrow arrays logically as one large array.

Data chunking is treated throughout this project largely as an implementation detail for performance and memory use optimization. ChunkedArray allows Array objects to be collected and interpreted as a single logical array without requiring an expensive concatenation step.

In some cases, data produced by a function may exceed the capacity of an Array (like BinaryArray or StringArray) and so returning multiple Arrays is the only possibility. In these cases, we recommend returning a ChunkedArray instead of a vector of Arrays or some alternative.

When data is processed in parallel, it may not be practical or possible to create large contiguous memory allocations and write output into them. With some data types, like binary and string types, it is not possible at all to produce non-chunked array outputs without requiring a concatenation step at the end of processing.

Application developers may tune chunk sizes based on analysis of performance profiles but many developer-users will not need to be especially concerned with the chunking details.

Preserving the chunk layout/sizes in processing steps is generally not considered to be a contract in APIs. A function may decide to alter the chunking of its result. Similarly, APIs accepting multiple ChunkedArray inputs should not expect the chunk layout to be the same in each input.

Public Functions

inline explicit ChunkedArray(std::shared_ptr<Array> chunk)#

Construct a chunked array from a single Array.

explicit ChunkedArray(ArrayVector chunks, std::shared_ptr<DataType> type = NULLPTR)#

Construct a chunked array from a vector of arrays and an optional data type.

The vector elements must have the same data type. If the data type is passed explicitly, the vector may be empty. If the data type is omitted, the vector must be non-empty.

inline int64_t length() const#
Returns:

the total length of the chunked array; computed on construction

inline int64_t null_count() const#
Returns:

the total number of nulls among all chunks

inline int num_chunks() const#
Returns:

the total number of chunks in the chunked array

inline const std::shared_ptr<Array> &chunk(int i) const#
Returns:

a particular chunk from the chunked array

inline const ArrayVector &chunks() const#
Returns:

an ArrayVector of chunks

std::shared_ptr<ChunkedArray> Slice(int64_t offset, int64_t length) const#

Construct a zero-copy slice of the chunked array with the indicated offset and length.

Parameters:
  • offset[in] the position of the first element in the constructed slice

  • length[in] the length of the slice. If there are not enough elements in the chunked array, the length will be adjusted accordingly

Returns:

a new object wrapped in std::shared_ptr<ChunkedArray>
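The offset and length semantics described above can be sketched without Arrow by computing which part of each chunk a slice would cover. `ChunkRange` and `SliceRanges` are hypothetical names introduced for illustration, and the sketch assumes a non-negative offset and length. It models only the bookkeeping, not the library's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of the part of one chunk covered by a slice.
struct ChunkRange {
  std::size_t chunk;  // index of the source chunk
  int64_t start;      // first element within that chunk
  int64_t length;     // number of elements taken from that chunk
};

// Given the length of every chunk, return the per-chunk ranges selected by
// a logical slice [offset, offset + length). No element data is touched,
// which is what makes a slice of this kind zero-copy. If fewer than
// `length` elements are available, the result is simply shorter.
inline std::vector<ChunkRange> SliceRanges(
    const std::vector<int64_t>& chunk_lengths, int64_t offset, int64_t length) {
  std::vector<ChunkRange> out;
  std::size_t i = 0;
  // Skip chunks that lie entirely before the requested offset.
  while (i < chunk_lengths.size() && offset >= chunk_lengths[i]) {
    offset -= chunk_lengths[i];
    ++i;
  }
  // Take elements chunk by chunk until the request is satisfied or the
  // chunks run out (the clamping case).
  while (i < chunk_lengths.size() && length > 0) {
    const int64_t take = std::min(length, chunk_lengths[i] - offset);
    if (take > 0) out.push_back({i, offset, take});
    length -= take;
    offset = 0;
    ++i;
  }
  return out;
}
```

For chunk lengths `{3, 2, 4}`, a slice at offset 2 with length 4 touches all three chunks, while a slice at offset 7 with length 100 is clamped to the last two elements of the final chunk.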

std::shared_ptr<ChunkedArray> Slice(int64_t offset) const#

Slice from offset until end of the chunked array.

Result<std::vector<std::shared_ptr<ChunkedArray>>> Flatten(MemoryPool *pool = default_memory_pool()) const#

Flatten this chunked array as a vector of chunked arrays, one for each struct field.

Parameters:

pool[in] The pool for buffer allocations, if any

Result<std::shared_ptr<ChunkedArray>> View(const std::shared_ptr<DataType> &type) const#

Construct a zero-copy view of this chunked array with the given type.

Calls Array::View on each constituent chunk. Always succeeds if there are zero chunks.

inline const std::shared_ptr<DataType> &type() const#

Return the type of the chunked array.

Result<std::shared_ptr<Scalar>> GetScalar(int64_t index) const#

Return a Scalar containing the value of this array at index.

bool Equals(const ChunkedArray &other) const#

Determine if two chunked arrays are equal.

Two chunked arrays can be equal only if they have equal datatypes. However, they may be equal even if they have different chunkings.
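A chunking-independent comparison can be sketched as follows, using a simplified model in which a chunked array is a vector of `std::vector<int>` and the element type is fixed. `IntChunks` and `LogicallyEqual` are hypothetical names, and this is not how the library implements `Equals`. It only shows that equality can be decided on the logical sequence of values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunked array of ints.
using IntChunks = std::vector<std::vector<int>>;

// Compare two chunked sequences value by value, ignoring chunk boundaries.
inline bool LogicallyEqual(const IntChunks& a, const IntChunks& b) {
  std::size_t ia = 0, oa = 0;  // chunk index and in-chunk offset for a
  std::size_t ib = 0, ob = 0;  // chunk index and in-chunk offset for b
  for (;;) {
    // Advance each cursor past exhausted or empty chunks.
    while (ia < a.size() && oa == a[ia].size()) { ++ia; oa = 0; }
    while (ib < b.size() && ob == b[ib].size()) { ++ib; ob = 0; }
    const bool a_done = (ia == a.size());
    const bool b_done = (ib == b.size());
    // Equal only if both sequences end at the same logical position.
    if (a_done || b_done) return a_done && b_done;
    if (a[ia][oa] != b[ib][ob]) return false;
    ++oa;
    ++ob;
  }
}
```

In this model, `{1, 2}, {3}` and `{1}, {2, 3}` compare equal because they hold the same values in the same order, whereas `{1, 2}` alone does not equal either of them.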

bool Equals(const std::shared_ptr<ChunkedArray> &other) const#

Determine if two chunked arrays are equal.

bool ApproxEquals(const ChunkedArray &other, const EqualOptions& = EqualOptions::Defaults()) const#

Determine if two chunked arrays are approximately equal.

std::string ToString() const#
Returns:

PrettyPrint representation suitable for debugging

Status Validate() const#

Perform cheap validation checks to determine obvious inconsistencies within the chunked array’s internal data.

This is O(k*m) where k is the number of array descendants, and m is the number of chunks.

Returns:

Status

Status ValidateFull() const#

Perform extensive validation checks to determine inconsistencies within the chunked array’s internal data.

This is O(k*n) where k is the number of array descendants, and n is the length in elements.

Returns:

Status

Public Static Functions

static Result<std::shared_ptr<ChunkedArray>> MakeEmpty(std::shared_ptr<DataType> type, MemoryPool *pool = default_memory_pool())#

Create an empty ChunkedArray of a given type.

The output ChunkedArray will have one chunk with an empty array of the given type.

Parameters:
  • type[in] the data type of the empty ChunkedArray

  • pool[in] the memory pool to allocate memory from

Returns:

the resulting ChunkedArray

Utilities#

class ArrayVisitor#

Abstract array visitor class.

Subclass this to create a visitor that can be used with the Array::Accept() method.
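The accept/visit mechanism can be illustrated with a dependency-free sketch of the same double-dispatch pattern. Everything here (`Visitor`, `ArrayBase`, `IntArray`, `TextArray`, `ByteCounter`) is a hypothetical stand-in rather than an Arrow type. The overloads return `bool` instead of `Status`, and the "not handled" default in the base visitor is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct IntArray;
struct TextArray;

// Base visitor with one overload per concrete array type. The defaults
// report "not handled" (an assumption of this sketch), so a subclass only
// overrides the types it cares about.
struct Visitor {
  virtual ~Visitor() = default;
  virtual bool Visit(const IntArray&) { return false; }
  virtual bool Visit(const TextArray&) { return false; }
};

// Base array type; Accept forwards to the overload for the dynamic type.
struct ArrayBase {
  virtual ~ArrayBase() = default;
  virtual bool Accept(Visitor* visitor) const = 0;
};

struct IntArray : ArrayBase {
  std::vector<int64_t> values;
  bool Accept(Visitor* visitor) const override { return visitor->Visit(*this); }
};

struct TextArray : ArrayBase {
  std::vector<std::string> values;
  bool Accept(Visitor* visitor) const override { return visitor->Visit(*this); }
};

// Example visitor: accumulates the payload size in bytes of visited arrays.
struct ByteCounter : Visitor {
  std::size_t bytes = 0;

  bool Visit(const IntArray& array) override {
    bytes += array.values.size() * sizeof(int64_t);
    return true;
  }
  bool Visit(const TextArray& array) override {
    for (const auto& s : array.values) bytes += s.size();
    return true;
  }
};
```

Calling `Accept(&counter)` through a reference to the base type selects the overload that matches the array's concrete type, so the caller needs no explicit type switch.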

Public Functions

virtual ~ArrayVisitor() = default#
virtual Status Visit(const NullArray &array)#
virtual Status Visit(const BooleanArray &array)#
virtual Status Visit(const Int8Array &array)#
virtual Status Visit(const Int16Array &array)#
virtual Status Visit(const Int32Array &array)#
virtual Status Visit(const Int64Array &array)#
virtual Status Visit(const UInt8Array &array)#
virtual Status Visit(const UInt16Array &array)#
virtual Status Visit(const UInt32Array &array)#
virtual Status Visit(const UInt64Array &array)#
virtual Status Visit(const HalfFloatArray &array)#
virtual Status Visit(const FloatArray &array)#
virtual Status Visit(const DoubleArray &array)#
virtual Status Visit(const StringArray &array)#
virtual Status Visit(const BinaryArray &array)#
virtual Status Visit(const LargeStringArray &array)#
virtual Status Visit(const LargeBinaryArray &array)#
virtual Status Visit(const FixedSizeBinaryArray &array)#
virtual Status Visit(const Date32Array &array)#
virtual Status Visit(const Date64Array &array)#
virtual Status Visit(const Time32Array &array)#
virtual Status Visit(const Time64Array &array)#
virtual Status Visit(const TimestampArray &array)#
virtual Status Visit(const DayTimeIntervalArray &array)#
virtual Status Visit(const MonthDayNanoIntervalArray &array)#
virtual Status Visit(const MonthIntervalArray &array)#
virtual Status Visit(const DurationArray &array)#
virtual Status Visit(const Decimal128Array &array)#
virtual Status Visit(const Decimal256Array &array)#
virtual Status Visit(const ListArray &array)#
virtual Status Visit(const LargeListArray &array)#
virtual Status Visit(const MapArray &array)#
virtual Status Visit(const FixedSizeListArray &array)#
virtual Status Visit(const StructArray &array)#
virtual Status Visit(const SparseUnionArray &array)#
virtual Status Visit(const DenseUnionArray &array)#
virtual Status Visit(const DictionaryArray &array)#
virtual Status Visit(const RunEndEncodedArray &array)#
virtual Status Visit(const ExtensionArray &array)#