Data Types

enum arrow::Type::type

Main data type enumeration.

This enumeration provides a quick way to interrogate the category of a DataType instance.

Values:

enumerator NA

A NULL type having no physical storage.

enumerator BOOL

Boolean as 1 bit, LSB bit-packed ordering.
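As a rough illustration of what "LSB bit-packed ordering" means, the sketch below reads and writes individual bits in a byte buffer. It is a standalone model, not Arrow's API: the helper names GetBit and SetBitTo and the use of std::vector<uint8_t> as the buffer are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not Arrow API). In LSB ordering, value i lives in
// byte i / 8, at bit position i % 8 counted from the least significant bit.
inline bool GetBit(const std::vector<uint8_t>& bits, std::size_t i) {
  assert(i / 8 < bits.size());
  return ((bits[i / 8] >> (i % 8)) & 1u) != 0;
}

inline void SetBitTo(std::vector<uint8_t>& bits, std::size_t i, bool value) {
  assert(i / 8 < bits.size());
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] | mask);
  } else {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] & ~mask);
  }
}
```

With this layout, setting value 9 touches only the second byte of the buffer and sets its second-lowest bit.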

enumerator UINT8

Unsigned 8-bit little-endian integer.

enumerator INT8

Signed 8-bit little-endian integer.

enumerator UINT16

Unsigned 16-bit little-endian integer.

enumerator INT16

Signed 16-bit little-endian integer.

enumerator UINT32

Unsigned 32-bit little-endian integer.

enumerator INT32

Signed 32-bit little-endian integer.

enumerator UINT64

Unsigned 64-bit little-endian integer.

enumerator INT64

Signed 64-bit little-endian integer.

enumerator HALF_FLOAT

2-byte floating point value

enumerator FLOAT

4-byte floating point value

enumerator DOUBLE

8-byte floating point value

enumerator STRING

UTF8 variable-length string as List<Char>

enumerator BINARY

Variable-length bytes (no guarantee of UTF8-ness)
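One way to picture the variable-length STRING and BINARY types is as a single byte buffer plus a list of offsets marking where each value starts and ends. The sketch below is a simplified standalone model, not Arrow's API: the StringColumn struct and its members are hypothetical, and validity (null) information is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model (not Arrow API) of a variable-length column:
// value i occupies bytes [offsets[i], offsets[i + 1]) of `data`.
struct StringColumn {
  std::vector<int32_t> offsets{0};  // always length() + 1 entries
  std::string data;                 // all values concatenated

  void Append(const std::string& value) {
    data += value;
    offsets.push_back(static_cast<int32_t>(data.size()));
  }

  std::size_t length() const { return offsets.size() - 1; }

  std::string Get(std::size_t i) const {
    assert(i < length());
    const std::size_t begin = static_cast<std::size_t>(offsets[i]);
    const std::size_t end = static_cast<std::size_t>(offsets[i + 1]);
    return data.substr(begin, end - begin);
  }
};
```

The 32-bit offsets used here limit the total data size; the LARGE_STRING and LARGE_BINARY variants described further down use 64-bit offsets instead.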

enumerator FIXED_SIZE_BINARY

Fixed-size binary. Each value occupies the same number of bytes.
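Because every value has the same width, a value's position can be computed from its index alone, with no offsets needed. A minimal standalone sketch follows. FixedSizeValue is a hypothetical helper, not Arrow's API, and std::string stands in for the data buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Arrow API): slice value i out of a buffer that
// stores fixed-width values back to back.
inline std::string FixedSizeValue(const std::string& data,
                                  std::size_t byte_width, std::size_t i) {
  assert(byte_width > 0);
  assert((i + 1) * byte_width <= data.size());
  return data.substr(i * byte_width, byte_width);
}
```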

enumerator DATE32

int32_t days since the UNIX epoch

enumerator DATE64

int64_t milliseconds since the UNIX epoch
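The two date encodings differ only in unit, so converting between them is a multiplication or division by the number of milliseconds in a day. The helpers below are hypothetical, not Arrow's API, and the sketch assumes the DATE64 value falls exactly on a day boundary.

```cpp
#include <cassert>
#include <cstdint>

// 24 h * 60 min * 60 s * 1000 ms
constexpr int64_t kMillisPerDay = 86400000;

// Hypothetical helpers (not Arrow API).
inline int64_t Date32ToDate64(int32_t days) {
  return static_cast<int64_t>(days) * kMillisPerDay;
}

inline int32_t Date64ToDate32(int64_t millis) {
  // Assumption: the input is a whole number of days.
  assert(millis % kMillisPerDay == 0);
  return static_cast<int32_t>(millis / kMillisPerDay);
}
```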

enumerator TIMESTAMP

Exact timestamp encoded as an int64 since the UNIX epoch. The default unit is millisecond.

enumerator TIME32

Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.

enumerator TIME64

Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.

enumerator INTERVAL_MONTHS

YEAR_MONTH interval in SQL style.

enumerator INTERVAL_DAY_TIME

DAY_TIME interval in SQL style.

enumerator DECIMAL

Precision- and scale-based decimal type.

Storage type depends on the parameters.
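To make the role of scale concrete: a decimal value can be thought of as an integer (the unscaled value) whose last `scale` digits sit after the decimal point. The sketch below renders such a value as text. It is a standalone illustration, not Arrow's API: FormatDecimal is a hypothetical helper, and it uses int64_t for the unscaled value rather than the 128-bit storage of Decimal128Type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Arrow API): render an unscaled integer with
// `scale` digits after the decimal point.
inline std::string FormatDecimal(int64_t unscaled, int32_t scale) {
  assert(scale >= 0);
  const std::size_t s = static_cast<std::size_t>(scale);
  const bool negative = unscaled < 0;
  // Work on the magnitude as unsigned so the most negative value is safe.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  if (s > 0) {
    // Left-pad with zeros so at least one digit precedes the point.
    if (digits.size() <= s) {
      digits = std::string(s - digits.size() + 1, '0') + digits;
    }
    digits.insert(digits.size() - s, 1, '.');
  }
  return negative ? "-" + digits : digits;
}
```

For instance, an unscaled value of 12345 with scale 2 is rendered as 123.45.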

enumerator LIST

A list of some logical data type.

enumerator STRUCT

Struct of logical types.

enumerator SPARSE_UNION

Sparse unions of logical types.

enumerator DENSE_UNION

Dense unions of logical types.

enumerator DICTIONARY

Dictionary-encoded type, also called “categorical” or “factor” in other programming languages.

Holds the dictionary value type but not the dictionary itself, which is part of the ArrayData struct
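The idea behind dictionary encoding can be sketched without any Arrow headers: each distinct value is stored once, and the column itself holds small integer indices into that list. The DictionaryEncoded struct and Encode function below are hypothetical names for this illustration. The first-seen ordering of the dictionary is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model (not Arrow API) of a dictionary-encoded string column.
struct DictionaryEncoded {
  std::vector<int32_t> indices;         // one signed index per value
  std::vector<std::string> dictionary;  // distinct values, first-seen order

  const std::string& Get(std::size_t i) const {
    assert(i < indices.size());
    return dictionary[static_cast<std::size_t>(indices[i])];
  }
};

inline DictionaryEncoded Encode(const std::vector<std::string>& values) {
  DictionaryEncoded out;
  std::unordered_map<std::string, int32_t> seen;  // value -> dictionary slot
  for (const std::string& v : values) {
    auto it = seen.find(v);
    if (it == seen.end()) {
      const int32_t slot = static_cast<int32_t>(out.dictionary.size());
      it = seen.emplace(v, slot).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```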

enumerator MAP

Map, a repeated struct logical type.

enumerator EXTENSION

Custom data type, implemented by user.

enumerator FIXED_SIZE_LIST

Fixed size list of some logical type.

enumerator DURATION

Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds.

enumerator LARGE_STRING

Like STRING, but with 64-bit offsets.

enumerator LARGE_BINARY

Like BINARY, but with 64-bit offsets.

enumerator LARGE_LIST

Like LIST, but with 64-bit offsets.

enumerator MAX_ID

class arrow::DataType : public arrow::detail::Fingerprintable

Base class for all data types.

Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).

Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.

Subclassed by arrow::BaseBinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType

Public Functions

bool Equals(const DataType &other, bool check_metadata = false) const

Return whether the types are equal.

Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.

bool Equals(const std::shared_ptr<DataType> &other) const

Return whether the types are equal.

const std::shared_ptr<Field> &field(int i) const

Returns the child field at index i.

const std::vector<std::shared_ptr<Field>> &fields() const

Returns the children fields associated with this type.

int num_fields() const

Returns the number of children fields associated with this type.

std::string ToString() const = 0

A string representation of the type, including any children.

size_t Hash() const

Return hash value (excluding metadata in child fields)

std::string name() const = 0

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

DataTypeLayout layout() const = 0

Return the data type layout.

Children are not included.

Note

Experimental API

Type::type id() const

Return the type category.

Factory functions

These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.

std::shared_ptr<DataType> null()

Return a NullType instance.

std::shared_ptr<DataType> boolean()

Return a BooleanType instance.

std::shared_ptr<DataType> int8()

Return an Int8Type instance.

std::shared_ptr<DataType> int16()

Return an Int16Type instance.

std::shared_ptr<DataType> int32()

Return an Int32Type instance.

std::shared_ptr<DataType> int64()

Return an Int64Type instance.

std::shared_ptr<DataType> uint8()

Return a UInt8Type instance.

std::shared_ptr<DataType> uint16()

Return a UInt16Type instance.

std::shared_ptr<DataType> uint32()

Return a UInt32Type instance.

std::shared_ptr<DataType> uint64()

Return a UInt64Type instance.

std::shared_ptr<DataType> float16()

Return a HalfFloatType instance.

std::shared_ptr<DataType> float32()

Return a FloatType instance.

std::shared_ptr<DataType> float64()

Return a DoubleType instance.

std::shared_ptr<DataType> utf8()

Return a StringType instance.

std::shared_ptr<DataType> large_utf8()

Return a LargeStringType instance.

std::shared_ptr<DataType> binary()

Return a BinaryType instance.

std::shared_ptr<DataType> large_binary()

Return a LargeBinaryType instance.

std::shared_ptr<DataType> date32()

Return a Date32Type instance.

std::shared_ptr<DataType> date64()

Return a Date64Type instance.

std::shared_ptr<DataType> fixed_size_binary(int32_t byte_width)

Create a FixedSizeBinaryType instance.

std::shared_ptr<DataType> decimal(int32_t precision, int32_t scale)

Create a Decimal128Type instance.

std::shared_ptr<DataType> list(const std::shared_ptr<Field> &value_type)

Create a ListType instance from its child Field type.

std::shared_ptr<DataType> list(const std::shared_ptr<DataType> &value_type)

Create a ListType instance from its child DataType.

std::shared_ptr<DataType> large_list(const std::shared_ptr<Field> &value_type)

Create a LargeListType instance from its child Field type.

std::shared_ptr<DataType> large_list(const std::shared_ptr<DataType> &value_type)

Create a LargeListType instance from its child DataType.

std::shared_ptr<DataType> map(std::shared_ptr<DataType> key_type, std::shared_ptr<DataType> item_type, bool keys_sorted = false)

Create a MapType instance from its key and value DataTypes.

std::shared_ptr<DataType> map(std::shared_ptr<DataType> key_type, std::shared_ptr<Field> item_field, bool keys_sorted = false)

Create a MapType instance from its key DataType and value field.

The field override is provided to communicate nullability of the value.

std::shared_ptr<DataType> fixed_size_list(const std::shared_ptr<Field> &value_type, int32_t list_size)

Create a FixedSizeListType instance from its child Field type.

std::shared_ptr<DataType> fixed_size_list(const std::shared_ptr<DataType> &value_type, int32_t list_size)

Create a FixedSizeListType instance from its child DataType.

std::shared_ptr<DataType> duration(TimeUnit::type unit)

Return a DurationType instance (the naming uses _type to avoid a namespace conflict with built-in time classes).

std::shared_ptr<DataType> day_time_interval()

Return a DayTimeIntervalType instance.

std::shared_ptr<DataType> month_interval()

Return a MonthIntervalType instance.

std::shared_ptr<DataType> timestamp(TimeUnit::type unit)

Create a TimestampType instance from its unit.

std::shared_ptr<DataType> timestamp(TimeUnit::type unit, const std::string &timezone)

Create a TimestampType instance from its unit and timezone.

std::shared_ptr<DataType> time32(TimeUnit::type unit)

Create a 32-bit time type instance.

Unit can be either SECOND or MILLI

std::shared_ptr<DataType> time64(TimeUnit::type unit)

Create a 64-bit time type instance.

Unit can be either MICRO or NANO

std::shared_ptr<DataType> struct_(const std::vector<std::shared_ptr<Field>> &fields)

Create a StructType instance.

std::shared_ptr<DataType> sparse_union(FieldVector child_fields, std::vector<int8_t> type_codes = {})

Create a SparseUnionType instance.

std::shared_ptr<DataType> dense_union(FieldVector child_fields, std::vector<int8_t> type_codes = {})

Create a DenseUnionType instance.

std::shared_ptr<DataType> sparse_union(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})

Create a SparseUnionType instance.

std::shared_ptr<DataType> dense_union(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})

Create a DenseUnionType instance.

std::shared_ptr<DataType> union_(const std::vector<std::shared_ptr<Field>> &child_fields, const std::vector<int8_t> &type_codes, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> union_(const std::vector<std::shared_ptr<Field>> &child_fields, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> union_(const std::vector<std::shared_ptr<Array>> &children, const std::vector<std::string> &field_names, const std::vector<int8_t> &type_codes, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> union_(const std::vector<std::shared_ptr<Array>> &children, const std::vector<std::string> &field_names, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> union_(const std::vector<std::shared_ptr<Array>> &children, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> dictionary(const std::shared_ptr<DataType> &index_type, const std::shared_ptr<DataType> &dict_type, bool ordered = false)

Create a DictionaryType instance.

Parameters
  • [in] index_type: the type of the dictionary indices (must be a signed integer)

  • [in] dict_type: the type of the values in the variable dictionary

  • [in] ordered: true if the order of the dictionary values has semantic meaning and should be preserved where possible

Concrete type subclasses

Primitive

class arrow::NullType : public arrow::DataType

Concrete type class for always-null data.

Public Functions

std::string ToString() const override

A string representation of the type, including any children.

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

class arrow::BooleanType : public arrow::detail::CTypeImpl<BooleanType, PrimitiveCType, Type::BOOL, bool>

Concrete type class for boolean data.

Public Functions

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

class Int8Type : public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>

Concrete type class for signed 8-bit integer data.

class Int16Type : public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>

Concrete type class for signed 16-bit integer data.

class Int32Type : public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>

Concrete type class for signed 32-bit integer data.

class Int64Type : public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>

Concrete type class for signed 64-bit integer data.

class UInt8Type : public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>

Concrete type class for unsigned 8-bit integer data.

class UInt16Type : public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>

Concrete type class for unsigned 16-bit integer data.

class UInt32Type : public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>

Concrete type class for unsigned 32-bit integer data.

class UInt64Type : public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>

Concrete type class for unsigned 64-bit integer data.

class HalfFloatType : public arrow::detail::CTypeImpl<HalfFloatType, FloatingPointType, Type::HALF_FLOAT, uint16_t>

Concrete type class for 16-bit floating-point data.

class FloatType : public arrow::detail::CTypeImpl<FloatType, FloatingPointType, Type::FLOAT, float>

Concrete type class for 32-bit floating-point data (C “float”)

class DoubleType : public arrow::detail::CTypeImpl<DoubleType, FloatingPointType, Type::DOUBLE, double>

Concrete type class for 64-bit floating-point data (C “double”)

Binary-like

class arrow::BinaryType : public arrow::BaseBinaryType

Concrete type class for variable-size binary data.

Subclassed by arrow::StringType

Public Functions

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

class arrow::StringType : public arrow::BinaryType

Concrete type class for variable-size string data, utf8-encoded.

Public Functions

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

class arrow::FixedSizeBinaryType : public arrow::FixedWidthType, public arrow::ParametricType

Concrete type class for fixed-size binary data.

Subclassed by arrow::DecimalType

Public Functions

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

class arrow::Decimal128Type : public arrow::DecimalType

Concrete type class for 128-bit decimal data.

Public Functions

Decimal128Type(int32_t precision, int32_t scale)

Decimal128Type constructor that aborts on invalid input.

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

Public Static Functions

Result<std::shared_ptr<DataType>> Make(int32_t precision, int32_t scale)

Decimal128Type constructor that returns an error on invalid input.

Nested

class arrow::ListType : public arrow::BaseListType

Concrete type class for list data.

List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).

Subclassed by arrow::MapType

Public Functions

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

class arrow::MapType : public arrow::ListType

Concrete type class for map data.

Map data is nested data where each value is a variable number of key-item pairs. Maps can be recursively nested, for example map(utf8, map(utf8, int32)).

Public Functions

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

class arrow::StructType : public arrow::NestedType

Concrete type class for struct data.

Public Functions

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

std::shared_ptr<Field> GetFieldByName(const std::string &name) const

Returns null if name not found.

std::vector<std::shared_ptr<Field>> GetAllFieldsByName(const std::string &name) const

Return all fields having this name.

int GetFieldIndex(const std::string &name) const

Returns -1 if name not found or if there are multiple fields having the same name.

std::vector<int> GetAllFieldIndices(const std::string &name) const

Return the indices of all fields having this name in sorted order.
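The lookup semantics described above (a single index only when the name is unambiguous, otherwise -1) can be sketched over a plain list of names. This is a standalone model, not Arrow's implementation: AllFieldIndices and FieldIndex are hypothetical helpers, and field names are plain strings here rather than Field objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers (not Arrow API) over a plain list of field names.
inline std::vector<int> AllFieldIndices(const std::vector<std::string>& names,
                                        const std::string& name) {
  std::vector<int> out;
  for (std::size_t i = 0; i < names.size(); ++i) {
    // Indices are visited in ascending order, so the result is sorted.
    if (names[i] == name) out.push_back(static_cast<int>(i));
  }
  return out;
}

inline int FieldIndex(const std::vector<std::string>& names,
                      const std::string& name) {
  const std::vector<int> matches = AllFieldIndices(names, name);
  // -1 covers both "not found" and "more than one field with this name".
  return matches.size() == 1 ? matches[0] : -1;
}
```

A caller that needs to distinguish "missing" from "ambiguous" would use the all-indices variant and inspect the size of the result.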

class arrow::UnionType : public arrow::NestedType

Concrete type class for union data.

Subclassed by arrow::DenseUnionType, arrow::SparseUnionType

Public Functions

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

std::string ToString() const override

A string representation of the type, including any children.

const std::vector<int8_t> &type_codes() const

The array of logical type ids.

For example, the first type in the union might be denoted by the id 5 (instead of 0).

const std::vector<int> &child_ids() const

An array mapping logical type ids to physical child ids.
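The relationship between type_codes() and child_ids() can be pictured as a lookup table indexed by type code. The sketch below builds such a table. It is a standalone illustration, not Arrow's implementation: MakeChildIds is a hypothetical helper, and the table size of 128 and the -1 marker for unused codes are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow API): build a table that maps a logical
// type code to the physical index of the corresponding child.
inline std::vector<int> MakeChildIds(const std::vector<int8_t>& type_codes) {
  // One slot per non-negative int8_t value; -1 marks an unused code.
  std::vector<int> child_ids(128, -1);
  for (std::size_t child = 0; child < type_codes.size(); ++child) {
    assert(type_codes[child] >= 0);
    child_ids[static_cast<std::size_t>(type_codes[child])] =
        static_cast<int>(child);
  }
  return child_ids;
}
```

With type codes {5, 10}, the first child is reached through code 5 and the second through code 10, matching the example in the type_codes() description.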

Dictionary-encoded

class arrow::DictionaryType : public arrow::FixedWidthType

Dictionary-encoded value type with data-dependent dictionary.

Indices may be represented by any integer type.

Public Functions

std::string ToString() const override

A string representation of the type, including any children.

std::string name() const override

A string name of the type, omitting any child fields.

Note

Experimental API

Since

0.7.0

DataTypeLayout layout() const override

Return the data type layout.

Children are not included.

Note

Experimental API

Fields and Schemas

std::shared_ptr<Field> field(std::string name, std::shared_ptr<DataType> type, bool nullable = true, std::shared_ptr<const KeyValueMetadata> metadata = NULLPTR)

Create a Field instance.

Parameters
  • name: the field name

  • type: the field value type

  • nullable: whether the values are nullable, default true

  • metadata: any custom key-value metadata, default null

std::shared_ptr<Schema> schema(std::vector<std::shared_ptr<Field>> fields, std::shared_ptr<const KeyValueMetadata> metadata = NULLPTR)

Create a Schema instance.

Return

A shared_ptr to the new Schema.

Parameters
  • fields: the schema’s fields

  • metadata: any custom key-value metadata, default null

class arrow::Field : public arrow::detail::Fingerprintable

The combination of a field name and data type, with optional metadata.

Fields are used to describe the individual constituents of a nested DataType or a Schema.

A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.

Public Functions

std::shared_ptr<const KeyValueMetadata> metadata() const

Return the field’s attached metadata.

bool HasMetadata() const

Return whether the field has non-empty metadata.

std::shared_ptr<Field> WithMetadata(const std::shared_ptr<const KeyValueMetadata> &metadata) const

Return a copy of this field with the given metadata attached to it.

std::shared_ptr<Field> WithMergedMetadata(const std::shared_ptr<const KeyValueMetadata> &metadata) const

EXPERIMENTAL: Return a copy of this field with the given metadata merged with existing metadata (any colliding keys will be overridden by the passed metadata)

std::shared_ptr<Field> RemoveMetadata() const

Return a copy of this field without any metadata attached to it.

std::shared_ptr<Field> WithType(const std::shared_ptr<DataType> &type) const

Return a copy of this field with the replaced type.

std::shared_ptr<Field> WithName(const std::string &name) const

Return a copy of this field with the replaced name.

std::shared_ptr<Field> WithNullable(bool nullable) const

Return a copy of this field with the replaced nullability.

Result<std::shared_ptr<Field>> MergeWith(const Field &other, MergeOptions options = MergeOptions::Defaults()) const

Merge the current field with a field of the same name.

The two fields must be compatible, i.e. they must:

  • have the same name

  • have the same type, or compatible types according to options.

The metadata of the current field is preserved; the metadata of the other field is discarded.

bool Equals(const Field &other, bool check_metadata = false) const

Indicate whether the fields are equal.

Return

true if fields are equal, false otherwise.

Parameters
  • [in] other: field to check equality with.

  • [in] check_metadata: controls if it should check for metadata equality.

bool IsCompatibleWith(const Field &other) const

Indicate whether the fields are compatible.

See the criteria of MergeWith.

Return

true if fields are compatible, false otherwise.

std::string ToString(bool show_metadata = false) const

Return a string representation of the field.

Parameters
  • [in] show_metadata: when true, if KeyValueMetadata is non-empty, print keys and values in the output

const std::string &name() const

Return the field name.

const std::shared_ptr<DataType> &type() const

Return the field data type.

bool nullable() const

Return whether the field is nullable.

struct MergeOptions

Options that control the behavior of MergeWith.

Options are to be added to allow type conversions, including integer widening, promotion from integer to float, or conversion to or from boolean.

Public Members

bool promote_nullability = true

If true, a Field of NullType can be unified with a Field of another type.

The unified field will be of the other type and become nullable. Nullability will be promoted to the looser option (the result is nullable if either field is nullable).
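The merging rules with promote_nullability enabled can be sketched over a toy field type. This is a reading of the description above, not Arrow's implementation: SimpleField and MergePromotingNullability are hypothetical, types are plain strings with "null" standing in for NullType, and metadata is not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in (not Arrow API) for a field: name, type, nullability.
struct SimpleField {
  std::string name;
  std::string type;  // type name; "null" stands in for NullType
  bool nullable;
};

// Returns true and fills *out when the two fields can be merged under
// promote_nullability = true; returns false when they are incompatible.
inline bool MergePromotingNullability(const SimpleField& a,
                                      const SimpleField& b,
                                      SimpleField* out) {
  assert(out != nullptr);
  if (a.name != b.name) return false;
  if (a.type == b.type) {
    // Same type: take the looser nullability.
    *out = SimpleField{a.name, a.type, a.nullable || b.nullable};
    return true;
  }
  if (a.type == "null" || b.type == "null") {
    // A null-typed field unifies with the other type and becomes nullable.
    *out = SimpleField{a.name, a.type == "null" ? b.type : a.type, true};
    return true;
  }
  return false;
}
```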

class arrow::Schema : public arrow::detail::Fingerprintable, public arrow::util::EqualityComparable<Schema>, public arrow::util::ToStringOstreamable<Schema>

Sequence of arrow::Field objects describing the columns of a record batch or table data structure.

Public Functions

bool Equals(const Schema &other, bool check_metadata = false) const

Returns true if all of the schema fields are equal.

int num_fields() const

Return the number of fields (columns) in the schema.

const std::shared_ptr<Field> &field(int i) const

Return the ith schema element. Does not perform bounds checking.

std::shared_ptr<Field> GetFieldByName(const std::string &name) const

Returns null if name not found.

std::vector<std::shared_ptr<Field>> GetAllFieldsByName(const std::string &name) const

Return all fields having this name.

int GetFieldIndex(const std::string &name) const

Returns -1 if name not found.

std::vector<int> GetAllFieldIndices(const std::string &name) const

Return the indices of all fields having this name.

Status CanReferenceFieldsByNames(const std::vector<std::string> &names) const

Indicate if fields named names can be found unambiguously in the schema.

std::shared_ptr<const KeyValueMetadata> metadata() const

The custom key-value metadata, if any.

Return

metadata may be null

std::string ToString(bool show_metadata = false) const

Render a string representation of the schema suitable for debugging.

Parameters
  • [in] show_metadata: when true, if KeyValueMetadata is non-empty, print keys and values in the output

std::shared_ptr<Schema> WithMetadata(const std::shared_ptr<const KeyValueMetadata> &metadata) const

Replace key-value metadata with new metadata.

Return

new Schema

Parameters
  • [in] metadata: new KeyValueMetadata

std::shared_ptr<Schema> RemoveMetadata() const

Return copy of Schema without the KeyValueMetadata.

bool HasMetadata() const

Indicate whether the Schema has non-empty KeyValueMetadata.

bool HasDistinctFieldNames() const

Indicate whether the Schema has distinct field names.