Data Types¶
-
enum
arrow::Type
::
type
¶ Main data type enumeration.
This enumeration provides a quick way to interrogate the category of a DataType instance.
Values:
-
enumerator
NA
¶ A NULL type having no physical storage.
-
enumerator
BOOL
¶ Boolean as 1 bit, LSB bit-packed ordering.
-
enumerator
UINT8
¶ Unsigned 8-bit little-endian integer.
-
enumerator
INT8
¶ Signed 8-bit little-endian integer.
-
enumerator
UINT16
¶ Unsigned 16-bit little-endian integer.
-
enumerator
INT16
¶ Signed 16-bit little-endian integer.
-
enumerator
UINT32
¶ Unsigned 32-bit little-endian integer.
-
enumerator
INT32
¶ Signed 32-bit little-endian integer.
-
enumerator
UINT64
¶ Unsigned 64-bit little-endian integer.
-
enumerator
INT64
¶ Signed 64-bit little-endian integer.
-
enumerator
HALF_FLOAT
¶ 2-byte floating point value
-
enumerator
FLOAT
¶ 4-byte floating point value
-
enumerator
DOUBLE
¶ 8-byte floating point value
-
enumerator
STRING
¶ UTF8 variable-length string as List<Char>
-
enumerator
BINARY
¶ Variable-length bytes (no guarantee of UTF8-ness)
-
enumerator
FIXED_SIZE_BINARY
¶ Fixed-size binary. Each value occupies the same number of bytes.
-
enumerator
DATE32
¶ int32_t days since the UNIX epoch
-
enumerator
DATE64
¶ int64_t milliseconds since the UNIX epoch
-
enumerator
TIMESTAMP
¶ Exact timestamp encoded with int64 since UNIX epoch. Default unit millisecond.
-
enumerator
TIME32
¶ Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.
-
enumerator
TIME64
¶ Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.
-
enumerator
INTERVAL_MONTHS
¶ YEAR_MONTH interval in SQL style.
-
enumerator
INTERVAL_DAY_TIME
¶ DAY_TIME interval in SQL style.
-
enumerator
DECIMAL
¶ Precision- and scale-based decimal type.
Storage type depends on the parameters.
-
enumerator
LIST
¶ A list of some logical data type.
-
enumerator
STRUCT
¶ Struct of logical types.
-
enumerator
SPARSE_UNION
¶ Sparse unions of logical types.
-
enumerator
DENSE_UNION
¶ Dense unions of logical types.
-
enumerator
DICTIONARY
¶ Dictionary-encoded type, also called “categorical” or “factor” in other programming languages.
Holds the dictionary value type but not the dictionary itself, which is part of the ArrayData struct.
-
enumerator
MAP
¶ Map, a repeated struct logical type.
-
enumerator
EXTENSION
¶ Custom data type, implemented by user.
-
enumerator
FIXED_SIZE_LIST
¶ Fixed size list of some logical type.
-
enumerator
DURATION
¶ Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds.
-
enumerator
LARGE_STRING
¶ Like STRING, but with 64-bit offsets.
-
enumerator
LARGE_BINARY
¶ Like BINARY, but with 64-bit offsets.
-
enumerator
LARGE_LIST
¶ Like LIST, but with 64-bit offsets.
-
enumerator
MAX_ID
¶
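The BOOL entry above describes values stored as single bits in LSB bit-packed order. A minimal sketch of that ordering follows, assuming value i lives in byte i / 8 at bit position i % 8. The helper names (GetPackedBit, SetPackedBit, PackBools) are hypothetical and are not part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not Arrow API): read and write bit i of an
// LSB-ordered bitmap, where value i lives in byte i / 8 at bit i % 8.
inline bool GetPackedBit(const std::vector<uint8_t>& bits, std::size_t i) {
  assert(i / 8 < bits.size());
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

inline void SetPackedBit(std::vector<uint8_t>& bits, std::size_t i, bool value) {
  assert(i / 8 < bits.size());
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] | mask);
  } else {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] & ~mask);
  }
}

// Pack a sequence of bools into ceil(n / 8) bytes, least significant bit first.
inline std::vector<uint8_t> PackBools(const std::vector<bool>& values) {
  std::vector<uint8_t> bits((values.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    SetPackedBit(bits, i, values[i]);
  }
  return bits;
}
```

With this layout, nine values need two bytes, and the ninth value occupies the lowest bit of the second byte.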
-
class
arrow
::
DataType
: public arrow::detail::Fingerprintable¶ Base class for all data types.
Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).
Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.
Subclassed by arrow::BaseBinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType
Public Functions
-
bool
Equals
(const DataType &other, bool check_metadata = false) const¶ Return whether the types are equal.
Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.
-
const std::vector<std::shared_ptr<Field>> &
fields
() const¶ Returns the children fields associated with this type.
-
int
num_fields
() const¶ Returns the number of children fields associated with this type.
-
std::string
ToString
() const = 0¶ A string representation of the type, including any children.
-
size_t
Hash
() const¶ Return hash value (excluding metadata in child fields)
-
std::string
name
() const = 0¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
DataTypeLayout
layout
() const = 0¶ Return the data type layout.
Children are not included.
- Note
Experimental API
Factory functions¶
These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.
-
std::shared_ptr<DataType>
boolean
()¶ Return a BooleanType instance.
-
std::shared_ptr<DataType>
uint16
()¶ Return a UInt16Type instance.
-
std::shared_ptr<DataType>
uint32
()¶ Return a UInt32Type instance.
-
std::shared_ptr<DataType>
uint64
()¶ Return a UInt64Type instance.
-
std::shared_ptr<DataType>
float16
()¶ Return a HalfFloatType instance.
-
std::shared_ptr<DataType>
float64
()¶ Return a DoubleType instance.
-
std::shared_ptr<DataType>
utf8
()¶ Return a StringType instance.
-
std::shared_ptr<DataType>
large_utf8
()¶ Return a LargeStringType instance.
-
std::shared_ptr<DataType>
binary
()¶ Return a BinaryType instance.
-
std::shared_ptr<DataType>
large_binary
()¶ Return a LargeBinaryType instance.
-
std::shared_ptr<DataType>
date32
()¶ Return a Date32Type instance.
-
std::shared_ptr<DataType>
date64
()¶ Return a Date64Type instance.
-
std::shared_ptr<DataType>
fixed_size_binary
(int32_t byte_width)¶ Create a FixedSizeBinaryType instance.
-
std::shared_ptr<DataType>
decimal
(int32_t precision, int32_t scale)¶ Create a Decimal128Type instance.
Create a LargeListType instance from its child Field type.
Create a LargeListType instance from its child DataType.
Create a MapType instance from its key and value DataTypes.
Create a MapType instance from its key DataType and value field.
The field override is provided to communicate nullability of the value.
Create a FixedSizeListType instance from its child Field type.
Create a FixedSizeListType instance from its child DataType.
-
std::shared_ptr<DataType>
duration
(TimeUnit::type unit)¶ Return a Duration instance (the naming uses _type to avoid a namespace conflict with built-in time classes).
-
std::shared_ptr<DataType>
day_time_interval
()¶ Return a DayTimeIntervalType instance.
-
std::shared_ptr<DataType>
month_interval
()¶ Return a MonthIntervalType instance.
-
std::shared_ptr<DataType>
timestamp
(TimeUnit::type unit)¶ Create a TimestampType instance from its unit.
-
std::shared_ptr<DataType>
timestamp
(TimeUnit::type unit, const std::string &timezone)¶ Create a TimestampType instance from its unit and timezone.
-
std::shared_ptr<DataType>
time32
(TimeUnit::type unit)¶ Create a 32-bit time type instance.
Unit can be either SECOND or MILLI
-
std::shared_ptr<DataType>
time64
(TimeUnit::type unit)¶ Create a 64-bit time type instance.
Unit can be either MICRO or NANO
Create a StructType instance.
-
std::shared_ptr<DataType>
sparse_union
(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
-
std::shared_ptr<DataType>
dense_union
(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
-
std::shared_ptr<DataType>
sparse_union
(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
-
std::shared_ptr<DataType>
dense_union
(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
Create a UnionType instance.
Create a DictionaryType instance.
- Parameters
[in] index_type
: the type of the dictionary indices (must be a signed integer)
[in] dict_type
: the type of the values in the variable dictionary
[in] ordered
: true if the order of the dictionary values has semantic meaning and should be preserved where possible
Concrete type subclasses¶
Primitive¶
-
class
arrow
::
NullType
: public arrow::DataType¶ Concrete type class for always-null data.
Public Functions
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
class
arrow
::
BooleanType
: public arrow::detail::CTypeImpl<BooleanType, PrimitiveCType, Type::BOOL, bool>¶ Concrete type class for boolean data.
Public Functions
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
class
Int8Type
: public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>¶ Concrete type class for signed 8-bit integer data.
-
class
Int16Type
: public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>¶ Concrete type class for signed 16-bit integer data.
-
class
Int32Type
: public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>¶ Concrete type class for signed 32-bit integer data.
-
class
Int64Type
: public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>¶ Concrete type class for signed 64-bit integer data.
-
class
UInt8Type
: public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>¶ Concrete type class for unsigned 8-bit integer data.
-
class
UInt16Type
: public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>¶ Concrete type class for unsigned 16-bit integer data.
-
class
UInt32Type
: public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>¶ Concrete type class for unsigned 32-bit integer data.
-
class
UInt64Type
: public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>¶ Concrete type class for unsigned 64-bit integer data.
-
class
HalfFloatType
: public arrow::detail::CTypeImpl<HalfFloatType, FloatingPointType, Type::HALF_FLOAT, uint16_t>¶ Concrete type class for 16-bit floating-point data.
-
class
FloatType
: public arrow::detail::CTypeImpl<FloatType, FloatingPointType, Type::FLOAT, float>¶ Concrete type class for 32-bit floating-point data (C “float”)
-
class
DoubleType
: public arrow::detail::CTypeImpl<DoubleType, FloatingPointType, Type::DOUBLE, double>¶ Concrete type class for 64-bit floating-point data (C “double”)
Binary-like¶
-
class
arrow
::
BinaryType
: public arrow::BaseBinaryType¶ Concrete type class for variable-size binary data.
Subclassed by arrow::StringType
Public Functions
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
class
arrow
::
StringType
: public arrow::BinaryType¶ Concrete type class for variable-size string data, utf8-encoded.
-
class
arrow
::
FixedSizeBinaryType
: public arrow::FixedWidthType, public arrow::ParametricType¶ Concrete type class for fixed-size binary data.
Subclassed by arrow::DecimalType
Public Functions
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
class
arrow
::
Decimal128Type
: public arrow::DecimalType¶ Concrete type class for 128-bit decimal data.
Public Functions
-
Decimal128Type
(int32_t precision, int32_t scale)¶ Decimal128Type constructor that aborts on invalid input.
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
Public Static Functions
-
Result<std::shared_ptr<DataType>>
Make
(int32_t precision, int32_t scale)¶ Decimal128Type constructor that returns an error on invalid input.
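To illustrate what precision and scale mean for a decimal type, here is a minimal sketch that treats a decimal as an unscaled integer plus a count of fractional digits. It uses int64_t rather than a 128-bit value to stay within the standard library. The names IsValidPrecision and FormatDecimal are hypothetical, and the 1 to 38 precision range is stated here as an assumption about 128-bit decimals rather than taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed limit for a 128-bit decimal: at most 38 significant digits.
constexpr int32_t kMaxPrecision128 = 38;

// Hypothetical check in the spirit of "returns an error on invalid input".
inline bool IsValidPrecision(int32_t precision) {
  return precision >= 1 && precision <= kMaxPrecision128;
}

// Render an unscaled integer as a decimal string with `scale` fractional
// digits, i.e. the value unscaled * 10^(-scale).
inline std::string FormatDecimal(int64_t unscaled, int32_t scale) {
  assert(scale >= 0);
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN is handled safely.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const std::size_t frac = static_cast<std::size_t>(scale);
  if (frac > 0) {
    // Left-pad with zeros so at least one digit precedes the decimal point.
    if (digits.size() <= frac) {
      digits = std::string(frac - digits.size() + 1, '0') + digits;
    }
    digits.insert(digits.size() - frac, ".");
  }
  return negative ? "-" + digits : digits;
}
```

Under these assumptions, an unscaled value of 12345 with scale 2 reads as 123.45.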
Nested¶
-
class
arrow
::
ListType
: public arrow::BaseListType¶ Concrete type class for list data.
List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).
Subclassed by arrow::MapType
Public Functions
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
class
arrow
::
MapType
: public arrow::ListType¶ Concrete type class for map data.
Map data is nested data where each value is a variable number of key-item pairs. Maps can be recursively nested, for example map(utf8, map(utf8, int32)).
-
class
arrow
::
StructType
: public arrow::NestedType¶ Concrete type class for struct data.
Public Functions
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
std::vector<std::shared_ptr<Field>>
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found or if there are multiple fields having the same name.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name in sorted order.
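The lookup rules described for GetFieldIndex and GetAllFieldIndices can be sketched with plain standard-library containers. This is an illustration of the documented semantics only: FieldNames, AllFieldIndices, and FieldIndex are hypothetical names, and a struct type is reduced here to an ordered list of field names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a struct type's ordered list of field names.
using FieldNames = std::vector<std::string>;

// Indices of every field called `name`, in ascending order.
inline std::vector<int> AllFieldIndices(const FieldNames& fields,
                                        const std::string& name) {
  std::vector<int> out;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i] == name) out.push_back(static_cast<int>(i));
  }
  return out;
}

// Unique index of `name`, or -1 when the name is missing or ambiguous.
inline int FieldIndex(const FieldNames& fields, const std::string& name) {
  const std::vector<int> matches = AllFieldIndices(fields, name);
  return matches.size() == 1 ? matches[0] : -1;
}
```

The single-index lookup deliberately refuses to pick between duplicates, so callers that expect repeated names should use the all-indices variant.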
-
class
arrow
::
UnionType
: public arrow::NestedType¶ Concrete type class for union data.
Subclassed by arrow::DenseUnionType, arrow::SparseUnionType
Public Functions
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
const std::vector<int8_t> &
type_codes
() const¶ The array of logical type ids.
For example, the first type in the union might be denoted by the id 5 (instead of 0).
-
const std::vector<int> &
child_ids
() const¶ An array mapping logical type ids to physical child ids.
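The relationship between type_codes and child_ids can be sketched as a small lookup table. This is a sketch under stated assumptions, not the library's implementation: it assumes type codes are non-negative int8_t values (so 128 table slots suffice) and uses -1 to mark codes that no child uses. BuildChildIds, kNumTypeCodes, and kNoChild are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for this sketch: type codes are non-negative int8_t values,
// so a 128-entry table covers every possible code.
constexpr int kNumTypeCodes = 128;
constexpr int kNoChild = -1;

// Build a table mapping each logical type code to the position of the
// child that carries it; unused codes map to kNoChild.
inline std::vector<int> BuildChildIds(const std::vector<int8_t>& type_codes) {
  std::vector<int> child_ids(kNumTypeCodes, kNoChild);
  for (std::size_t child = 0; child < type_codes.size(); ++child) {
    assert(type_codes[child] >= 0);
    child_ids[static_cast<std::size_t>(type_codes[child])] =
        static_cast<int>(child);
  }
  return child_ids;
}
```

With type codes {5, 7}, the first child is addressed by logical id 5 rather than 0, matching the example given for type_codes.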
Dictionary-encoded¶
-
class
arrow
::
DictionaryType
: public arrow::FixedWidthType¶ Dictionary-encoded value type with data-dependent dictionary.
Indices are represented by any integer types.
Public Functions
-
std::string
ToString
() const override¶ A string representation of the type, including any children.
-
std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Note
Experimental API
- Since
0.7.0
-
DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
- Note
Experimental API
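As an illustration of dictionary encoding in general (the "categorical" or "factor" idea), the sketch below splits a column of strings into integer indices plus a dictionary of distinct values. It uses only the standard library. DictEncoded, DictionaryEncode, and DictionaryDecode are hypothetical names, and ordering the dictionary by first appearance is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for the two halves of dictionary-encoded data.
struct DictEncoded {
  std::vector<int32_t> indices;         // one entry per input value
  std::vector<std::string> dictionary;  // distinct values, first-seen order
};

inline DictEncoded DictionaryEncode(const std::vector<std::string>& values) {
  DictEncoded out;
  std::unordered_map<std::string, int32_t> seen;
  for (const std::string& v : values) {
    auto it = seen.find(v);
    if (it == seen.end()) {
      // New value: its index is the current dictionary length.
      it = seen.emplace(v, static_cast<int32_t>(out.dictionary.size())).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}

// Decoding is a lookup of each index in the dictionary.
inline std::vector<std::string> DictionaryDecode(const DictEncoded& enc) {
  std::vector<std::string> out;
  out.reserve(enc.indices.size());
  for (int32_t idx : enc.indices) {
    out.push_back(enc.dictionary.at(static_cast<std::size_t>(idx)));
  }
  return out;
}
```

The split mirrors the note in the enumeration: the type describes the index and value types, while the dictionary itself travels with the data.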
Fields and Schemas¶
Create a Field instance.
- Parameters
name
: the field name
type
: the field value type
nullable
: whether the values are nullable, default true
metadata
: any custom key-value metadata, default null
Create a Schema instance.
- Return
schema shared_ptr to Schema
- Parameters
fields
: the schema’s fields
metadata
: any custom key-value metadata, default null
-
class
arrow
::
Field
: public arrow::detail::Fingerprintable¶ The combination of a field name and data type, with optional metadata.
Fields are used to describe the individual constituents of a nested DataType or a Schema.
A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.
Public Functions
-
std::shared_ptr<const KeyValueMetadata>
metadata
() const¶ Return the field’s attached metadata.
-
bool
HasMetadata
() const¶ Return whether the field has non-empty metadata.
Return a copy of this field with the given metadata attached to it.
EXPERIMENTAL: Return a copy of this field with the given metadata merged with existing metadata (any colliding keys will be overridden by the passed metadata)
-
std::shared_ptr<Field>
RemoveMetadata
() const¶ Return a copy of this field without any metadata attached to it.
Return a copy of this field with the replaced type.
-
std::shared_ptr<Field>
WithName
(const std::string &name) const¶ Return a copy of this field with the replaced name.
-
std::shared_ptr<Field>
WithNullable
(bool nullable) const¶ Return a copy of this field with the replaced nullability.
-
Result<std::shared_ptr<Field>>
MergeWith
(const Field &other, MergeOptions options = MergeOptions::Defaults()) const¶ Merge the current field with a field of the same name.
The two fields must be compatible, i.e. they must:
have the same name
have the same type, or compatible types according to options.
The metadata of the current field is preserved; the metadata of the other field is discarded.
-
bool
Equals
(const Field &other, bool check_metadata = false) const¶ Indicate whether the fields are equal.
- Return
true if fields are equal, false otherwise.
- Parameters
[in] other
: field to check equality with.
[in] check_metadata
: controls whether metadata equality is also checked.
-
bool
IsCompatibleWith
(const Field &other) const¶ Indicate whether the fields are compatible.
See the criteria of MergeWith.
- Return
true if fields are compatible, false otherwise.
-
std::string
ToString
(bool show_metadata = false) const¶ Return a string representation of the field.
- Parameters
[in] show_metadata
: when true, if KeyValueMetadata is non-empty, print keys and values in the output
-
const std::string &
name
() const¶ Return the field name.
-
bool
nullable
() const¶ Return whether the field is nullable.
-
struct
MergeOptions
¶ Options that control the behavior of MergeWith.
Options are to be added to allow type conversions, including integer widening, promotion from integer to float, or conversion to or from boolean.
-
class
arrow
::
Schema
: public arrow::detail::Fingerprintable, public arrow::util::EqualityComparable<Schema>, public arrow::util::ToStringOstreamable<Schema>¶ Sequence of arrow::Field objects describing the columns of a record batch or table data structure.
Public Functions
-
bool
Equals
(const Schema &other, bool check_metadata = false) const¶ Returns true if all of the schema fields are equal.
-
int
num_fields
() const¶ Return the number of fields (columns) in the schema.
-
const std::shared_ptr<Field> &
field
(int i) const¶ Return the ith schema element. Does not bounds-check.
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
std::vector<std::shared_ptr<Field>>
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name.
-
Status
CanReferenceFieldsByNames
(const std::vector<std::string> &names) const¶ Indicate whether the fields named names can be found unambiguously in the schema.
-
std::shared_ptr<const KeyValueMetadata>
metadata
() const¶ The custom key-value metadata, if any.
- Return
metadata may be null
-
std::string
ToString
(bool show_metadata = false) const¶ Render a string representation of the schema suitable for debugging.
- Parameters
[in] show_metadata
: when true, if KeyValueMetadata is non-empty, print keys and values in the output
Replace key-value metadata with new metadata.
- Return
new Schema
- Parameters
[in] metadata
: new KeyValueMetadata