Data Types¶
-
enum
arrow::Type
::
type
¶ Main data type enumeration.
This enumeration provides a quick way to interrogate the category of a DataType instance.
Values:
-
enumerator
NA
¶ A NULL type having no physical storage.
-
enumerator
BOOL
¶ Boolean as 1 bit, LSB bit-packed ordering.
-
enumerator
UINT8
¶ Unsigned 8-bit little-endian integer.
-
enumerator
INT8
¶ Signed 8-bit little-endian integer.
-
enumerator
UINT16
¶ Unsigned 16-bit little-endian integer.
-
enumerator
INT16
¶ Signed 16-bit little-endian integer.
-
enumerator
UINT32
¶ Unsigned 32-bit little-endian integer.
-
enumerator
INT32
¶ Signed 32-bit little-endian integer.
-
enumerator
UINT64
¶ Unsigned 64-bit little-endian integer.
-
enumerator
INT64
¶ Signed 64-bit little-endian integer.
-
enumerator
HALF_FLOAT
¶ 2-byte floating point value
-
enumerator
FLOAT
¶ 4-byte floating point value
-
enumerator
DOUBLE
¶ 8-byte floating point value
-
enumerator
STRING
¶ UTF8 variable-length string as List<Char>
-
enumerator
BINARY
¶ Variable-length bytes (no guarantee of UTF8-ness)
-
enumerator
FIXED_SIZE_BINARY
¶ Fixed-size binary. Each value occupies the same number of bytes.
-
enumerator
DATE32
¶ int32_t days since the UNIX epoch
-
enumerator
DATE64
¶ int64_t milliseconds since the UNIX epoch
-
enumerator
TIMESTAMP
¶ Exact timestamp encoded with int64 since UNIX epoch. Default unit millisecond.
-
enumerator
TIME32
¶ Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.
-
enumerator
TIME64
¶ Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.
-
enumerator
INTERVAL_MONTHS
¶ YEAR_MONTH interval in SQL style.
-
enumerator
INTERVAL_DAY_TIME
¶ DAY_TIME interval in SQL style.
-
enumerator
DECIMAL128
¶ Precision- and scale-based decimal type with 128 bits.
-
enumerator
DECIMAL
¶ Defined for backward-compatibility.
-
enumerator
DECIMAL256
¶ Precision- and scale-based decimal type with 256 bits.
-
enumerator
LIST
¶ A list of some logical data type.
-
enumerator
STRUCT
¶ Struct of logical types.
-
enumerator
SPARSE_UNION
¶ Sparse unions of logical types.
-
enumerator
DENSE_UNION
¶ Dense unions of logical types.
-
enumerator
DICTIONARY
¶ Dictionary-encoded type, also called “categorical” or “factor” in other programming languages.
Holds the dictionary value type but not the dictionary itself, which is part of the ArrayData struct
-
enumerator
MAP
¶ Map, a repeated struct logical type.
-
enumerator
EXTENSION
¶ Custom data type, implemented by user.
-
enumerator
FIXED_SIZE_LIST
¶ Fixed size list of some logical type.
-
enumerator
DURATION
¶ Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds.
-
enumerator
LARGE_STRING
¶ Like STRING, but with 64-bit offsets.
-
enumerator
LARGE_BINARY
¶ Like BINARY, but with 64-bit offsets.
-
enumerator
LARGE_LIST
¶ Like LIST, but with 64-bit offsets.
-
enumerator
INTERVAL_MONTH_DAY_NANO
¶ Calendar interval type with three fields.
-
enumerator
MAX_ID
¶
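The BOOL entry above describes values stored one per bit in LSB-first order. As a minimal sketch of that convention in plain C++ (it does not use the Arrow library, and `get_bit` / `set_bit` are hypothetical helper names), logical value i lives in byte i / 8 at bit i % 8, counted from the least significant bit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read logical value i from an LSB-first bit-packed buffer.
inline bool get_bit(const std::vector<uint8_t>& bits, std::size_t i) {
  assert(i / 8 < bits.size());
  return (bits[i / 8] >> (i % 8)) & 1u;
}

// Set or clear logical value i in the same layout.
inline void set_bit(std::vector<uint8_t>& bits, std::size_t i, bool value) {
  assert(i / 8 < bits.size());
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[i / 8] |= mask;
  } else {
    bits[i / 8] &= static_cast<uint8_t>(~mask);
  }
}
```

With this layout, setting values 0 and 3 of an all-zero buffer leaves the first byte equal to 0x09.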
-
class
arrow
::
DataType
: public arrow::detail::Fingerprintable¶ Base class for all data types.
Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).
Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.
Subclassed by arrow::BaseBinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType
Public Functions
-
bool
Equals
(const DataType &other, bool check_metadata = false) const¶ Return whether the types are equal.
Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.
-
inline const std::vector<std::shared_ptr<Field>> &
fields
() const¶ Return the children fields associated with this type.
-
inline int
num_fields
() const¶ Return the number of children fields associated with this type.
-
virtual std::string
ToString
() const = 0¶ A string representation of the type, including any children.
-
size_t
Hash
() const¶ Return hash value (excluding metadata in child fields)
-
virtual std::string
name
() const = 0¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
virtual DataTypeLayout
layout
() const = 0¶ Return the data type layout.
Children are not included.
Note
Experimental API
Factory functions¶
These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.
-
std::shared_ptr<DataType>
boolean
()¶ Return a BooleanType instance.
-
std::shared_ptr<DataType>
uint16
()¶ Return a UInt16Type instance.
-
std::shared_ptr<DataType>
uint32
()¶ Return a UInt32Type instance.
-
std::shared_ptr<DataType>
uint64
()¶ Return a UInt64Type instance.
-
std::shared_ptr<DataType>
float16
()¶ Return a HalfFloatType instance.
-
std::shared_ptr<DataType>
float64
()¶ Return a DoubleType instance.
-
std::shared_ptr<DataType>
utf8
()¶ Return a StringType instance.
-
std::shared_ptr<DataType>
large_utf8
()¶ Return a LargeStringType instance.
-
std::shared_ptr<DataType>
binary
()¶ Return a BinaryType instance.
-
std::shared_ptr<DataType>
large_binary
()¶ Return a LargeBinaryType instance.
-
std::shared_ptr<DataType>
date32
()¶ Return a Date32Type instance.
-
std::shared_ptr<DataType>
date64
()¶ Return a Date64Type instance.
-
std::shared_ptr<DataType>
fixed_size_binary
(int32_t byte_width)¶ Create a FixedSizeBinaryType instance.
-
std::shared_ptr<DataType>
decimal
(int32_t precision, int32_t scale)¶ Create a DecimalType instance depending on the precision.
If the precision is greater than 38, a Decimal256Type is returned, otherwise a Decimal128Type.
-
std::shared_ptr<DataType>
decimal128
(int32_t precision, int32_t scale)¶ Create a Decimal128Type instance.
-
std::shared_ptr<DataType>
decimal256
(int32_t precision, int32_t scale)¶ Create a Decimal256Type instance.
Create a LargeListType instance from its child Field type.
Create a LargeListType instance from its child DataType.
Create a MapType instance from its key and value DataTypes.
Create a MapType instance from its key DataType and value field.
The field override is provided to communicate nullability of the value.
Create a FixedSizeListType instance from its child Field type.
Create a FixedSizeListType instance from its child DataType.
-
std::shared_ptr<DataType>
duration
(TimeUnit::type unit)¶ Return a DurationType instance (naming uses _type to avoid a namespace conflict with built-in time classes).
-
std::shared_ptr<DataType>
day_time_interval
()¶ Return a DayTimeIntervalType instance.
-
std::shared_ptr<DataType>
month_interval
()¶ Return a MonthIntervalType instance.
-
std::shared_ptr<DataType>
month_day_nano_interval
()¶ Return a MonthDayNanoIntervalType instance.
-
std::shared_ptr<DataType>
timestamp
(TimeUnit::type unit)¶ Create a TimestampType instance from its unit.
-
std::shared_ptr<DataType>
timestamp
(TimeUnit::type unit, const std::string &timezone)¶ Create a TimestampType instance from its unit and timezone.
-
std::shared_ptr<DataType>
time32
(TimeUnit::type unit)¶ Create a 32-bit time type instance.
Unit can be either SECOND or MILLI
-
std::shared_ptr<DataType>
time64
(TimeUnit::type unit)¶ Create a 64-bit time type instance.
Unit can be either MICRO or NANO
Create a StructType instance.
-
std::shared_ptr<DataType>
sparse_union
(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
-
std::shared_ptr<DataType>
sparse_union
(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
-
std::shared_ptr<DataType>
dense_union
(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
-
std::shared_ptr<DataType>
dense_union
(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
Create a DictionaryType instance.
- Parameters
[in] index_type – the type of the dictionary indices (must be a signed integer)
[in] dict_type – the type of the values in the variable dictionary
[in] ordered – true if the order of the dictionary values has semantic meaning and should be preserved where possible
Concrete type subclasses¶
Primitive¶
-
class
arrow
::
NullType
: public arrow::DataType¶ Concrete type class for always-null data.
Public Functions
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
BooleanType
: public arrow::detail::CTypeImpl<BooleanType, PrimitiveCType, Type::BOOL, bool>¶ Concrete type class for boolean data.
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
class
UInt8Type
: public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>¶ - #include <arrow/type.h>
Concrete type class for unsigned 8-bit integer data.
-
class
Int8Type
: public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>¶ - #include <arrow/type.h>
Concrete type class for signed 8-bit integer data.
-
class
UInt16Type
: public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>¶ - #include <arrow/type.h>
Concrete type class for unsigned 16-bit integer data.
-
class
Int16Type
: public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>¶ - #include <arrow/type.h>
Concrete type class for signed 16-bit integer data.
-
class
UInt32Type
: public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>¶ - #include <arrow/type.h>
Concrete type class for unsigned 32-bit integer data.
-
class
Int32Type
: public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>¶ - #include <arrow/type.h>
Concrete type class for signed 32-bit integer data.
-
class
UInt64Type
: public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>¶ - #include <arrow/type.h>
Concrete type class for unsigned 64-bit integer data.
-
class
Int64Type
: public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>¶ - #include <arrow/type.h>
Concrete type class for signed 64-bit integer data.
-
class
HalfFloatType
: public arrow::detail::CTypeImpl<HalfFloatType, FloatingPointType, Type::HALF_FLOAT, uint16_t>¶ - #include <arrow/type.h>
Concrete type class for 16-bit floating-point data.
-
class
FloatType
: public arrow::detail::CTypeImpl<FloatType, FloatingPointType, Type::FLOAT, float>¶ - #include <arrow/type.h>
Concrete type class for 32-bit floating-point data (C “float”)
-
class
DoubleType
: public arrow::detail::CTypeImpl<DoubleType, FloatingPointType, Type::DOUBLE, double>¶ - #include <arrow/type.h>
Concrete type class for 64-bit floating-point data (C “double”)
-
class
arrow
::
DecimalType
: public arrow::FixedSizeBinaryType¶ - #include <arrow/type.h>
Base type class for (fixed-size) decimal data.
Subclassed by arrow::Decimal128Type, arrow::Decimal256Type
-
class
arrow
::
Decimal128Type
: public arrow::DecimalType¶ - #include <arrow/type.h>
Concrete type class for 128-bit decimal data.
Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).
As an example, Decimal128Type(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.
Decimal128Type has a maximum precision of 38 significant digits (also available as Decimal128Type::kMaxPrecision). If higher precision is needed, consider using Decimal256Type.
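The precision and scale rules above can be illustrated with a small standalone sketch. It does not use the Arrow library: `encode_decimal` is a hypothetical helper, it stores the scaled integer in an `int64_t` rather than 128 bits (so it only accepts precisions up to 18), and it assumes a non-negative scale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Try to encode a plain decimal literal such as "-1234.567" as a scaled integer
// for a decimal type with the given precision and scale.
// Returns false when the value is not exactly representable.
inline bool encode_decimal(const std::string& text, int precision, int scale,
                           int64_t* out) {
  assert(out != nullptr);
  assert(precision >= 1 && precision <= 18 && scale >= 0);  // limits of this sketch
  std::size_t pos = 0;
  bool negative = false;
  if (!text.empty() && (text[0] == '-' || text[0] == '+')) {
    negative = (text[0] == '-');
    pos = 1;
  }
  int64_t unscaled = 0;
  int significant_digits = 0;  // digits of the scaled integer, ignoring leading zeros
  int fractional_digits = 0;
  bool seen_point = false;
  bool seen_digit = false;
  // Append one digit to the scaled integer, refusing to exceed the precision.
  auto push_digit = [&](int digit) {
    if (unscaled != 0 || digit != 0) {
      if (++significant_digits > precision) return false;
    }
    unscaled = unscaled * 10 + digit;
    return true;
  };
  for (; pos < text.size(); ++pos) {
    const char c = text[pos];
    if (c == '.' && !seen_point) {
      seen_point = true;
      continue;
    }
    if (c < '0' || c > '9') return false;  // not a plain decimal literal
    seen_digit = true;
    if (seen_point) {
      ++fractional_digits;
      if (fractional_digits > scale) {
        if (c != '0') return false;  // needs more fractional digits than the scale
        continue;                    // a zero beyond the scale changes nothing
      }
    }
    if (!push_digit(c - '0')) return false;
  }
  if (!seen_digit) return false;
  // Pad with zeros so that exactly `scale` fractional digits are encoded.
  for (; fractional_digits < scale; ++fractional_digits) {
    if (!push_digit(0)) return false;
  }
  *out = negative ? -unscaled : unscaled;
  return true;
}
```

With precision 7 and scale 3, this accepts "1234.567" (stored as 1234567) and rejects both "12345.67" (eight digits once padded to scale 3) and "123.4567" (four fractional digits).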
Public Functions
-
explicit
Decimal128Type
(int32_t precision, int32_t scale)¶ Decimal128Type constructor that aborts on invalid input.
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
Public Static Functions
-
static Result<std::shared_ptr<DataType>>
Make
(int32_t precision, int32_t scale)¶ Decimal128Type constructor that returns an error on invalid input.
-
class
arrow
::
Decimal256Type
: public arrow::DecimalType¶ - #include <arrow/type.h>
Concrete type class for 256-bit decimal data.
Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).
Decimal256Type has a maximum precision of 76 significant digits (also available as Decimal256Type::kMaxPrecision).
For most use cases, the maximum precision offered by Decimal128Type is sufficient, and it will result in a more compact and more efficient encoding.
Public Functions
-
explicit
Decimal256Type
(int32_t precision, int32_t scale)¶ Decimal256Type constructor that aborts on invalid input.
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
Public Static Functions
-
static Result<std::shared_ptr<DataType>>
Make
(int32_t precision, int32_t scale)¶ Decimal256Type constructor that returns an error on invalid input.
Temporal¶
-
enum
arrow::TimeUnit
::
type
¶ The unit for a time or timestamp DataType.
Values:
-
enumerator
SECOND
¶
-
enumerator
MILLI
¶
-
enumerator
MICRO
¶
-
enumerator
NANO
¶
-
std::ostream &
operator<<
(std::ostream &os, TimeUnit::type unit)¶
-
std::ostream &
operator<<
(std::ostream &os, DayTimeIntervalType::DayMilliseconds interval)¶
-
std::ostream &
operator<<
(std::ostream &os, MonthDayNanoIntervalType::MonthDayNanos interval)¶
-
class
arrow
::
TemporalType
: public arrow::FixedWidthType¶ - #include <arrow/type.h>
Base type for all date and time types.
Subclassed by arrow::DateType, arrow::DurationType, arrow::IntervalType, arrow::TimestampType, arrow::TimeType
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
class
DateType
: public arrow::TemporalType¶ - #include <arrow/type.h>
Base type class for date data.
Subclassed by arrow::Date32Type, arrow::Date64Type
-
class
arrow
::
Date32Type
: public arrow::DateType¶ - #include <arrow/type.h>
Concrete type class for 32-bit date data (as number of days since UNIX epoch)
-
class
arrow
::
Date64Type
: public arrow::DateType¶ - #include <arrow/type.h>
Concrete type class for 64-bit date data (as number of milliseconds since UNIX epoch)
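The two date encodings above differ only in unit: days versus milliseconds since the UNIX epoch. A minimal standalone sketch of converting between them follows; it is plain C++ rather than an Arrow API, and `date32_to_date64` / `date64_to_date32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kMillisPerDay = 86400000;  // 24 * 60 * 60 * 1000

// Days since the epoch -> milliseconds since the epoch.
inline int64_t date32_to_date64(int32_t days) {
  return static_cast<int64_t>(days) * kMillisPerDay;
}

// Milliseconds since the epoch -> days since the epoch.
// Uses floor division so that instants before the epoch map to the correct negative day.
inline int32_t date64_to_date32(int64_t millis) {
  int64_t days = millis / kMillisPerDay;
  if (millis % kMillisPerDay < 0) --days;
  assert(days >= INT32_MIN && days <= INT32_MAX);
  return static_cast<int32_t>(days);
}
```

For instance, day 10957 (2000-01-01) corresponds to 946684800000 milliseconds.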
-
class
TimeType
: public arrow::TemporalType, public arrow::ParametricType¶ - #include <arrow/type.h>
Base type class for time data.
Subclassed by arrow::Time32Type, arrow::Time64Type
-
class
arrow
::
Time32Type
: public arrow::TimeType¶ - #include <arrow/type.h>
Concrete type class for 32-bit time data (as number of seconds or milliseconds since midnight)
-
class
arrow
::
Time64Type
: public arrow::TimeType¶ - #include <arrow/type.h>
Concrete type class for 64-bit time data (as number of microseconds or nanoseconds since midnight)
-
class
arrow
::
TimestampType
: public arrow::TemporalType, public arrow::ParametricType¶ - #include <arrow/type.h>
Concrete type class for datetime data (as number of seconds, milliseconds, microseconds or nanoseconds since UNIX epoch)
If supplied, the timezone string should take either the form (i) “Area/Location”, with values drawn from the names in the IANA Time Zone Database (such as “Europe/Zurich”); or (ii) “(+|-)HH:MM” indicating an absolute offset from GMT (such as “-08:00”). To indicate a native UTC timestamp, one of the strings “UTC”, “Etc/UTC” or “+00:00” should be used.
If any non-empty string is supplied as the timezone for a TimestampType, then the Arrow field containing that timestamp type (and by extension the column associated with such a field) is considered “timezone-aware”. The integer arrays that comprise a timezone-aware column must contain UTC normalized datetime values, regardless of the contents of their timezone string. More precisely, (i) the producer of a timezone-aware column must populate its constituent arrays with valid UTC values (performing offset conversions from non-UTC values if necessary); and (ii) the consumer of a timezone-aware column may assume that the column’s values are directly comparable (that is, with no offset adjustment required) to the values of any other timezone-aware column or to any other valid UTC datetime value (provided all values are expressed in the same units).
If a TimestampType is constructed without a timezone (or, equivalently, if the timezone supplied is an empty string) then the resulting Arrow field (column) is considered “timezone-naive”. The producer of a timezone-naive column may populate its constituent integer arrays with datetime values from any timezone; the consumer of a timezone-naive column should make no assumptions about the interoperability or comparability of the values of such a column with those of any other timestamp column or datetime value.
If a timezone-aware field contains a recognized timezone, its values may be localized to that locale upon display; the values of timezone-naive fields must always be displayed “as is”, with no localization performed on them.
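Point (i) above asks the producer of a timezone-aware column to store UTC-normalized values. As a rough standalone illustration (no Arrow calls; `parse_offset_seconds` and `local_to_utc_seconds` are hypothetical helpers, and only the "(+|-)HH:MM" form is handled, not IANA names), a wall-clock reading can be normalized by subtracting its offset:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parse an absolute offset of the form "(+|-)HH:MM" into seconds east of UTC.
// Returns false if the string does not have exactly that shape.
inline bool parse_offset_seconds(const std::string& tz, int32_t* out) {
  assert(out != nullptr);
  if (tz.size() != 6 || (tz[0] != '+' && tz[0] != '-') || tz[3] != ':') return false;
  const auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
  if (!is_digit(tz[1]) || !is_digit(tz[2]) || !is_digit(tz[4]) || !is_digit(tz[5])) {
    return false;
  }
  const int hours = (tz[1] - '0') * 10 + (tz[2] - '0');
  const int minutes = (tz[4] - '0') * 10 + (tz[5] - '0');
  if (minutes > 59) return false;
  const int32_t magnitude = hours * 3600 + minutes * 60;
  *out = (tz[0] == '-') ? -magnitude : magnitude;
  return true;
}

// A wall-clock reading taken at `offset_seconds` east of UTC corresponds to
// the UTC instant `local_seconds - offset_seconds`.
inline int64_t local_to_utc_seconds(int64_t local_seconds, int32_t offset_seconds) {
  return local_seconds - offset_seconds;
}
```

Under these assumptions, midnight on 2000-01-01 observed at "-08:00" is stored as 946713600 seconds, eight hours after the same wall-clock reading at UTC.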
-
class
IntervalType
: public arrow::TemporalType, public arrow::ParametricType¶ - #include <arrow/type.h>
Subclassed by arrow::DayTimeIntervalType, arrow::MonthDayNanoIntervalType, arrow::MonthIntervalType
-
class
arrow
::
MonthIntervalType
: public arrow::IntervalType¶ - #include <arrow/type.h>
Represents a number of months.
Corresponds to the YearMonth type in Schema.fbs (years are defined as 12 months).
-
class
arrow
::
DayTimeIntervalType
: public arrow::IntervalType¶ - #include <arrow/type.h>
Represents a number of days and milliseconds (fraction of day).
Public Functions
-
inline virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
struct
DayMilliseconds
¶ - #include <arrow/type.h>
-
class
arrow
::
MonthDayNanoIntervalType
: public arrow::IntervalType¶ - #include <arrow/type.h>
Represents a number of months, days and nanoseconds between two dates.
All fields are independent from one another.
Public Functions
-
inline virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
struct
MonthDayNanos
¶ - #include <arrow/type.h>
-
class
arrow
::
DurationType
: public arrow::TemporalType, public arrow::ParametricType¶ - #include <arrow/type.h>
Represents an elapsed time without any relation to a calendar artifact.
Binary-like¶
-
class
arrow
::
BinaryType
: public arrow::BaseBinaryType¶ - #include <arrow/type.h>
Concrete type class for variable-size binary data.
Subclassed by arrow::StringType
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
LargeBinaryType
: public arrow::BaseBinaryType¶ - #include <arrow/type.h>
Concrete type class for large variable-size binary data.
Subclassed by arrow::LargeStringType
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
StringType
: public arrow::BinaryType¶ - #include <arrow/type.h>
Concrete type class for variable-size string data, utf8-encoded.
-
class
arrow
::
LargeStringType
: public arrow::LargeBinaryType¶ - #include <arrow/type.h>
Concrete type class for large variable-size string data, utf8-encoded.
-
class
arrow
::
FixedSizeBinaryType
: public arrow::FixedWidthType, public arrow::ParametricType¶ - #include <arrow/type.h>
Concrete type class for fixed-size binary data.
Subclassed by arrow::DecimalType
Public Functions
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
Nested¶
-
class
BaseListType
: public arrow::NestedType¶ - #include <arrow/type.h>
Base class for all variable-size list data types.
Subclassed by arrow::FixedSizeListType, arrow::LargeListType, arrow::ListType
-
class
arrow
::
ListType
: public arrow::BaseListType¶ - #include <arrow/type.h>
Concrete type class for list data.
List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).
Subclassed by arrow::MapType
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
LargeListType
: public arrow::BaseListType¶ - #include <arrow/type.h>
Concrete type class for large list data.
LargeListType is like ListType but with 64-bit rather than 32-bit offsets.
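To make the offset-width difference concrete, here is a toy model of a variable-size list column in plain C++. It is a sketch of the idea only, not Arrow's implementation: `ToyListColumn` is a hypothetical type, the child items are fixed to `int32_t`, and nulls are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// List i is the half-open range values[offsets[i], offsets[i + 1]).
// OffsetT = int32_t mirrors the 32-bit offsets described for ListType,
// OffsetT = int64_t the 64-bit offsets described for LargeListType.
template <typename OffsetT>
struct ToyListColumn {
  std::vector<OffsetT> offsets{0};  // always one more entry than there are lists
  std::vector<int32_t> values;      // flattened child items

  void append(const std::vector<int32_t>& items) {
    values.insert(values.end(), items.begin(), items.end());
    offsets.push_back(static_cast<OffsetT>(values.size()));
  }

  std::size_t length() const { return offsets.size() - 1; }

  std::vector<int32_t> get(std::size_t i) const {
    assert(i < length());
    return std::vector<int32_t>(values.begin() + offsets[i],
                                values.begin() + offsets[i + 1]);
  }
};
```

Appending {1, 2, 3}, {} and {4} yields the offsets 0, 3, 3, 4. The only difference between the two instantiations is how large the flattened child may grow before the offsets overflow.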
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
MapType
: public arrow::ListType¶ - #include <arrow/type.h>
Concrete type class for map data.
Map data is nested data where each value is a variable number of key-item pairs. Its physical representation is the same as a list of {key, item} structs.
Maps can be recursively nested, for example map(utf8, map(utf8, int32)).
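The "list of {key, item} structs" representation can be pictured with a toy model. The sketch below is plain C++ under simplifying assumptions (string keys, `int32_t` items, no nulls); `ToyMapColumn` is a hypothetical type and not part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct ToyMapColumn {
  std::vector<int32_t> offsets{0};  // list offsets: map i spans [offsets[i], offsets[i + 1])
  std::vector<std::string> keys;    // "key" child of the struct
  std::vector<int32_t> items;       // "item" child of the struct

  void append(const std::vector<std::pair<std::string, int32_t>>& entries) {
    for (const auto& kv : entries) {
      keys.push_back(kv.first);
      items.push_back(kv.second);
    }
    offsets.push_back(static_cast<int32_t>(keys.size()));
  }

  // Linear scan over the entries of map `row`; returns false if the key is absent.
  bool lookup(std::size_t row, const std::string& key, int32_t* out) const {
    assert(out != nullptr);
    assert(row + 1 < offsets.size());
    for (int32_t j = offsets[row]; j < offsets[row + 1]; ++j) {
      const std::size_t pos = static_cast<std::size_t>(j);
      if (keys[pos] == key) {
        *out = items[pos];
        return true;
      }
    }
    return false;
  }
};
```

Each map value is therefore just a slice of the shared key and item children, selected by a pair of adjacent offsets.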
-
class
arrow
::
FixedSizeListType
: public arrow::BaseListType¶ - #include <arrow/type.h>
Concrete type class for fixed size list data.
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
StructType
: public arrow::NestedType¶ - #include <arrow/type.h>
Concrete type class for struct data.
Public Functions
-
inline virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
std::vector<std::shared_ptr<Field>>
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found or if there are multiple fields having the same name.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name in sorted order.
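The lookup functions above differ in how they treat duplicate field names. The following standalone sketch mimics the documented behaviour on a plain list of names; `all_field_indices` and `field_index` are hypothetical free functions written for illustration, not the Arrow member functions themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Indices of every field called `name`, in ascending order.
inline std::vector<int> all_field_indices(const std::vector<std::string>& field_names,
                                          const std::string& name) {
  std::vector<int> indices;
  for (std::size_t i = 0; i < field_names.size(); ++i) {
    if (field_names[i] == name) indices.push_back(static_cast<int>(i));
  }
  return indices;
}

// Unique-match lookup: -1 when the name is missing or appears more than once.
inline int field_index(const std::vector<std::string>& field_names,
                       const std::string& name) {
  const std::vector<int> indices = all_field_indices(field_names, name);
  return indices.size() == 1 ? indices[0] : -1;
}

// Self-checking usage: the assertions spell out the expected results.
inline void field_lookup_example() {
  const std::vector<std::string> names = {"id", "tag", "tag"};
  assert(field_index(names, "id") == 0);
  assert(field_index(names, "tag") == -1);      // ambiguous
  assert(field_index(names, "missing") == -1);  // absent
  assert(all_field_indices(names, "tag").size() == 2);
}
```

A caller that must tolerate duplicate names would use the "all" variant and decide for itself how to resolve the ambiguity.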
-
class
arrow
::
UnionType
: public arrow::NestedType¶ - #include <arrow/type.h>
Base type class for union data.
Subclassed by arrow::DenseUnionType, arrow::SparseUnionType
Public Functions
-
virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline const std::vector<int8_t> &
type_codes
() const¶ The array of logical type ids.
For example, the first type in the union might be denoted by the id 5 (instead of 0).
-
inline const std::vector<int> &
child_ids
() const¶ An array mapping logical type ids to physical child ids.
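The relationship between type_codes and child_ids can be sketched as an inverse lookup table. This is a standalone illustration, not the library's code: `build_child_ids` is a hypothetical helper, and the table size of 128 and the -1 sentinel for unused codes are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kInvalidChild = -1;        // sentinel chosen for this sketch
constexpr std::size_t kTableSize = 128;  // one slot per non-negative int8_t value

// type_codes[child] is the logical type id of physical child `child`;
// the returned table maps each logical type id back to its physical child.
inline std::vector<int> build_child_ids(const std::vector<int8_t>& type_codes) {
  std::vector<int> child_ids(kTableSize, kInvalidChild);
  for (std::size_t child = 0; child < type_codes.size(); ++child) {
    const int8_t code = type_codes[child];
    assert(code >= 0);  // negative codes are not modelled here
    const std::size_t slot = static_cast<std::size_t>(code);
    assert(child_ids[slot] == kInvalidChild);  // each code names at most one child
    child_ids[slot] = static_cast<int>(child);
  }
  return child_ids;
}
```

With type codes {5, 9, 0}, logical id 5 resolves to child 0, id 9 to child 1, and id 0 to child 2.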
-
class
arrow
::
SparseUnionType
: public arrow::UnionType¶ - #include <arrow/type.h>
Concrete type class for sparse union data.
A sparse union is a nested type where each logical value is taken from a single child. A buffer of 8-bit type ids indicates which child a given logical value is to be taken from.
In a sparse union, each child array should have the same length as the union array, regardless of the actual number of union values that refer to it.
Note that, unlike most other types, unions don’t have a top-level validity bitmap.
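As a toy picture of the sparse layout described above, the sketch below models a union of an integer child and a string child in plain C++. `ToySparseUnion` is hypothetical and not an Arrow type; type id 0 selects the integers and type id 1 the strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ToySparseUnion {
  std::vector<int8_t> type_ids;      // one entry per logical value
  std::vector<int32_t> ints;         // child 0, same length as type_ids
  std::vector<std::string> strings;  // child 1, same length as type_ids

  // Render logical value i by reading slot i of the child named by type_ids[i].
  std::string render(std::size_t i) const {
    assert(i < type_ids.size());
    assert(ints.size() == type_ids.size() && strings.size() == type_ids.size());
    return type_ids[i] == 0 ? std::to_string(ints[i]) : strings[i];
  }
};
```

Every child carries a slot for every logical position, so slots belonging to the other child hold placeholder values that are never read.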
Public Functions
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
class
arrow
::
DenseUnionType
: public arrow::UnionType¶ - #include <arrow/type.h>
Concrete type class for dense union data.
A dense union is a nested type where each logical value is taken from a single child, at a specific offset. A buffer of 8-bit type ids indicates which child a given logical value is to be taken from, and a buffer of 32-bit offsets indicates at which physical position in the given child array the logical value is to be taken from.
Unlike a sparse union, a dense union allows encoding only the child array values which are actually referred to by the union array. This is counterbalanced by the additional footprint of the offsets buffer, and the additional indirection cost when looking up values.
Note that, unlike most other types, unions don’t have a top-level validity bitmap.
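The trade-off described above, compact children at the cost of an offsets buffer and an extra indirection, shows up in a toy model of the dense layout. `ToyDenseUnion` is a hypothetical plain C++ type used only for illustration, with type id 0 selecting an integer child and type id 1 a string child.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ToyDenseUnion {
  std::vector<int8_t> type_ids;      // which child each logical value comes from
  std::vector<int32_t> offsets;      // position of that value inside the chosen child
  std::vector<int32_t> ints;         // child 0: only the values actually referenced
  std::vector<std::string> strings;  // child 1: only the values actually referenced

  void append_int(int32_t v) {
    type_ids.push_back(0);
    offsets.push_back(static_cast<int32_t>(ints.size()));
    ints.push_back(v);
  }

  void append_string(const std::string& v) {
    type_ids.push_back(1);
    offsets.push_back(static_cast<int32_t>(strings.size()));
    strings.push_back(v);
  }

  // Two-step lookup: pick the child via type_ids, then the slot via offsets.
  std::string render(std::size_t i) const {
    assert(i < type_ids.size());
    const std::size_t pos = static_cast<std::size_t>(offsets[i]);
    return type_ids[i] == 0 ? std::to_string(ints[pos]) : strings[pos];
  }
};
```

After appending 7, "x" and 9, the integer child holds two values and the string child one, whereas a sparse layout would need three slots in each.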
Public Functions
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
Dictionary-encoded¶
-
class
arrow
::
DictionaryType
: public arrow::FixedWidthType¶ Dictionary-encoded value type with data-dependent dictionary.
Indices may be represented by any integer type.
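Dictionary encoding replaces each value with an integer index into a table of distinct values. A minimal standalone sketch of the idea follows; `ToyDictionaryColumn`, `dictionary_encode` and `value_at` are hypothetical names, the index type is fixed to `int32_t`, and nothing here calls into Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyDictionaryColumn {
  std::vector<int32_t> indices;         // one integer per logical value
  std::vector<std::string> dictionary;  // distinct values, in first-seen order
};

// Build the indices and the dictionary from a plain sequence of values.
inline ToyDictionaryColumn dictionary_encode(const std::vector<std::string>& values) {
  ToyDictionaryColumn out;
  std::unordered_map<std::string, int32_t> seen;
  for (const std::string& v : values) {
    auto it = seen.find(v);
    if (it == seen.end()) {
      it = seen.emplace(v, static_cast<int32_t>(out.dictionary.size())).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}

// Decode logical value i by following its index into the dictionary.
inline const std::string& value_at(const ToyDictionaryColumn& col, std::size_t i) {
  assert(i < col.indices.size());
  return col.dictionary[static_cast<std::size_t>(col.indices[i])];
}
```

Encoding "red", "blue", "red", "red" gives the indices 0, 1, 0, 0 and a two-entry dictionary; the data type only needs to record the index and value types, while the dictionary contents travel with the data.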
Public Functions
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
Extension types¶
-
class
arrow
::
ExtensionType
: public arrow::DataType¶ The base class for custom / user-defined types.
Subclassed by arrow::py::PyExtensionType
Public Functions
-
inline const std::shared_ptr<DataType> &
storage_type
() const¶ The type of array used to represent this extension type’s data.
-
virtual DataTypeLayout
layout
() const override¶ Return the data type layout.
Children are not included.
Note
Experimental API
-
virtual std::string
ToString
() const override¶ A string representation of the type, including any children.
-
inline virtual std::string
name
() const override¶ A string name of the type, omitting any child fields.
- Since
0.7.0
-
virtual std::string
extension_name
() const = 0¶ Unique name of extension type used to identify type for serialization.
- Returns
the string name of the extension
-
virtual bool
ExtensionEquals
(const ExtensionType &other) const = 0¶ Determine if two instances of the same extension types are equal.
Invoked from ExtensionType::Equals
- Parameters
[in] other – the type to compare this type with
- Returns
bool true if type instances are equal
Wrap built-in Array type in a user-defined ExtensionArray instance.
- Parameters
[in] data – the physical storage for the extension type
Create an instance of the ExtensionType given the actual storage type and the serialized representation.
- Parameters
[in] storage_type – the physical storage type of the extension
[in] serialized_data – the serialized representation produced by Serialize
-
virtual std::string
Serialize
() const = 0¶ Create a serialized representation of the extension type’s metadata.
The storage type will be handled automatically in IPC code paths
- Returns
the serialized representation
Public Static Functions
Wrap the given storage array as an extension array.
Wrap the given chunked storage array as a chunked extension array.
Fields and Schemas¶
Create a Field instance.
- Parameters
name – the field name
type – the field value type
nullable – whether the values are nullable, default true
metadata – any custom key-value metadata, default null
Create a Field instance with metadata.
The field will be assumed to be nullable.
- Parameters
name – the field name
type – the field value type
metadata – any custom key-value metadata
Create a Schema instance.
- Parameters
fields – the schema’s fields
metadata – any custom key-value metadata, default null
- Returns
schema shared_ptr to Schema
Create a Schema instance.
- Parameters
fields – the schema’s fields
endianness – the endianness of the data
metadata – any custom key-value metadata, default null
- Returns
schema shared_ptr to Schema
-
class
arrow
::
Field
: public arrow::detail::Fingerprintable¶ The combination of a field name and data type, with optional metadata.
Fields are used to describe the individual constituents of a nested DataType or a Schema.
A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.
Public Functions
-
inline std::shared_ptr<const KeyValueMetadata>
metadata
() const¶ Return the field’s attached metadata.
-
bool
HasMetadata
() const¶ Return whether the field has non-empty metadata.
Return a copy of this field with the given metadata attached to it.
EXPERIMENTAL: Return a copy of this field with the given metadata merged with existing metadata (any colliding keys will be overridden by the passed metadata)
-
std::shared_ptr<Field>
RemoveMetadata
() const¶ Return a copy of this field without any metadata attached to it.
Return a copy of this field with the replaced type.
-
std::shared_ptr<Field>
WithName
(const std::string &name) const¶ Return a copy of this field with the replaced name.
-
std::shared_ptr<Field>
WithNullable
(bool nullable) const¶ Return a copy of this field with the replaced nullability.
-
Result<std::shared_ptr<Field>>
MergeWith
(const Field &other, MergeOptions options = MergeOptions::Defaults()) const¶ Merge the current field with a field of the same name.
The two fields must be compatible, i.e. they must:
have the same name;
have the same type, or compatible types according to options.
The metadata of the current field is preserved; the metadata of the other field is discarded.
-
bool
Equals
(const Field &other, bool check_metadata = false) const¶ Indicate if fields are equal.
- Parameters
[in] other – field to check equality with.
[in] check_metadata – controls if it should check for metadata equality.
- Returns
true if fields are equal, false otherwise.
-
bool
IsCompatibleWith
(const Field &other) const¶ Indicate if fields are compatible.
See the criteria of MergeWith.
- Returns
true if fields are compatible, false otherwise.
-
std::string
ToString
(bool show_metadata = false) const¶ Return a string representation of the field.
- Parameters
[in] show_metadata – when true, if KeyValueMetadata is non-empty, print keys and values in the output
-
inline const std::string &
name
() const¶ Return the field name.
-
inline bool
nullable
() const¶ Return whether the field is nullable.
-
struct
MergeOptions
¶ Options that control the behavior of MergeWith.
Options are to be added to allow type conversions, including integer widening, promotion from integer to float, or conversion to or from boolean.
-
class
arrow
::
Schema
: public arrow::detail::Fingerprintable, public arrow::util::EqualityComparable<Schema>, public arrow::util::ToStringOstreamable<Schema>¶ Sequence of arrow::Field objects describing the columns of a record batch or table data structure.
Public Functions
-
bool
Equals
(const Schema &other, bool check_metadata = false) const¶ Returns true if all of the schema fields are equal.
-
std::shared_ptr<Schema>
WithEndianness
(Endianness endianness) const¶ Set endianness in the schema.
- Returns
new Schema
-
Endianness
endianness
() const¶ Return endianness in the schema.
-
bool
is_native_endian
() const¶ Indicate if endianness is equal to platform-native endianness.
-
int
num_fields
() const¶ Return the number of fields (columns) in the schema.
-
const std::shared_ptr<Field> &
field
(int i) const¶ Return the ith schema element. Does not bounds-check.
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
FieldVector
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name.
-
Status
CanReferenceFieldsByNames
(const std::vector<std::string> &names) const¶ Indicate if fields named names can be found unambiguously in the schema.
-
const std::shared_ptr<const KeyValueMetadata> &
metadata
() const¶ The custom key-value metadata, if any.
- Returns
metadata may be null
-
std::string
ToString
(bool show_metadata = false) const¶ Render a string representation of the schema suitable for debugging.
- Parameters
[in] show_metadata – when true, if KeyValueMetadata is non-empty, print keys and values in the output
Replace key-value metadata with new metadata.
- Parameters
[in] metadata – new KeyValueMetadata
- Returns
new Schema