Data Types
Data types govern how physical data is interpreted. Their specification allows binary interoperability between different Arrow
implementations, including implementations in different programming languages and runtimes
(for example, it is possible to access the same data, without copying, from
both Python and Java using the pyarrow.jvm bridge module).
Information about a data type in C++ can be represented in three ways:
- Using an arrow::DataType instance (e.g. as a function argument)
- Using an arrow::DataType concrete subclass (e.g. as a template parameter)
- Using an arrow::Type::type enum value (e.g. as the condition of a switch statement)
The first form (using an arrow::DataType instance) is the most idiomatic
and flexible. Runtime-parametric types can only be fully represented with
a DataType instance. For example, an arrow::TimestampType needs to be
constructed at runtime with an arrow::TimeUnit::type parameter; an
arrow::Decimal128Type with scale and precision parameters;
an arrow::ListType with a full child type (itself an
arrow::DataType instance).
The two other forms can be used where performance is critical, in order to avoid paying the price of dynamic typing and polymorphism. However, some amount of runtime switching can still be required for parametric types. It is not possible to reify all possible types at compile time, since Arrow data types allow arbitrary nesting.
Creating data types
To instantiate data types, it is recommended to call the provided factory functions:
std::shared_ptr<arrow::DataType> type;
// A 16-bit integer type
type = arrow::int16();
// A 64-bit timestamp type (with microsecond granularity)
type = arrow::timestamp(arrow::TimeUnit::MICRO);
// A list type of single-precision floating-point values
type = arrow::list(arrow::float32());