pub struct ArrayData {
    data_type: DataType,
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
    nulls: Option<NullBuffer>,
}
A generic representation of Arrow array data which encapsulates common attributes and operations for Arrow arrays.
Specific operations for different array types (e.g., primitive, list, struct) are implemented in Array.
§Memory Layout
ArrayData has references to one or more underlying data buffers and optional child ArrayData, depending on type as illustrated below. Bitmaps are not shown for simplicity but they are stored similarly to the buffers.
                        offset
                       points to
┌───────────────────┐ start of  ┌───────┐    Different
│                   │   data    │       │    ArrayData may
│ArrayData {        │           │....   │    also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─ the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │      underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │   buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│                   │ │          bytes::Bytes to hold
└───────────────────┘            actual data values
           ┌ ─ ─ ┘
           ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘
Child ArrayData may also have its own buffers and children
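The sharing shown in the diagram can be modelled in a few lines of plain Rust. This is a simplified sketch, not the arrow crate's implementation: View is a hypothetical type, and Arc<[i64]> stands in for a shared Buffer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an ArrayData over a single Int64 buffer.
#[derive(Clone)]
struct View {
    buffer: Arc<[i64]>, // shared, immutable storage
    offset: usize,      // first element visible to this view
    len: usize,         // number of visible elements
}

impl View {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        View { buffer: values.into(), offset: 0, len }
    }

    /// Zero-copy slice: only offset and len change, the buffer is shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        View { buffer: Arc::clone(&self.buffer), offset: self.offset + offset, len }
    }

    /// The elements this view refers to.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}
```

Slicing a view of the six values in the diagram at offset 3 with length 2 would expose 5882 and 4323 while still pointing at the same allocation as the original view.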
Fields§
data_type: DataType
The data type for this array data
len: usize
The number of elements in this array data
offset: usize
The offset into this array data, in number of items
buffers: Vec<Buffer>
The buffers for this array data. Note that, depending on the array type, this could hold different kinds of buffers (e.g., value buffer, value offset buffer) at different positions.
child_data: Vec<ArrayData>
The child(ren) of this array. Only non-empty for nested types, currently ListArray and StructArray.
nulls: Option<NullBuffer>
The null bitmap. A None value for this indicates all values are non-null in this array.
Implementations§
impl ArrayData
pub unsafe fn new_unchecked(
    data_type: DataType,
    len: usize,
    null_count: Option<usize>,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> Self
Create a new ArrayData instance.
If null_count is not specified, the number of nulls in null_bit_buffer is calculated.
If the number of nulls is 0 then the null_bit_buffer is set to None.
§Safety
The input values must form a valid Arrow array for data_type, or undefined behavior can result.
Note: This is a low level API and most users of the arrow crate should create arrays using the methods in the array module.
pub fn try_new(
    data_type: DataType,
    len: usize,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> Result<Self, ArrowError>
Create a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.
If the number of nulls in null_bit_buffer is 0 then the null_bit_buffer is set to None.
Internally this calls through to Self::validate_data.
Note: This is a low level API and most users of the arrow crate should create arrays using the builders found in arrow_array.
pub const fn builder(data_type: DataType) -> ArrayDataBuilder
Returns a builder to construct an ArrayData instance of the same DataType.
pub const fn data_type(&self) -> &DataType
Returns a reference to the DataType of this ArrayData.
pub fn child_data(&self) -> &[ArrayData]
Returns a slice of the child ArrayData. This will be non-empty for types such as lists and structs.
pub fn nulls(&self) -> Option<&NullBuffer>
Returns a reference to the null buffer of this ArrayData, if any.
Note: ArrayData::offset does NOT apply to the returned NullBuffer.
pub const fn len(&self) -> usize
Returns the length (i.e., number of elements) of this ArrayData.
pub fn null_count(&self) -> usize
Returns the total number of nulls in this array.
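To make the null count concrete, here is a minimal sketch of counting nulls in an Arrow-style validity bitmap, where bits are read least-significant-bit first and a set bit marks a valid slot. The function count_nulls and its parameters are hypothetical; the arrow crate exposes this through NullBuffer rather than through a free function like this.

```rust
/// Counts nulls in `len` slots starting at bit `offset` of a validity bitmap.
/// Bits are read least-significant-bit first; a set bit marks a valid slot.
fn count_nulls(validity: &[u8], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| validity[i / 8] & (1 << (i % 8)) == 0)
        .count()
}
```

With the single bitmap byte 0b0000_0101, slots 0 and 2 are valid, so the first four slots contain two nulls.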
pub fn get_buffer_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData).
Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.
If multiple ArrayDatas refer to the same underlying Buffers they will each report the same size.
pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>
Returns the total number of bytes of memory occupied by the buffers used by this slice of ArrayData. (See also diagram on ArrayData).
This is approximately the number of bytes if a new ArrayData was formed by creating new Buffers with exactly the data needed.
For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then Self::sliced to refer to its first 20 elements, then Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
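The difference between the whole-buffer size and the slice size can be sketched with plain arithmetic. The helpers buffer_bytes and slice_bytes below are hypothetical and cover only a single primitive value buffer; they ignore null bitmaps, child data, and any capacity rounding the real allocator may apply.

```rust
use std::mem::size_of;

/// Bytes held by the whole underlying buffer, regardless of offset and len.
fn buffer_bytes<T>(buffer: &[T]) -> usize {
    buffer.len() * size_of::<T>()
}

/// Bytes needed for just the `len` elements that a slice refers to.
fn slice_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}
```

For a buffer of 100 i64 values, buffer_bytes stays at 800 however the data is sliced, while slice_bytes::<i64>(20) drops to 160.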
pub fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData).
Equivalent to: size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children
pub fn buffer<T: ArrowNativeType>(&self, buffer: usize) -> &[T]
Returns the buffer at index buffer as a slice of type T, starting at self.offset.
§Panics
This function panics if:
- the buffer is not byte-aligned with type T, or
- the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)
pub fn new_null(data_type: &DataType, len: usize) -> Self
Returns a new ArrayData valid for data_type containing len null values.
pub fn new_empty(data_type: &DataType) -> Self
Returns a new empty ArrayData valid for data_type.
pub fn align_buffers(&mut self)
Verifies that the buffers meet the minimum alignment requirements for the data type.
Buffers that are not adequately aligned will be copied to a new aligned allocation.
This can be useful when interacting with data sent over IPC or FFI, which may not meet the minimum alignment requirements.
This also aligns buffers of children data.
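The copy-only-if-misaligned idea can be sketched in std-only Rust. The function aligned_u32s and the helper as_bytes are hypothetical and handle just one element type; they illustrate the technique and are not how align_buffers is implemented.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Views `bytes` as u32 values, copying only when the pointer is not
/// aligned for u32. Trailing bytes that do not fill a u32 are ignored.
fn aligned_u32s(bytes: &[u8]) -> Cow<'_, [u32]> {
    let count = bytes.len() / size_of::<u32>();
    if bytes.as_ptr() as usize % align_of::<u32>() == 0 {
        // SAFETY: the pointer is aligned for u32, `count` u32 values fit
        // inside `bytes`, and every bit pattern is a valid u32.
        Cow::Borrowed(unsafe {
            std::slice::from_raw_parts(bytes.as_ptr() as *const u32, count)
        })
    } else {
        // Misaligned: decode into a fresh, properly aligned allocation.
        Cow::Owned(
            bytes
                .chunks_exact(size_of::<u32>())
                .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}

/// Helper for trying the function out: the raw bytes behind a u32 slice.
fn as_bytes(words: &[u32]) -> &[u8] {
    // SAFETY: u8 has alignment 1 and the length covers exactly `words`.
    unsafe {
        std::slice::from_raw_parts(words.as_ptr() as *const u8, words.len() * size_of::<u32>())
    }
}
```

Passing the bytes of a Vec<u32> borrows them unchanged, while passing the same bytes starting one byte later forces a copy into a new allocation.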
pub fn validate(&self) -> Result<(), ArrowError>
“Cheap” validation of an ArrayData. Ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.
This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).
See ArrayData::validate_data to fully validate the offset content and the validity of UTF-8 data.
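The size check described here needs only the buffer length, not its contents. A minimal sketch, assuming a fixed-width element type; check_buffer_size and its parameters are hypothetical and the error type is simplified to String:

```rust
/// Checks that a buffer of `buffer_bytes` bytes can hold `offset + len`
/// elements of `elem_size` bytes each, without reading its contents.
fn check_buffer_size(
    buffer_bytes: usize,
    offset: usize,
    len: usize,
    elem_size: usize,
) -> Result<(), String> {
    // Guard against overflow before comparing sizes.
    let required = offset
        .checked_add(len)
        .and_then(|n| n.checked_mul(elem_size))
        .ok_or_else(|| "offset + len overflows usize".to_string())?;
    if buffer_bytes < required {
        return Err(format!("need {required} bytes, buffer has {buffer_bytes}"));
    }
    Ok(())
}
```

An 800-byte buffer of 8-byte values passes for offset 80 and len 20, but fails for offset 90 and len 20, which would need 880 bytes.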
fn typed_offsets<T: ArrowNativeType + Num>(&self) -> Result<&[T], ArrowError>
Returns a reference to the data in buffer as a typed slice (typically &[i32] or &[i64]) after validating. The returned slice is guaranteed to have at least self.len + 1 entries.
For an empty array, the buffer can also be empty.
fn typed_buffer<T: ArrowNativeType + Num>(
    &self,
    idx: usize,
    len: usize,
) -> Result<&[T], ArrowError>
Returns a reference to the data in buffers[idx] as a typed slice after validating.
fn validate_offsets<T: ArrowNativeType + Num + Display>(
    &self,
    values_length: usize,
) -> Result<(), ArrowError>
Does a cheap sanity check that the self.len values in buffer are valid offsets (of type T) into some other buffer that is values_length bytes long.
fn validate_offsets_and_sizes<T: ArrowNativeType + Num + Display>(
    &self,
    values_length: usize,
) -> Result<(), ArrowError>
Does a cheap sanity check that the self.len values in buffer are valid offsets and sizes (of type T) into some other buffer that is values_length bytes long.
fn validate_child_data(&self) -> Result<(), ArrowError>
Validates the layout of child_data ArrayData structures.
fn get_single_valid_child_data(
    &self,
    expected_type: &DataType,
) -> Result<&ArrayData, ArrowError>
Ensures that this array data has a single child_data with the expected type, and calls validate() on it. Returns a reference to that child_data.
fn validate_num_child_data(&self, expected_len: usize) -> Result<(), ArrowError>
Returns Err if self.child_data does not have exactly expected_len elements.
fn get_valid_child_data(
    &self,
    i: usize,
    expected_type: &DataType,
) -> Result<&ArrayData, ArrowError>
Ensures that child_data[i] has the expected type, calls validate() on it, and returns a reference to that child_data.
pub fn validate_data(&self) -> Result<(), ArrowError>
Validates that the data contained within this ArrayData is valid:
- Null count is correct
- All offsets are valid
- All String data is valid UTF-8
- All dictionary offsets are valid
Internally this calls:
Note: this does not recurse into children. For a recursive variant, see Self::validate_full.
pub fn validate_full(&self) -> Result<(), ArrowError>
Performs a full recursive validation of this ArrayData and all its children.
This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively.
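The relationship between the non-recursive and recursive checks can be sketched on a toy tree. Node is a hypothetical stand-in for a nested ArrayData, and its valid flag simply represents the outcome of the per-node check.

```rust
/// Hypothetical stand-in for a nested ArrayData.
struct Node {
    valid: bool, // result of the non-recursive check for this node
    children: Vec<Node>,
}

impl Node {
    /// Non-recursive check of this node only.
    fn validate_data(&self) -> Result<(), String> {
        if self.valid {
            Ok(())
        } else {
            Err("invalid node".to_string())
        }
    }

    /// Recursive check: this node first, then every descendant.
    fn validate_full(&self) -> Result<(), String> {
        self.validate_data()?;
        self.children.iter().try_for_each(Node::validate_full)
    }
}
```

A parent that is itself valid but has one invalid child passes validate_data and fails validate_full, which is the case the recursive variant exists to catch.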
pub fn validate_nulls(&self) -> Result<(), ArrowError>
fn validate_non_nullable(
    &self,
    mask: Option<&NullBuffer>,
    child: &ArrayData,
) -> Result<(), ArrowError>
Verifies that child contains no nulls not present in mask.
pub fn validate_values(&self) -> Result<(), ArrowError>
fn validate_each_offset<T, V>(
    &self,
    offset_limit: usize,
    validate: V,
) -> Result<(), ArrowError>
Calls the validate(item_index, range) function for each of the ranges specified in the arrow offsets buffer of type T. Also validates that each offset is smaller than offset_limit.
For an empty array, the offsets buffer can either be empty or contain a single 0.
For example, if the offsets buffer contained [1, 2, 4], this function would call validate([1,2]) and validate([2,4]).
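The pairing of consecutive offsets into ranges can be sketched as follows. This is an illustration, not the crate's code: for_each_offset_range is a hypothetical name, offsets are plain usize instead of a generic T, and an offset equal to offset_limit is accepted, which is an assumption.

```rust
use std::ops::Range;

/// Calls `validate(index, range)` for each consecutive pair of offsets and
/// rejects offsets that decrease or exceed `offset_limit`.
fn for_each_offset_range<V>(
    offsets: &[usize],
    offset_limit: usize,
    mut validate: V,
) -> Result<(), String>
where
    V: FnMut(usize, Range<usize>) -> Result<(), String>,
{
    // An empty or single-entry offsets buffer yields no pairs and passes.
    for (i, pair) in offsets.windows(2).enumerate() {
        let (start, end) = (pair[0], pair[1]);
        if start > end {
            return Err(format!("offsets decrease at index {i}: {start} > {end}"));
        }
        if end > offset_limit {
            return Err(format!("offset {end} at index {i} exceeds limit {offset_limit}"));
        }
        validate(i, start..end)?;
    }
    Ok(())
}
```

For the offsets [1, 2, 4] the callback receives index 0 with range 1..2 and index 1 with range 2..4, matching the example in the text.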
fn validate_utf8<T>(&self) -> Result<(), ArrowError>
Ensures that all strings formed by the offsets in buffers[0] into buffers[1] are valid UTF-8 sequences.
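Each string has to be valid on its own, because an offset may fall in the middle of a multi-byte character even when the values buffer as a whole is valid UTF-8. A minimal sketch, using a free function with usize offsets as a simplification; it assumes the offsets were already checked to be non-decreasing and in bounds:

```rust
/// Checks that every value delimited by `offsets` in `values` is valid UTF-8.
/// Assumes `offsets` were already checked to be non-decreasing and in bounds.
fn validate_utf8(offsets: &[usize], values: &[u8]) -> Result<(), String> {
    for (i, pair) in offsets.windows(2).enumerate() {
        std::str::from_utf8(&values[pair[0]..pair[1]])
            .map_err(|e| format!("invalid UTF-8 in value {i}: {e}"))?;
    }
    Ok(())
}
```

The two bytes of "é" pass with offsets [0, 2] but fail with offsets [0, 1, 2], since the latter splits one character across two strings.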
fn validate_offsets_full<T>(
    &self,
    offset_limit: usize,
) -> Result<(), ArrowError>
Ensures that all offsets in buffers[0] into buffers[1] are between 0 and offset_limit.
fn check_bounds<T>(&self, max_value: i64) -> Result<(), ArrowError>
Validates that each value in self.buffers (typed as T) is within the range [0, max_value], inclusive.
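An inclusive bounds check of this kind can be sketched as below, for instance for integer keys that index into a separate values array. The free function, the fixed i32 element type, and the String error are simplifications for illustration.

```rust
/// Returns an error naming the first value outside `0..=max_value`.
fn check_bounds(keys: &[i32], max_value: i64) -> Result<(), String> {
    for (i, &key) in keys.iter().enumerate() {
        // Widen to i64 so the comparison matches the type of `max_value`.
        let key = i64::from(key);
        if key < 0 || key > max_value {
            return Err(format!("value {key} at index {i} is outside 0..={max_value}"));
        }
    }
    Ok(())
}
```

With max_value = 2, the keys [0, 2, 1] pass, while a key of 3 or -1 is rejected.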
fn check_run_ends<T>(&self) -> Result<(), ArrowError>
Validates that each value in the run_ends array is positive and strictly increasing.
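Both conditions can be checked in a single pass by comparing each run end against the previous one, starting from zero. A minimal sketch with a fixed i32 element type and a String error, both chosen for illustration:

```rust
/// Checks that run ends are positive and strictly increasing.
fn check_run_ends(run_ends: &[i32]) -> Result<(), String> {
    // Starting from 0 makes the first comparison enforce positivity.
    let mut previous = 0;
    for (i, &end) in run_ends.iter().enumerate() {
        if end <= previous {
            return Err(format!(
                "run end {end} at index {i} must be greater than {previous}"
            ));
        }
        previous = end;
    }
    Ok(())
}
```

The sequence [2, 5, 6] passes, whereas [0, 1] fails on the non-positive first value and [2, 2] fails because it does not strictly increase.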
pub fn ptr_eq(&self, other: &Self) -> bool
Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false when the arrays are logically equal.
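The trade-off between pointer equality and logical equality can be illustrated with std's Arc, used here as a stand-in for a shared Buffer. The helper names below are chosen for illustration and are not arrow APIs.

```rust
use std::sync::Arc;

/// Cheap comparison: true only if both handles share one allocation.
fn ptr_eq(a: &Arc<[i64]>, b: &Arc<[i64]>) -> bool {
    Arc::ptr_eq(a, b)
}

/// Full comparison: compares the contents element by element.
fn logical_eq(a: &Arc<[i64]>, b: &Arc<[i64]>) -> bool {
    a[..] == b[..]
}
```

A clone of an Arc compares equal under both functions, while a second allocation holding the same values is logically equal but not pointer-equal.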
pub fn into_builder(self) -> ArrayDataBuilder
Converts this ArrayData into an ArrayDataBuilder.