pub struct ArrayData {
    data_type: DataType,
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
    nulls: Option<NullBuffer>,
}
A generic representation of Arrow array data which encapsulates common attributes and
operations for Arrow arrays. Specific operations for different array types (e.g.,
primitive, list, struct) are implemented in Array.
§Memory Layout
ArrayData has references to one or more underlying data buffers
and optional child ArrayData, depending on type as illustrated
below. Bitmaps are not shown for simplicity but they are stored
similarly to the buffers.
                       offset
                      points to
┌───────────────────┐ start of  ┌───────┐  Different
│                   │   data    │       │  ArrayData may
│ArrayData {        │           │....   │  also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─  the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │       underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │    buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│               │   │            bytes::Bytes to hold
└───────────────────┘            actual data values
          ┌ ─ ─ ┘
          ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘
Child ArrayData may also have its own buffers and children
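The sharing shown in the diagram can be sketched with the standard library alone. This is a simplified model, not the arrow crate's implementation: `ToyArrayData` and its methods are hypothetical names, the buffer is an `Arc<Vec<i64>>` rather than a real `Buffer`, and validity bitmaps and children are left out.

```rust
use std::sync::Arc;

/// Toy stand-in for an ArrayData of 64-bit integers (hypothetical type).
#[derive(Clone, Debug)]
pub struct ToyArrayData {
    buffer: Arc<Vec<i64>>, // shared, immutable storage
    offset: usize,         // index of the first logical element
    len: usize,            // number of logical elements
}

impl ToyArrayData {
    pub fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: the buffer is shared, only offset and len change.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The logical values visible through this view.
    pub fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// True if both views point at the same underlying allocation.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
    }
}
```

Slicing the six values from the diagram at offset 3 with length 2 yields a view over `5882` and `4323` that still points at the original allocation.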
Fields§
§data_type: DataType
§len: usize
§offset: usize
§buffers: Vec<Buffer>
§child_data: Vec<ArrayData>
§nulls: Option<NullBuffer>
Implementations§
§impl ArrayData
pub unsafe fn new_unchecked(
    data_type: DataType,
    len: usize,
    null_count: Option<usize>,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> ArrayData
Creates a new ArrayData instance.
If null_count is not specified, the number of nulls in null_bit_buffer is calculated.
If the number of nulls is 0 then the null_bit_buffer is set to None.
§Safety
The input values must form a valid Arrow array for data_type, or undefined behavior can result.
Note: This is a low level API and most users of the arrow crate should create arrays using the methods in the array module.
pub fn try_new(
    data_type: DataType,
    len: usize,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> Result<ArrayData, ArrowError>
Creates a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.
If the number of nulls in null_bit_buffer is 0 then the null_bit_buffer is set to None.
Internally this calls through to Self::validate_data.
Note: This is a low level API and most users of the arrow crate should create arrays using the builders found in arrow_array.
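The null handling described for new_unchecked and try_new can be illustrated with a small std-only sketch. It assumes the Arrow convention that a set bit in the validity bitmap marks a valid (non-null) slot, with bits numbered from the least significant bit of each byte. The function names are hypothetical and this is not the crate's code.

```rust
/// Returns true if bit `i` is set (least-significant-bit numbering per byte).
fn bit_is_set(bitmap: &[u8], i: usize) -> bool {
    bitmap[i / 8] & (1u8 << (i % 8)) != 0
}

/// Counts unset (null) bits in the logical range `offset..offset + len`.
pub fn count_nulls(validity: &[u8], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| !bit_is_set(validity, i))
        .count()
}

/// Drops the bitmap when it marks no nulls, mirroring the documented rule
/// that a null count of 0 results in `None`.
pub fn normalize_nulls(
    validity: Option<Vec<u8>>,
    offset: usize,
    len: usize,
) -> Option<(Vec<u8>, usize)> {
    let bitmap = validity?;
    let null_count = count_nulls(&bitmap, offset, len);
    if null_count == 0 {
        None
    } else {
        Some((bitmap, null_count))
    }
}
```

A bitmap of `0b0000_1011` over four slots has one unset bit (slot 2), so it is kept with a null count of 1, while `0b0000_1111` is discarded.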
pub const fn builder(data_type: DataType) -> ArrayDataBuilder
pub fn child_data(&self) -> &[ArrayData]
Returns a slice of children ArrayData. This will be non-empty for types such as lists and structs.
pub fn nulls(&self) -> Option<&NullBuffer>
Returns a reference to the null buffer of this ArrayData, if any.
Note: ArrayData::offset does NOT apply to the returned NullBuffer.
pub fn null_count(&self) -> usize
Returns the total number of nulls in this array
pub fn get_buffer_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData.)
Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.
If multiple ArrayData refer to the same underlying Buffers, each of them will report the same size.
pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>
Returns the total number of bytes of memory occupied by the buffers of this slice of ArrayData. (See also diagram on ArrayData.)
This is approximately the number of bytes that would be used if a new ArrayData were formed by creating new Buffers with exactly the data needed.
For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then Self::sliced to refer to its first 20 elements, then Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
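The arithmetic in the example above, and the contrast with the whole-buffer size, can be checked with a toy model. This sketch is an assumption-laden simplification: `ToyInt64View` is a hypothetical type that tracks only a values buffer, uses the element count rather than the allocated capacity, and ignores validity bitmaps and children.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Toy view over a shared buffer of i64 values (hypothetical type).
pub struct ToyInt64View {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl ToyInt64View {
    pub fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice that keeps pointing at the same buffer.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Size of the whole shared buffer, regardless of offset and len.
    pub fn buffer_memory_size(&self) -> usize {
        self.buffer.len() * size_of::<i64>()
    }

    /// Size of an exactly fitted copy of the visible elements.
    pub fn slice_memory_size(&self) -> usize {
        self.len * size_of::<i64>()
    }
}
```

In this model a 100-element view reports 800 bytes for both measures, while its first-20-elements slice reports 160 bytes for the slice measure and still 800 for the buffer measure.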
pub fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData.)
Equivalent to:
size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children
pub fn buffer<T>(&self, buffer: usize) -> &[T]
where
    T: ArrowNativeType,
Returns the buffer as a slice of type T starting at self.offset.
§Panics
This function panics if:
- the buffer is not byte-aligned with type T, or
- the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)
pub fn new_null(data_type: &DataType, len: usize) -> ArrayData
Returns a new ArrayData valid for data_type containing len null values.
pub fn new_empty(data_type: &DataType) -> ArrayData
Returns a new empty ArrayData valid for data_type.
pub fn align_buffers(&mut self)
Verifies that the buffers meet the minimum alignment requirements for the data type.
Buffers that are not adequately aligned will be copied to a new aligned allocation.
This can be useful when interacting with data sent over IPC or FFI, which may not meet the minimum alignment requirements.
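The copy-only-if-misaligned idea can be sketched with the standard library. This is an illustration under stated assumptions, not how align_buffers is implemented: `AlignedBytes`, `is_aligned_for_i32` and `as_i32s` are hypothetical names, and the sketch handles a single `i32` buffer only.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// 16 bytes whose start is guaranteed to be 8-byte aligned (for the demo).
#[repr(align(8))]
pub struct AlignedBytes(pub [u8; 16]);

/// Returns true if `bytes` starts at an address suitably aligned for `i32`.
pub fn is_aligned_for_i32(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<i32>() == 0
}

/// Views `bytes` as `i32` values, copying only when the start is misaligned.
pub fn as_i32s(bytes: &[u8]) -> Cow<'_, [i32]> {
    assert_eq!(bytes.len() % size_of::<i32>(), 0, "length must be a multiple of 4");
    if is_aligned_for_i32(bytes) {
        // SAFETY: the pointer is aligned for i32, the length is a multiple of
        // 4 bytes, every bit pattern is a valid i32, and the returned borrow
        // keeps `bytes` alive.
        let values = unsafe {
            std::slice::from_raw_parts(
                bytes.as_ptr() as *const i32,
                bytes.len() / size_of::<i32>(),
            )
        };
        Cow::Borrowed(values)
    } else {
        // Misaligned: copy into a fresh Vec<i32>, whose allocation is aligned.
        let values = bytes
            .chunks_exact(size_of::<i32>())
            .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Cow::Owned(values)
    }
}
```

Bytes starting at the aligned base are borrowed in place, while the same bytes shifted by one position are copied, which is the kind of situation that can arise with data received over IPC or FFI.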
pub fn validate(&self) -> Result<(), ArrowError>
Performs “cheap” validation of an ArrayData. Ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.
This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).
See ArrayData::validate_data to fully validate the offset contents and the validity of UTF-8 data.
pub fn validate_data(&self) -> Result<(), ArrowError>
Validates that the data contained within this ArrayData is valid:
- Null count is correct
- All offsets are valid
- All String data is valid UTF-8
- All dictionary offsets are valid
Internally this calls:
Note: this does not recurse into children; for a recursive variant see Self::validate_full.
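The difference between the cheap size check and the content check can be sketched for a toy UTF-8 array. This is a simplified model with hypothetical names (`ToyStringData`, `String` errors instead of ArrowError); it is not the crate's validation logic and it ignores nulls and children. Running the cheap check first inside the content check is a choice made for this sketch.

```rust
/// Toy UTF-8 array: `offsets[i]..offsets[i + 1]` delimits value `i` in `values`.
pub struct ToyStringData {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
    pub offset: usize,
    pub len: usize,
}

impl ToyStringData {
    /// Cheap check: buffer sizes only, contents are not inspected.
    pub fn validate(&self) -> Result<(), String> {
        let needed = self.offset + self.len + 1;
        if self.offsets.len() < needed {
            return Err(format!(
                "offsets buffer holds {} entries, need {}",
                self.offsets.len(),
                needed
            ));
        }
        Ok(())
    }

    /// Content check: offsets are in bounds and non-decreasing, bytes are UTF-8.
    pub fn validate_data(&self) -> Result<(), String> {
        self.validate()?;
        let window = &self.offsets[self.offset..self.offset + self.len + 1];
        for pair in window.windows(2) {
            let (start, end) = (pair[0], pair[1]);
            if start < 0 || end < start || end as usize > self.values.len() {
                return Err(format!("invalid offset pair {}..{}", start, end));
            }
            std::str::from_utf8(&self.values[start as usize..end as usize])
                .map_err(|e| e.to_string())?;
        }
        Ok(())
    }
}
```

An array whose offsets point past the end of the values buffer passes the cheap check here but fails the content check.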
pub fn validate_full(&self) -> Result<(), ArrowError>
Performs a full recursive validation of this ArrayData and all its children.
This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively.
pub fn validate_nulls(&self) -> Result<(), ArrowError>
pub fn validate_values(&self) -> Result<(), ArrowError>
pub fn ptr_eq(&self, other: &ArrayData) -> bool
Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false when the arrays are logically equal.
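Why a pointer comparison can return false for logically equal arrays is easy to see in a toy model. The sketch below uses `Arc::ptr_eq` from the standard library; `ToyData` and its methods are hypothetical names and do not reflect the crate's internals.

```rust
use std::sync::Arc;

/// Toy array view over a shared buffer (hypothetical type).
pub struct ToyData {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ToyData {
    pub fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Another view of the same allocation with the same offset and len.
    pub fn share(&self) -> Self {
        Self { buffer: Arc::clone(&self.buffer), offset: self.offset, len: self.len }
    }

    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Cheap: compares the allocation pointer, offset and len, never the values.
    pub fn ptr_eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
            && self.offset == other.offset
            && self.len == other.len
    }

    /// Expensive: compares every visible value.
    pub fn logical_eq(&self, other: &Self) -> bool {
        self.values() == other.values()
    }
}
```

Two views built from separate allocations holding the same values are logically equal but not pointer-equal, while a shared view is both.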
pub fn into_builder(self) -> ArrayDataBuilder
Converts this ArrayData into an ArrayDataBuilder.
Trait Implementations§
§impl From<ArrayData> for ArrayDataBuilder
§fn from(d: ArrayData) -> ArrayDataBuilder
§impl From<ArrayData> for BooleanArray
§fn from(data: ArrayData) -> BooleanArray
§impl<T> From<ArrayData> for DictionaryArray<T>
where
    T: ArrowDictionaryKeyType,
Constructs a DictionaryArray from an array data reference.
§fn from(data: ArrayData) -> DictionaryArray<T>
§impl From<ArrayData> for FixedSizeBinaryArray
§fn from(data: ArrayData) -> FixedSizeBinaryArray
§impl From<ArrayData> for FixedSizeListArray
§fn from(data: ArrayData) -> FixedSizeListArray
§impl<T> From<ArrayData> for GenericByteArray<T>
where
    T: ByteArrayType,
§fn from(data: ArrayData) -> GenericByteArray<T>
§impl<T> From<ArrayData> for GenericByteViewArray<T>
where
    T: ByteViewType + ?Sized,
§fn from(value: ArrayData) -> GenericByteViewArray<T>
§impl<OffsetSize> From<ArrayData> for GenericListArray<OffsetSize>
where
    OffsetSize: OffsetSizeTrait,
§fn from(data: ArrayData) -> GenericListArray<OffsetSize>
§impl<T> From<ArrayData> for PrimitiveArray<T>
where
    T: ArrowPrimitiveType,
Constructs a PrimitiveArray from an array data reference.
§fn from(data: ArrayData) -> PrimitiveArray<T>
§impl<R> From<ArrayData> for RunArray<R>
where
    R: RunEndIndexType,
§impl From<ArrayData> for StructArray
§fn from(data: ArrayData) -> StructArray
§impl From<ArrayData> for UnionArray
§fn from(data: ArrayData) -> UnionArray
§impl From<BooleanArray> for ArrayData
§fn from(array: BooleanArray) -> ArrayData
§impl<T> From<DictionaryArray<T>> for ArrayData
where
    T: ArrowDictionaryKeyType,
§fn from(array: DictionaryArray<T>) -> ArrayData
§impl From<FixedSizeBinaryArray> for ArrayData
§fn from(array: FixedSizeBinaryArray) -> ArrayData
§impl From<FixedSizeListArray> for ArrayData
§fn from(array: FixedSizeListArray) -> ArrayData
§impl<T> From<GenericByteArray<T>> for ArrayData
where
    T: ByteArrayType,
§fn from(array: GenericByteArray<T>) -> ArrayData
§impl<T> From<GenericByteViewArray<T>> for ArrayData
where
    T: ByteViewType + ?Sized,
§fn from(array: GenericByteViewArray<T>) -> ArrayData
§impl<OffsetSize> From<GenericListArray<OffsetSize>> for ArrayData
where
    OffsetSize: OffsetSizeTrait,
§fn from(array: GenericListArray<OffsetSize>) -> ArrayData
§impl<T> From<PrimitiveArray<T>> for ArrayData
where
    T: ArrowPrimitiveType,
§fn from(array: PrimitiveArray<T>) -> ArrayData
§impl<R> From<RunArray<R>> for ArrayData
where
    R: RunEndIndexType,
§impl From<StructArray> for ArrayData
§fn from(array: StructArray) -> ArrayData
§impl From<UnionArray> for ArrayData
§fn from(array: UnionArray) -> ArrayData
§impl FromPyArrow for ArrayData
fn from_pyarrow_bound(value: &Bound<'_, PyAny>) -> PyResult<Self>
Auto Trait Implementations§
impl Freeze for ArrayData
impl RefUnwindSafe for ArrayData
impl Send for ArrayData
impl Sync for ArrayData
impl Unpin for ArrayData
impl UnwindSafe for ArrayData
Blanket Implementations§
§impl<T> BorrowMut<T> for T
where
    T: ?Sized,
§fn borrow_mut(&mut self) -> &mut T
§impl<T> CloneToUninit for T
where
    T: Clone,
§unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API. (clone_to_uninit)