Struct arrow::array::ArrayData

pub struct ArrayData {
    data_type: DataType,
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
    nulls: Option<NullBuffer>,
}

A generic representation of Arrow array data that encapsulates the attributes and operations common to all Arrow arrays. Operations specific to particular array types (e.g., primitive, list, struct) are implemented in Array.

Memory Layout

ArrayData holds references to one or more underlying data buffers and, depending on the data type, optional child ArrayData, as illustrated below. Bitmaps are not shown for simplicity, but they are stored similarly to the buffers.

                       offset
                      points to
┌───────────────────┐ start of  ┌───────┐       Different
│                   │   data    │       │     ArrayData may
│ArrayData {        │           │....   │     also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─  the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │      underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │     buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│               │   │            bytes::Bytes to hold
└───────────────────┘            actual data values
          ┌ ─ ─ ┘

          ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘

Child ArrayData may also have its own buffers and children.

Fields

  • data_type: DataType
  • len: usize
  • offset: usize
  • buffers: Vec<Buffer>
  • child_data: Vec<ArrayData>
  • nulls: Option<NullBuffer>

Implementations

impl ArrayData

pub unsafe fn new_unchecked(
    data_type: DataType,
    len: usize,
    null_count: Option<usize>,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> ArrayData

Creates a new ArrayData instance without validating its inputs.

If null_count is not specified, the number of nulls in null_bit_buffer is calculated.

If the number of nulls is 0 then the null_bit_buffer is set to None.

Safety

The input values must form a valid Arrow array for data_type, or undefined behavior can result.

Note: This is a low-level API; most users of the arrow crate should create arrays using the methods in the array module.

pub fn try_new(
    data_type: DataType,
    len: usize,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
) -> Result<ArrayData, ArrowError>

Create a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.

If the number of nulls in null_bit_buffer is 0 then the null_bit_buffer is set to None.

Internally this calls through to Self::validate_data.

Note: This is a low-level API; most users of the arrow crate should create arrays using the builders found in arrow_array.

pub const fn builder(data_type: DataType) -> ArrayDataBuilder

Returns an ArrayDataBuilder for constructing an ArrayData of the given DataType.

pub const fn data_type(&self) -> &DataType

Returns a reference to the DataType of this ArrayData

pub fn buffers(&self) -> &[Buffer]

Returns the Buffers storing data for this ArrayData.

pub fn child_data(&self) -> &[ArrayData]

Returns a slice of child ArrayData. This is non-empty for nested types such as lists and structs.

pub fn is_null(&self, i: usize) -> bool

Returns whether the element at index i is null

pub fn nulls(&self) -> Option<&NullBuffer>

Returns a reference to the null buffer of this ArrayData, if any.

Note: ArrayData::offset does NOT apply to the returned NullBuffer, which tracks its own offset; index it with the same i as Self::is_null.

pub fn is_valid(&self, i: usize) -> bool

Returns whether the element at index i is not null

pub const fn len(&self) -> usize

Returns the length (i.e., number of elements) of this ArrayData.

pub const fn is_empty(&self) -> bool

Returns whether this ArrayData is empty

pub const fn offset(&self) -> usize

Returns the offset (in elements) of this ArrayData into its underlying buffers.

pub fn null_count(&self) -> usize

Returns the total number of nulls in this array

pub fn get_buffer_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData).

Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.

If multiple ArrayDatas refer to the same underlying Buffers, each of them reports the full size.

pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>

Returns the total number of bytes of memory occupied by the buffers of this slice of ArrayData (see also the diagram on ArrayData).

This is approximately the number of bytes a new ArrayData would occupy if it were formed by creating new Buffers containing exactly the data needed.

For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size returns 100 * 8 = 800. If the ArrayData is then sliced with Self::slice to refer to its first 20 elements, Self::get_slice_memory_size on the sliced ArrayData returns 20 * 8 = 160.

pub fn get_array_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData).

Equivalent to: size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children

pub fn slice(&self, offset: usize, length: usize) -> ArrayData

Creates a zero-copy slice of itself: a new ArrayData pointing at the same underlying Buffers with a different offset and len.

Panics

Panics if offset + length > self.len().

pub fn buffer<T>(&self, buffer: usize) -> &[T]
where T: ArrowNativeType,

Returns self.buffers()[buffer] reinterpreted as a slice of type T, starting at self.offset.

Panics

This function panics if:

  • the buffer is not correctly aligned for type T, or
  • the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)

pub fn new_null(data_type: &DataType, len: usize) -> ArrayData

Returns a new ArrayData valid for data_type containing len null values

pub fn new_empty(data_type: &DataType) -> ArrayData

Returns a new empty ArrayData valid for data_type.

pub fn align_buffers(&mut self)

Verifies that the buffers meet the minimum alignment requirements for the data type.

Buffers that are not adequately aligned are copied to a new, aligned allocation.

This can be useful when interacting with data received over IPC or FFI that may not meet the minimum alignment requirements.

pub fn validate(&self) -> Result<(), ArrowError>

Performs “cheap” validation of an ArrayData: ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.

This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).

See ArrayData::validate_data to fully validate the offset contents and the validity of UTF-8 data.

pub fn validate_data(&self) -> Result<(), ArrowError>

Validates that the data contained within this ArrayData is valid:

  1. Null count is correct
  2. All offsets are valid
  3. All String data is valid UTF-8
  4. All dictionary offsets are valid

Internally this calls:

  • Self::validate
  • Self::validate_nulls
  • Self::validate_values

Note: this does not recurse into children; for a recursive variant see Self::validate_full.

pub fn validate_full(&self) -> Result<(), ArrowError>

Performs a full recursive validation of this ArrayData and all its children

This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively

pub fn validate_nulls(&self) -> Result<(), ArrowError>

Validates that the null count is correct and that any nullability requirements of its children are correct.

pub fn validate_values(&self) -> Result<(), ArrowError>

Validates the values stored within this ArrayData are valid without recursing into child ArrayData

Does not (yet) check

  1. Union type_ids are valid (see #85)

pub fn ptr_eq(&self, other: &ArrayData) -> bool

Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false when the arrays are logically equal.

pub fn into_builder(self) -> ArrayDataBuilder

Converts this ArrayData into an ArrayDataBuilder

Trait Implementations

impl Clone for ArrayData

fn clone(&self) -> ArrayData

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ArrayData

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl From<ArrayData> for ArrayDataBuilder

fn from(d: ArrayData) -> ArrayDataBuilder

Converts to this type from the input type.

impl From<ArrayData> for BooleanArray

fn from(data: ArrayData) -> BooleanArray

Converts to this type from the input type.

impl<T> From<ArrayData> for DictionaryArray<T>

Constructs a DictionaryArray from an array data reference.

fn from(data: ArrayData) -> DictionaryArray<T>

Converts to this type from the input type.

impl From<ArrayData> for FixedSizeBinaryArray

fn from(data: ArrayData) -> FixedSizeBinaryArray

Converts to this type from the input type.

impl From<ArrayData> for FixedSizeListArray

fn from(data: ArrayData) -> FixedSizeListArray

Converts to this type from the input type.

impl<T> From<ArrayData> for GenericByteArray<T>
where T: ByteArrayType,

fn from(data: ArrayData) -> GenericByteArray<T>

Converts to this type from the input type.

impl<T> From<ArrayData> for GenericByteViewArray<T>
where T: ByteViewType + ?Sized,

fn from(value: ArrayData) -> GenericByteViewArray<T>

Converts to this type from the input type.

impl<OffsetSize> From<ArrayData> for GenericListArray<OffsetSize>
where OffsetSize: OffsetSizeTrait,

fn from(data: ArrayData) -> GenericListArray<OffsetSize>

Converts to this type from the input type.

impl From<ArrayData> for MapArray

fn from(data: ArrayData) -> MapArray

Converts to this type from the input type.

impl From<ArrayData> for NullArray

fn from(data: ArrayData) -> NullArray

Converts to this type from the input type.

impl<T> From<ArrayData> for PrimitiveArray<T>

Constructs a PrimitiveArray from an array data reference.

fn from(data: ArrayData) -> PrimitiveArray<T>

Converts to this type from the input type.

impl<R> From<ArrayData> for RunArray<R>
where R: RunEndIndexType,

fn from(data: ArrayData) -> RunArray<R>

Converts to this type from the input type.

impl From<ArrayData> for StructArray

fn from(data: ArrayData) -> StructArray

Converts to this type from the input type.

impl From<ArrayData> for UnionArray

fn from(data: ArrayData) -> UnionArray

Converts to this type from the input type.

impl From<BooleanArray> for ArrayData

fn from(array: BooleanArray) -> ArrayData

Converts to this type from the input type.

impl<T> From<DictionaryArray<T>> for ArrayData

fn from(array: DictionaryArray<T>) -> ArrayData

Converts to this type from the input type.

impl From<FixedSizeBinaryArray> for ArrayData

fn from(array: FixedSizeBinaryArray) -> ArrayData

Converts to this type from the input type.

impl From<FixedSizeListArray> for ArrayData

fn from(array: FixedSizeListArray) -> ArrayData

Converts to this type from the input type.

impl<T> From<GenericByteArray<T>> for ArrayData
where T: ByteArrayType,

fn from(array: GenericByteArray<T>) -> ArrayData

Converts to this type from the input type.

impl<T> From<GenericByteViewArray<T>> for ArrayData
where T: ByteViewType + ?Sized,

fn from(array: GenericByteViewArray<T>) -> ArrayData

Converts to this type from the input type.

impl<OffsetSize> From<GenericListArray<OffsetSize>> for ArrayData
where OffsetSize: OffsetSizeTrait,

fn from(array: GenericListArray<OffsetSize>) -> ArrayData

Converts to this type from the input type.

impl From<MapArray> for ArrayData

fn from(array: MapArray) -> ArrayData

Converts to this type from the input type.

impl From<NullArray> for ArrayData

fn from(array: NullArray) -> ArrayData

Converts to this type from the input type.

impl<T> From<PrimitiveArray<T>> for ArrayData

fn from(array: PrimitiveArray<T>) -> ArrayData

Converts to this type from the input type.

impl<R> From<RunArray<R>> for ArrayData
where R: RunEndIndexType,

fn from(array: RunArray<R>) -> ArrayData

Converts to this type from the input type.

impl From<StructArray> for ArrayData

fn from(array: StructArray) -> ArrayData

Converts to this type from the input type.

impl From<UnionArray> for ArrayData

fn from(array: UnionArray) -> ArrayData

Converts to this type from the input type.

impl FromPyArrow for ArrayData

fn from_pyarrow_bound(value: &Bound<'_, PyAny>) -> PyResult<Self>

impl PartialEq for ArrayData

fn eq(&self, other: &ArrayData) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl ToPyArrow for ArrayData

fn to_pyarrow(&self, py: Python<'_>) -> PyResult<PyObject>

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoPyArrow for T
where T: ToPyArrow,

fn into_pyarrow(self, py: Python<'_>) -> Result<Py<PyAny>, PyErr>

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> Ungil for T
where T: Send,