arrow_data

Struct ArrayData

Source
pub struct ArrayData {
    data_type: DataType,
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>,
    nulls: Option<NullBuffer>,
}

A generic representation of Arrow array data which encapsulates common attributes and operations for Arrow arrays.

Specific operations for different array types (e.g., primitive, list, struct) are implemented in Array.

§Memory Layout

ArrayData has references to one or more underlying data buffers and optional child ArrayData, depending on type as illustrated below. Bitmaps are not shown for simplicity but they are stored similarly to the buffers.

                       offset
                      points to
┌───────────────────┐ start of  ┌───────┐       Different
│                   │   data    │       │     ArrayData may
│ArrayData {        │           │....   │     also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─  the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │      underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │     buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│               │   │            bytes::Bytes to hold
└───────────────────┘            actual data values
          ┌ ─ ─ ┘

          ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘

Child ArrayData may also have its own buffers and children
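The sharing shown in the diagram can be modelled with the standard library alone. The following is a minimal sketch, not the arrow_data implementation: `View`, its fields, and `two_views` are hypothetical names, and `Arc<[i32]>` stands in for the shared Buffer.

```rust
use std::sync::Arc;

/// Toy stand-in for `ArrayData` over a single `i32` values buffer.
/// `View` and its fields are hypothetical names used for illustration only.
#[derive(Clone)]
struct View {
    buffer: Arc<[i32]>, // shared, reference-counted storage
    offset: usize,      // index of the first visible element, in items
    len: usize,         // number of elements visible through this view
}

impl View {
    /// The elements this view refers to; no data is copied.
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Builds two views over one allocation, mirroring the diagram above.
fn two_views() -> (View, View) {
    let buffer: Arc<[i32]> = Arc::from(vec![1234, 4372, 4888, 5882, 4323, 4859]);
    let a = View { buffer: Arc::clone(&buffer), offset: 0, len: 3 };
    let b = View { buffer, offset: 2, len: 4 };
    (a, b)
}
```

Both views returned by `two_views` point at the same allocation; only their `offset` and `len` differ.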

Fields§

§data_type: DataType

The data type for this array data

§len: usize

The number of elements in this array data

§offset: usize

The offset into this array data, in number of items

§buffers: Vec<Buffer>

The buffers for this array data. Note that depending on the array types, this could hold different kinds of buffers (e.g., value buffer, value offset buffer) at different positions.

§child_data: Vec<ArrayData>

The child(ren) of this array. Only non-empty for nested types such as ListArray and StructArray.

§nulls: Option<NullBuffer>

The null bitmap. A None value for this indicates all values are non-null in this array.
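As an illustration of how a validity bitmap is read, here is a small sketch over a raw byte slice. It assumes the Arrow convention that bits are packed least-significant-bit first and that a set bit marks a valid (non-null) slot; `bit_is_set` and `count_nulls` are hypothetical helpers, not part of NullBuffer.

```rust
/// Returns true if bit `i` is set in an LSB-first packed bitmap.
fn bit_is_set(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Counts null slots among `len` elements starting at `offset`.
/// `None` means no bitmap is stored, so every slot is treated as valid.
fn count_nulls(bitmap: Option<&[u8]>, offset: usize, len: usize) -> usize {
    match bitmap {
        None => 0,
        Some(bits) => (offset..offset + len)
            .filter(|&i| !bit_is_set(bits, i))
            .count(),
    }
}
```

With the single byte `0b0000_0101`, slots 0 and 2 are valid and slots 1 and 3 are null.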

Implementations§

Source§

impl ArrayData

Source

pub unsafe fn new_unchecked( data_type: DataType, len: usize, null_count: Option<usize>, null_bit_buffer: Option<Buffer>, offset: usize, buffers: Vec<Buffer>, child_data: Vec<ArrayData>, ) -> Self

Creates a new ArrayData instance.

If null_count is not specified, the number of nulls in null_bit_buffer is calculated.

If the number of nulls is 0 then the null_bit_buffer is set to None.

§Safety

The input values must form a valid Arrow array for data_type, or undefined behavior can result.

Note: This is a low level API and most users of the arrow crate should create arrays using the methods in the array module.

Source

pub fn try_new( data_type: DataType, len: usize, null_bit_buffer: Option<Buffer>, offset: usize, buffers: Vec<Buffer>, child_data: Vec<ArrayData>, ) -> Result<Self, ArrowError>

Create a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.

If the number of nulls in null_bit_buffer is 0 then the null_bit_buffer is set to None.

Internally this calls through to Self::validate_data

Note: This is a low level API and most users of the arrow crate should create arrays using the builders found in arrow_array

Source

pub const fn builder(data_type: DataType) -> ArrayDataBuilder

Returns a builder to construct an ArrayData instance with the given DataType

Source

pub const fn data_type(&self) -> &DataType

Returns a reference to the DataType of this ArrayData

Source

pub fn buffers(&self) -> &[Buffer]

Returns the Buffers storing data for this ArrayData

Source

pub fn child_data(&self) -> &[ArrayData]

Returns a slice of the child ArrayData. This will be non-empty for types such as lists and structs.

Source

pub fn is_null(&self, i: usize) -> bool

Returns whether the element at index i is null

Source

pub fn nulls(&self) -> Option<&NullBuffer>

Returns a reference to the null buffer of this ArrayData if any

Note: ArrayData::offset does NOT apply to the returned NullBuffer

Source

pub fn is_valid(&self, i: usize) -> bool

Returns whether the element at index i is not null

Source

pub const fn len(&self) -> usize

Returns the length (i.e., number of elements) of this ArrayData.

Source

pub const fn is_empty(&self) -> bool

Returns whether this ArrayData is empty

Source

pub const fn offset(&self) -> usize

Returns the offset of this ArrayData

Source

pub fn null_count(&self) -> usize

Returns the total number of nulls in this array

Source

pub fn get_buffer_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData).

Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.

If multiple ArrayDatas refer to the same underlying Buffers, each will report the same size.

Source

pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>

Returns the total number of bytes of memory occupied by the buffers of this slice of ArrayData. (See also diagram on ArrayData.)

This is approximately the number of bytes that would be used if a new ArrayData were formed by creating new Buffers holding exactly the data needed.

For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then sliced with Self::slice to refer to its first 20 elements, Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
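The arithmetic in this example, and the contrast with the whole-buffer size, can be reproduced with a plain slice. This is a sketch under simplifying assumptions: `sizes` is a hypothetical helper, it looks only at a single `i64` values buffer, and it ignores any null bitmap.

```rust
use std::mem::size_of_val;

/// Returns (bytes held by the whole backing buffer, bytes covered by the
/// window of `len` items starting at `offset`).
fn sizes(buffer: &[i64], offset: usize, len: usize) -> (usize, usize) {
    let whole = size_of_val(buffer);
    let covered = size_of_val(&buffer[offset..offset + len]);
    (whole, covered)
}
```

For a 100-element buffer, a window over the first 20 items covers 160 bytes while the backing buffer still holds 800.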

Source

pub fn get_array_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData.)

Equivalent to: size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children

Source

pub fn slice(&self, offset: usize, length: usize) -> ArrayData

Creates a zero-copy slice of itself. This creates a new ArrayData pointing at the same underlying Buffers with a different offset and len

§Panics

Panics if offset + length > self.len().
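The offset bookkeeping behind a zero-copy slice can be sketched as follows. `Window` is a hypothetical type that tracks only the offset and length. It assumes, as a simplification, that slicing a slice adds the new offset to the existing one.

```rust
/// Hypothetical (offset, len) window onto a shared buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Window {
    offset: usize,
    len: usize,
}

impl Window {
    /// Returns a narrower window; panics if `offset + length > self.len`.
    fn slice(&self, offset: usize, length: usize) -> Window {
        let in_bounds = offset
            .checked_add(length)
            .map_or(false, |end| end <= self.len);
        assert!(in_bounds, "offset + length exceeds the current length");
        // The new offset is relative to the start of the underlying buffer.
        Window { offset: self.offset + offset, len: length }
    }
}
```

Slicing `Window { offset: 0, len: 100 }` with `slice(10, 20)` and then `slice(5, 10)` yields a window at offset 15 with length 10.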

Source

pub fn buffer<T: ArrowNativeType>(&self, buffer: usize) -> &[T]

Returns the buffer as a slice of type T starting at self.offset

§Panics

This function panics if:

  • the buffer is not byte-aligned with type T, or
  • the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)
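The alignment condition can be illustrated with a std-only sketch that reinterprets bytes as `i32` values. `typed_i32` and `as_bytes` are hypothetical helpers; unlike the method documented above, this sketch returns `None` instead of panicking.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Reinterprets `bytes` as `&[i32]` without copying. Returns `None` if the
/// pointer is not aligned for `i32` or the length is not a multiple of 4.
fn typed_i32(bytes: &[u8]) -> Option<&[i32]> {
    let aligned = (bytes.as_ptr() as usize) % align_of::<i32>() == 0;
    let whole = bytes.len() % size_of::<i32>() == 0;
    if !(aligned && whole) {
        return None;
    }
    // SAFETY: the pointer is aligned for i32, the length covers exactly
    // `bytes.len() / 4` values, and every bit pattern is a valid i32.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr().cast::<i32>(), bytes.len() / size_of::<i32>())
    })
}

/// Views `i32` values as raw bytes (always allowed: `u8` has alignment 1).
fn as_bytes(values: &[i32]) -> &[u8] {
    // SAFETY: any initialized memory may be read as bytes.
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u8>(), size_of_val(values)) }
}
```

Starting one byte into an aligned buffer makes the view misaligned, so `typed_i32` rejects it even when the length is a multiple of four.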
Source

pub fn new_null(data_type: &DataType, len: usize) -> Self

Returns a new ArrayData valid for data_type containing len null values

Source

pub fn new_empty(data_type: &DataType) -> Self

Returns a new empty ArrayData valid for data_type.

Source

pub fn align_buffers(&mut self)

Verifies that the buffers meet the minimum alignment requirements for the data type

Buffers that are not adequately aligned will be copied to a new aligned allocation

This can be useful for when interacting with data sent over IPC or FFI, that may not meet the minimum alignment requirements

This also aligns buffers of children data

Source

pub fn validate(&self) -> Result<(), ArrowError>

Performs “cheap” validation of an ArrayData. Ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.

This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).

See ArrayData::validate_data to fully validate the offset contents and the validity of UTF-8 data
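A size check of this kind needs only lengths, never buffer contents. The following is a minimal sketch for a fixed-width type; `check_capacity` and its `String` error are hypothetical and do not reflect how ArrowError is constructed.

```rust
/// Checks that a buffer of `buffer_bytes` bytes can hold `offset + len`
/// values of `width` bytes each. The buffer contents are never inspected.
fn check_capacity(
    buffer_bytes: usize,
    offset: usize,
    len: usize,
    width: usize,
) -> Result<(), String> {
    // Checked arithmetic guards against overflow in `(offset + len) * width`.
    let required = offset
        .checked_add(len)
        .and_then(|n| n.checked_mul(width))
        .ok_or_else(|| "offset + len overflows".to_string())?;
    if buffer_bytes < required {
        return Err(format!("need {required} bytes, buffer has {buffer_bytes}"));
    }
    Ok(())
}
```

An 800-byte buffer passes for 100 eight-byte values at offset 0, and fails once `offset + len` exceeds 100.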

Source

fn typed_offsets<T: ArrowNativeType + Num>(&self) -> Result<&[T], ArrowError>

Returns a reference to the data in buffer as a typed slice (typically &[i32] or &[i64]) after validating. The returned slice is guaranteed to have at least self.len + 1 entries.

For an empty array, the buffer can also be empty.

Source

fn typed_buffer<T: ArrowNativeType + Num>( &self, idx: usize, len: usize, ) -> Result<&[T], ArrowError>

Returns a reference to the data in buffers[idx] as a typed slice after validating

Source

fn validate_offsets<T: ArrowNativeType + Num + Display>( &self, values_length: usize, ) -> Result<(), ArrowError>

Does a cheap sanity check that the self.len values in buffer are valid offsets (of type T) into some other buffer that is values_length bytes long

Source

fn validate_offsets_and_sizes<T: ArrowNativeType + Num + Display>( &self, values_length: usize, ) -> Result<(), ArrowError>

Does a cheap sanity check that the self.len values in buffer are valid offsets and sizes (of type T) into some other buffer that is values_length bytes long

Source

fn validate_child_data(&self) -> Result<(), ArrowError>

Validates the layout of child_data ArrayData structures

Source

fn get_single_valid_child_data( &self, expected_type: &DataType, ) -> Result<&ArrayData, ArrowError>

Ensures that this array data has a single child_data with the expected type, and calls validate() on it. Returns a reference to that child_data

Source

fn validate_num_child_data(&self, expected_len: usize) -> Result<(), ArrowError>

Returns Err if self.child_data does not have exactly expected_len elements

Source

fn get_valid_child_data( &self, i: usize, expected_type: &DataType, ) -> Result<&ArrayData, ArrowError>

Ensures that child_data[i] has the expected type, calls validate() on it, and returns a reference to that child_data

Source

pub fn validate_data(&self) -> Result<(), ArrowError>

Validate that the data contained within this ArrayData is valid

  1. Null count is correct
  2. All offsets are valid
  3. All String data is valid UTF-8
  4. All dictionary offsets are valid

Internally this calls:

  • Self::validate_nulls
  • Self::validate_values

Note: this does not recurse into children; for a recursive variant see Self::validate_full

Source

pub fn validate_full(&self) -> Result<(), ArrowError>

Performs a full recursive validation of this ArrayData and all its children

This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively

Source

pub fn validate_nulls(&self) -> Result<(), ArrowError>

Validates that the null count is correct and that any nullability requirements of its children are correct, without recursing into child ArrayData

Does not (yet) check

  1. Union type_ids are valid (see #85)
Source

fn validate_non_nullable( &self, mask: Option<&NullBuffer>, child: &ArrayData, ) -> Result<(), ArrowError>

Verifies that child contains no nulls not present in mask
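The rule "no nulls in the child that are not present in the mask" can be sketched with boolean validity slices. This is a toy model: `check_non_nullable` is a hypothetical helper, `true` marks a valid slot, and the mask and child are assumed to have the same length.

```rust
/// Returns an error naming the first index where the child is null while
/// the parent slot is valid. A missing mask means every parent slot is valid.
fn check_non_nullable(mask: Option<&[bool]>, child: &[bool]) -> Result<(), String> {
    for (i, &child_valid) in child.iter().enumerate() {
        let parent_valid = mask.map_or(true, |m| m[i]);
        if parent_valid && !child_valid {
            return Err(format!("non-nullable child has an unmasked null at index {i}"));
        }
    }
    Ok(())
}
```

A child null at index 1 is accepted only when the parent mask also marks index 1 as null.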

Source

pub fn validate_values(&self) -> Result<(), ArrowError>

Validates the values stored within this ArrayData are valid without recursing into child ArrayData

Does not (yet) check

  1. Union type_ids are valid (see #85)
Source

fn validate_each_offset<T, V>( &self, offset_limit: usize, validate: V, ) -> Result<(), ArrowError>
where T: ArrowNativeType + TryInto<usize> + Num + Display, V: Fn(usize, Range<usize>) -> Result<(), ArrowError>,

Calls the validate(item_index, range) function for each of the ranges specified in the arrow offsets buffer of type T. Also validates that each offset is smaller than offset_limit

For an empty array, the offsets buffer can either be empty or contain a single 0.

For example, if the offsets buffer contained [1, 2, 4], this function would call validate([1,2]) and validate([2,4])
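The walk over consecutive offset pairs can be sketched as below. This is an illustrative model rather than the crate's code: `for_each_range` is a hypothetical name, offsets are fixed to `i32`, errors are plain strings, and an offset equal to `offset_limit` is assumed to be acceptable.

```rust
use std::ops::Range;

/// Calls `validate(item_index, start..end)` for each consecutive pair of
/// offsets, after checking that offsets are non-negative, within
/// `offset_limit`, and non-decreasing.
fn for_each_range<V>(offsets: &[i32], offset_limit: usize, validate: V) -> Result<(), String>
where
    V: Fn(usize, Range<usize>) -> Result<(), String>,
{
    let mut converted = Vec::with_capacity(offsets.len());
    for &raw in offsets {
        let offset = usize::try_from(raw).map_err(|_| format!("negative offset {raw}"))?;
        if offset > offset_limit {
            return Err(format!("offset {offset} exceeds limit {offset_limit}"));
        }
        converted.push(offset);
    }
    // An empty buffer or a single offset produces no pairs, hence no calls.
    for (i, pair) in converted.windows(2).enumerate() {
        if pair[0] > pair[1] {
            return Err(format!("offsets decrease at item {i}"));
        }
        validate(i, pair[0]..pair[1])?;
    }
    Ok(())
}
```

For the offsets `[1, 2, 4]` the callback receives item 0 with range `1..2` and item 1 with range `2..4`.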

Source

fn validate_utf8<T>(&self) -> Result<(), ArrowError>
where T: ArrowNativeType + TryInto<usize> + Num + Display,

Ensures that all strings formed by the offsets in buffers[0] into buffers[1] are valid utf8 sequences
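A per-string UTF-8 check over an offsets buffer and a values buffer might look like the following sketch. `check_utf8` is a hypothetical helper that takes `usize` offsets directly; the crate's own routine may validate the data in a different order or in bulk.

```rust
/// Checks that every `values[offsets[i]..offsets[i + 1]]` is valid UTF-8.
fn check_utf8(offsets: &[usize], values: &[u8]) -> Result<(), String> {
    for (i, pair) in offsets.windows(2).enumerate() {
        // `get` returns None for reversed or out-of-bounds ranges.
        let bytes = values
            .get(pair[0]..pair[1])
            .ok_or_else(|| format!("string {i} has out-of-bounds offsets"))?;
        std::str::from_utf8(bytes)
            .map_err(|e| format!("string {i} is not valid UTF-8: {e}"))?;
    }
    Ok(())
}
```

An offset that lands in the middle of a multi-byte character makes the neighbouring strings invalid, even though the values buffer as a whole is valid UTF-8.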

Source

fn validate_offsets_full<T>( &self, offset_limit: usize, ) -> Result<(), ArrowError>
where T: ArrowNativeType + TryInto<usize> + Num + Display,

Ensures that all offsets in buffers[0] into buffers[1] are between 0 and offset_limit

Source

fn check_bounds<T>(&self, max_value: i64) -> Result<(), ArrowError>
where T: ArrowNativeType + TryInto<i64> + Num + Display,

Validates that each value in self.buffers (typed as T) is within the range [0, max_value], inclusive
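A bounds check of this shape is what guards, for example, dictionary keys against the number of dictionary values. Here is a minimal sketch with `i32` keys; `check_key_bounds` is a hypothetical name and, as a simplification, null slots are not skipped.

```rust
/// Checks that every key lies in the inclusive range `[0, max_value]`.
fn check_key_bounds(keys: &[i32], max_value: i64) -> Result<(), String> {
    for (i, &key) in keys.iter().enumerate() {
        let key = i64::from(key);
        if key < 0 || key > max_value {
            return Err(format!(
                "value at position {i} out of bounds: {key} (should be in [0, {max_value}])"
            ));
        }
    }
    Ok(())
}
```

For a dictionary with three values, `max_value` would be 2, so a key of 3 or -1 is rejected.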

Source

fn check_run_ends<T>(&self) -> Result<(), ArrowError>
where T: ArrowNativeType + TryInto<i64> + Num + Display,

Validates that each value in the run_ends array is positive and strictly increasing.
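The two conditions collapse into one comparison against the previous value if the running value starts at zero. A small sketch with `i32` run ends follows; `check_run_ends_i32` is a hypothetical standalone helper.

```rust
/// Checks that run ends are positive and strictly increasing.
fn check_run_ends_i32(run_ends: &[i32]) -> Result<(), String> {
    // Starting from 0 makes "first value is positive" the same test as
    // "each value is greater than the previous one".
    let mut previous = 0;
    for (i, &end) in run_ends.iter().enumerate() {
        if end <= previous {
            return Err(format!(
                "run end {end} at position {i} is not greater than {previous}"
            ));
        }
        previous = end;
    }
    Ok(())
}
```

The run ends `[2, 5, 6]` pass, while `[0, 1]`, `[2, 2]`, and `[3, 1]` are rejected.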

Source

pub fn ptr_eq(&self, other: &Self) -> bool

Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false when the arrays are logically equal
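The difference between pointer equality and logical equality can be seen with reference-counted slices from the standard library. In this sketch `Arc<[i32]>` stands in for a Buffer, and `same_allocation` and `same_contents` are hypothetical helpers.

```rust
use std::sync::Arc;

/// Cheap check: do both handles point at the same allocation?
fn same_allocation(a: &Arc<[i32]>, b: &Arc<[i32]>) -> bool {
    Arc::ptr_eq(a, b)
}

/// Full check: compares the contents element by element.
fn same_contents(a: &Arc<[i32]>, b: &Arc<[i32]>) -> bool {
    a[..] == b[..]
}
```

Two separately allocated buffers holding `[1, 2, 3]` compare equal by contents but not by pointer, which is the case where a pointer comparison returns false for logically equal arrays.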

Source

pub fn into_builder(self) -> ArrayDataBuilder

Converts this ArrayData into an ArrayDataBuilder

Trait Implementations§

Source§

impl Clone for ArrayData

Source§

fn clone(&self) -> ArrayData

Returns a copy of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for ArrayData

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl From<ArrayData> for ArrayDataBuilder

Source§

fn from(d: ArrayData) -> Self

Converts to this type from the input type.
Source§

impl PartialEq for ArrayData

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,