Struct GenericByteViewArray


pub struct GenericByteViewArray<T: ByteViewType + ?Sized> {
    data_type: DataType,
    views: ScalarBuffer<u128>,
    buffers: Vec<Buffer>,
    phantom: PhantomData<T>,
    nulls: Option<NullBuffer>,
}


Variable-size Binary View Layout: An array of variable length byte views.

This array type is used to store variable length byte data (e.g. Strings, Binary) and has efficient operations such as take, filter, and comparison.

This differs from GenericByteArray, which also stores variable length byte data: a GenericByteViewArray represents each string with its own offset and length, packed into a “view”, rather than through a shared offsets buffer. As a result, take and filter like operations are implemented by manipulating the views (u128) without copying or modifying the underlying bytes. Each view also stores an inlined prefix, which speeds up comparisons.
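The claim that take and filter only manipulate views can be illustrated with a simplified model. The struct `ViewModel` and the helpers `take_views` and `filter_views` below are hypothetical, not arrow-rs APIs: they copy the fixed-size u128 views and share the data buffers through an `Rc`, so no string bytes are read or rewritten.

```rust
use std::rc::Rc;

// Hypothetical model of a view array: fixed-size views plus shared buffers.
pub struct ViewModel {
    pub views: Vec<u128>,
    pub buffers: Rc<Vec<Vec<u8>>>,
}

/// Gather the views at `indices`; the buffers are shared, not copied.
pub fn take_views(array: &ViewModel, indices: &[usize]) -> ViewModel {
    ViewModel {
        views: indices.iter().map(|&i| array.views[i]).collect(),
        buffers: Rc::clone(&array.buffers),
    }
}

/// Keep the views whose mask entry is `true`; the buffers are shared.
pub fn filter_views(array: &ViewModel, mask: &[bool]) -> ViewModel {
    ViewModel {
        views: array
            .views
            .iter()
            .zip(mask)
            .filter_map(|(&view, &keep)| keep.then_some(view))
            .collect(),
        buffers: Rc::clone(&array.buffers),
    }
}
```

Because each view is a plain 16-byte value, the cost of these operations depends on the number of selected rows, not on the length of the strings.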

See Also

StringViewArray for storing utf8 encoded string data
BinaryViewArray for storing bytes
ByteView to interpret the layout of the u128 views.

Layout: “views” and buffers

A GenericByteViewArray stores variable length byte strings. An array of N elements is stored as N fixed length “views” and a variable number of variable length “buffers”.

Each view is a u128 value whose layout is different depending on the length of the string stored at that location:

                        ┌──────┬────────────────────────┐
                        │length│      string value      │
   Strings (len <= 12)  │      │    (padded with 0)     │
                        └──────┴────────────────────────┘
                         0    31                      127

                        ┌───────┬───────┬───────┬───────┐
                        │length │prefix │  buf  │offset │
   Strings (len > 12)   │       │       │ index │       │
                        └───────┴───────┴───────┴───────┘
                         0    31       63      95    127

Strings with length <= 12 are stored entirely in the view. See Self::inline_value to access the inlined bytes of a short view.
Strings with length > 12 store their first four bytes (the prefix) inline in the view, while the entire string is stored in one of the buffers. See ByteView to access the fields of these views.
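The packing rules above can be written out directly. This is a sketch that follows the two diagrams bit for bit; `make_view` is a hypothetical helper for illustration, and arrow-rs provides its own way to build views (see ByteView).

```rust
/// Pack `data` into a u128 view following the layout diagrams.
/// Values of at most 12 bytes are inlined; longer values record a
/// 4-byte prefix plus the buffer index and offset of the full bytes.
pub fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes()); // bits 0..32: length
    if data.len() <= 12 {
        // bits 32..128: the value itself, padded with zeros
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        bytes[4..8].copy_from_slice(&data[0..4]); // bits 32..64: prefix
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes()); // bits 64..96
        bytes[12..16].copy_from_slice(&offset.to_le_bytes()); // bits 96..128
    }
    // Little-endian: byte 0 holds the least significant bits.
    u128::from_le_bytes(bytes)
}
```

Reading a field back is a shift and a truncating cast, for example `(view >> 96) as u32` for the offset of a long view.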

As with other arrays, the optimized kernels in arrow_compute are likely the easiest and fastest way to work with this data. However, it is possible to access the views and buffers directly for more control.

For example:

use arrow_array::{Array, StringViewArray};
use arrow_data::ByteView;
let array = StringViewArray::from(vec![
  "hello",
  "this string is longer than 12 bytes",
  "this string is also longer than 12 bytes"
]);

// ** Examine the first view (short string) **
assert!(array.is_valid(0)); // Check for nulls
let short_view: u128 = array.views()[0]; // "hello"
// get length of the string
let len = short_view as u32;
assert_eq!(len, 5); // strings of 12 bytes or fewer are stored in the view
// SAFETY: `short_view` is a valid view and `len` is its inlined length
let value = unsafe {
  StringViewArray::inline_value(&short_view, len as usize)
};
assert_eq!(value, b"hello");

// ** Examine the third view (long string) **
assert!(array.is_valid(2)); // Check for nulls
let long_view: u128 = array.views()[2]; // "this string is also longer than 12 bytes"
let len = long_view as u32;
assert_eq!(len, 40); // strings longer than 12 bytes are stored in the buffer
let view = ByteView::from(long_view); // use ByteView to access the fields
assert_eq!(view.length, 40);
assert_eq!(view.buffer_index, 0);
assert_eq!(view.offset, 35); // data starts after the first long string
// Views for long strings store a 4 byte prefix
let prefix = view.prefix.to_le_bytes();
assert_eq!(&prefix, b"this");
let value = array.value(2); // get the string value (see `value` implementation for how to access the bytes directly)
assert_eq!(value, "this string is also longer than 12 bytes");

Unlike GenericByteArray, there are no constraints on the offsets other than that each view must reference a range that lies within a valid buffer. Offsets can be out of order, non-contiguous, and overlapping.
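The single constraint can be expressed as a check over the views. `views_in_bounds` is a hypothetical helper for illustration; arrow-rs performs its own, possibly stricter, validation when an array is created with try_new. Note that it never compares one view with another, so ordering, gaps, and overlaps are all accepted.

```rust
/// Returns true if every long view references an in-bounds byte range
/// of an existing buffer. Inline views (length <= 12) need no buffer.
pub fn views_in_bounds(views: &[u128], buffers: &[Vec<u8>]) -> bool {
    views.iter().all(|&view| {
        let len = view as u32 as usize;
        if len <= 12 {
            return true; // inline view: nothing to check in the buffers
        }
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        match buffers.get(buffer_index) {
            Some(buf) => offset.checked_add(len).map_or(false, |end| end <= buf.len()),
            None => false, // buffer index does not exist
        }
    })
}
```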

For example, in the following diagram, the strings “FishWasInTownTodayYay” and “CrumpleFacedFish” are both longer than 12 bytes and are therefore stored in a buffer, while the string “LavaMonster” is stored inline in the view. The two long strings share the same bytes for “Fish”: they are the suffix of one string and the prefix of the other.

                                                                           ┌───┐
                        ┌──────┬──────┬──────┬──────┐               offset │...│
"FishWasInTownTodayYay" │  21  │ Fish │  0   │ 115  │─ ─              103  │Mr.│
                        └──────┴──────┴──────┴──────┘   │      ┌ ─ ─ ─ ─ ▶ │Cru│
                        ┌──────┬──────┬──────┬──────┐                      │mpl│
"CrumpleFacedFish"      │  16  │ Crum │  0   │ 103  │─ ─│─ ─ ─ ┘           │eFa│
                        └──────┴──────┴──────┴──────┘                      │ced│
                        ┌──────┬────────────────────┐   └ ─ ─ ─ ─ ─ ─ ─ ─ ▶│Fis│
"LavaMonster"           │  11  │   LavaMonster\0    │                      │hWa│
                        └──────┴────────────────────┘               offset │sIn│
                                                                      115  │Tow│
                                                                           │nTo│
                                                                           │day│
                                 u128 "views"                              │Yay│
                                                                  buffer 0 │...│
                                                                           └───┘
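The diagram can be reproduced in a few lines. In this sketch, `long_view` and `value_bytes` are hypothetical helpers (not arrow-rs APIs), and a run of 103 `.` bytes is only a placeholder for the part of buffer 0 before offset 103, which the diagram does not show in full.

```rust
/// Build a long (> 12 byte) view from its four fields.
pub fn long_view(len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32) -> u128 {
    (len as u128)
        | ((u32::from_le_bytes(prefix) as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Resolve a view to the bytes it describes.
pub fn value_bytes(view: u128, buffers: &[Vec<u8>]) -> Vec<u8> {
    let len = view as u32 as usize;
    if len <= 12 {
        // Short value: bytes 4..4 + len of the little-endian view.
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        buffers[buffer_index][offset..offset + len].to_vec()
    }
}
```

With buffer 0 built as 103 placeholder bytes followed by `CrumpleFacedFishWasInTownTodayYay`, the views `long_view(16, *b"Crum", 0, 103)` and `long_view(21, *b"Fish", 0, 115)` resolve to the two long strings, and their ranges 103..119 and 115..136 overlap on “Fish”.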

Fields

data_type: DataType
views: ScalarBuffer<u128>
buffers: Vec<Buffer>
phantom: PhantomData<T>
nulls: Option<NullBuffer>


Implementations

impl<T: ByteViewType + ?Sized> GenericByteViewArray<T>

pub fn new( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Self

Panics

Panics if Self::try_new would return an error for the same arguments.

pub fn try_new( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Result<Self, ArrowError>

Errors

pub unsafe fn new_unchecked( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Self

Safety

The caller must ensure that Self::try_new would succeed for the same arguments.

pub fn new_null(len: usize) -> Self

pub fn new_scalar(value: impl AsRef<T::Native>) -> Scalar<Self>

pub fn from_iter_values<Ptr, I>(iter: I) -> Self
where Ptr: AsRef<T::Native>, I: IntoIterator<Item = Ptr>,

pub fn into_parts(self) -> (ScalarBuffer<u128>, Vec<Buffer>, Option<NullBuffer>)

pub fn views(&self) -> &ScalarBuffer<u128>

pub fn data_buffers(&self) -> &[Buffer]

pub fn value(&self, i: usize) -> &T::Native

Panics

Panics if i is out of bounds (i >= self.len()).

pub unsafe fn value_unchecked(&self, idx: usize) -> &T::Native

Safety

The caller must ensure that idx is in bounds (idx < self.len()).

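pub unsafe fn inline_value(view: &u128, len: usize) -> &[u8]

Safety

view must be a valid view that follows the layout described above, and len must be the length of its inlined value (at most 12).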

pub fn iter(&self) -> ArrayIter<&Self>

pub fn bytes_iter(&self) -> impl Iterator<Item = &[u8]>

pub fn prefix_bytes_iter( &self, prefix_len: usize, ) -> impl Iterator<Item = &[u8]>

pub fn suffix_bytes_iter( &self, suffix_len: usize, ) -> impl Iterator<Item = &[u8]>

pub fn slice(&self, offset: usize, length: usize) -> Self

pub fn gc(&self) -> Self

Garbage Collection
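The idea behind compaction can be sketched with a simplified model: copy only the bytes that long views reference into one fresh buffer and rewrite each long view's buffer index and offset, leaving inline views untouched. `gc_views` is hypothetical and is not the arrow-rs implementation, which may, for example, split the output across several buffers.

```rust
/// Returns rewritten views plus a single compacted buffer (buffer 0).
pub fn gc_views(views: &[u128], buffers: &[Vec<u8>]) -> (Vec<u128>, Vec<u8>) {
    let mut new_buffer = Vec::new();
    let new_views: Vec<u128> = views
        .iter()
        .map(|&view| {
            let len = view as u32 as usize;
            if len <= 12 {
                return view; // inline: no buffer data to move
            }
            let buffer_index = (view >> 64) as u32 as usize;
            let offset = (view >> 96) as u32 as usize;
            let new_offset = new_buffer.len() as u128;
            new_buffer.extend_from_slice(&buffers[buffer_index][offset..offset + len]);
            // Keep length and prefix (low 64 bits); buffer index 0, new offset.
            (view & u64::MAX as u128) | (new_offset << 96)
        })
        .collect();
    (new_views, new_buffer)
}
```

In this model, bytes shared by overlapping views (like “Fish” in the earlier layout diagram) are copied once per view, so compaction gives up sharing in exchange for dropping unreferenced data.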

pub unsafe fn compare_unchecked( left: &GenericByteViewArray<T>, left_idx: usize, right: &GenericByteViewArray<T>, right_idx: usize, ) -> Ordering

Order check flow
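A prefix-first comparison can be sketched as follows, assuming two views that resolve against the same `buffers`. The helpers `value_of`, `view_prefix`, and `compare_views` are hypothetical and only model the idea; they are not the internals of compare_unchecked.

```rust
use std::cmp::Ordering;

/// Full bytes of the value described by `view`.
fn value_of(view: u128, buffers: &[Vec<u8>]) -> Vec<u8> {
    let len = view as u32 as usize;
    if len <= 12 {
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        buffers[buffer_index][offset..offset + len].to_vec()
    }
}

/// First min(len, 4) bytes of the value, read from the view alone:
/// bytes 4..8 hold the start of the data for short views and the
/// prefix for long views, so no buffer access is needed.
fn view_prefix(view: u128) -> Vec<u8> {
    let len = (view as u32 as usize).min(4);
    view.to_le_bytes()[4..4 + len].to_vec()
}

pub fn compare_views(left: u128, right: u128, buffers: &[Vec<u8>]) -> Ordering {
    match view_prefix(left).cmp(&view_prefix(right)) {
        // Prefixes tie: fall back to comparing the full values.
        Ordering::Equal => value_of(left, buffers).cmp(&value_of(right, buffers)),
        // Differing prefixes already decide the lexicographic order.
        decided => decided,
    }
}
```

Comparing the prefixes as byte slices, rather than as little-endian u32 integers, is what keeps the fast path consistent with lexicographic byte order.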

Safety

left_idx must be in bounds for left, and right_idx must be in bounds for right.

impl GenericByteViewArray<BinaryViewType>

pub fn to_string_view(self) -> Result<StringViewArray, ArrowError>
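A UTF-8 check over a view array has to look at each value rather than at each buffer: a buffer can be valid UTF-8 as a whole while a view slices it in the middle of a multi-byte character. The sketch below assumes the values have already been resolved from their views; `values_are_utf8` is a hypothetical helper.

```rust
/// Returns true if every resolved value is valid UTF-8 on its own.
pub fn values_are_utf8<'a>(values: impl IntoIterator<Item = &'a [u8]>) -> bool {
    values.into_iter().all(|v| std::str::from_utf8(v).is_ok())
}
```

For example, in `"naïve café"` the character `ï` occupies bytes 2..4, so a range 0..3 is invalid even though the whole buffer is valid UTF-8.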

pub unsafe fn to_string_view_unchecked(self) -> StringViewArray

Safety

The caller must ensure that every value in the array is valid UTF-8.

impl GenericByteViewArray<StringViewType>

pub fn to_binary_view(self) -> BinaryViewArray

pub fn is_ascii(&self) -> bool

Trait Implementations

impl<T: ByteViewType + ?Sized> Array for GenericByteViewArray<T>

fn as_any(&self) -> &dyn Any

fn to_data(&self) -> ArrayData

fn into_data(self) -> ArrayData

fn data_type(&self) -> &DataType

fn slice(&self, offset: usize, length: usize) -> ArrayRef

fn len(&self) -> usize

fn is_empty(&self) -> bool

fn shrink_to_fit(&mut self)

fn offset(&self) -> usize

fn nulls(&self) -> Option<&NullBuffer>

fn logical_null_count(&self) -> usize

fn get_buffer_memory_size(&self) -> usize

fn get_array_memory_size(&self) -> usize

fn logical_nulls(&self) -> Option<NullBuffer>

fn is_null(&self, index: usize) -> bool

fn is_valid(&self, index: usize) -> bool

fn null_count(&self) -> usize

fn is_nullable(&self) -> bool

impl<'a, T: ByteViewType + ?Sized> ArrayAccessor for &'a GenericByteViewArray<T>

type Item = &'a <T as ByteViewType>::Native

fn value(&self, index: usize) -> Self::Item

unsafe fn value_unchecked(&self, index: usize) -> Self::Item

impl<T: ByteViewType + ?Sized> Clone for GenericByteViewArray<T>

fn clone(&self) -> Self

fn clone_from(&mut self, source: &Self)

impl<T: ByteViewType + ?Sized> Debug for GenericByteViewArray<T>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl<FROM, V> From<&GenericByteArray<FROM>> for GenericByteViewArray<V>
where FROM: ByteArrayType, FROM::Offset: OffsetSizeTrait + ToPrimitive, V: ByteViewType<Native = FROM::Native>,

fn from(byte_array: &GenericByteArray<FROM>) -> Self
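The relationship between the two layouts can be modeled directly: in an offsets-based array, value i is `values[offsets[i]..offsets[i + 1]]`, so each pair of offsets can become one view that points back into the same values buffer. `offsets_to_views` is a hypothetical sketch; it does not claim that this From impl reuses the buffer in exactly this way.

```rust
/// Convert offsets into views referencing `values` as buffer 0.
pub fn offsets_to_views(offsets: &[i32], values: &[u8]) -> Vec<u128> {
    offsets
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0] as usize, w[1] as usize);
            let data = &values[start..end];
            let mut bytes = [0u8; 16];
            bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
            if data.len() <= 12 {
                bytes[4..4 + data.len()].copy_from_slice(data); // inline
            } else {
                bytes[4..8].copy_from_slice(&data[0..4]); // prefix
                // Buffer index stays 0 (bytes 8..12 are zero): the values buffer.
                bytes[12..16].copy_from_slice(&(start as u32).to_le_bytes());
            }
            u128::from_le_bytes(bytes)
        })
        .collect()
}
```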

impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T>

fn from(value: ArrayData) -> Self

impl<T: ByteViewType + ?Sized> From<GenericByteViewArray<T>> for ArrayData

fn from(array: GenericByteViewArray<T>) -> Self

impl<'a, Ptr, T> FromIterator<&'a Option<Ptr>> for GenericByteViewArray<T>
where Ptr: AsRef<T::Native> + 'a, T: ByteViewType + ?Sized,

fn from_iter<I: IntoIterator<Item = &'a Option<Ptr>>>(iter: I) -> Self

impl<Ptr, T: ByteViewType + ?Sized> FromIterator<Option<Ptr>> for GenericByteViewArray<T>
where Ptr: AsRef<T::Native>,

fn from_iter<I: IntoIterator<Item = Option<Ptr>>>(iter: I) -> Self
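Collecting optional values into this layout can be sketched as a small builder that appends long values to a single buffer and records validity separately. `Built` and `build_from_options` are hypothetical, and giving null slots an all-zero view (a length-0 inline view) is an assumption of this model.

```rust
pub struct Built {
    pub views: Vec<u128>,
    pub buffer: Vec<u8>,
    pub valid: Vec<bool>,
}

pub fn build_from_options<'a>(items: impl IntoIterator<Item = Option<&'a str>>) -> Built {
    let mut out = Built { views: Vec::new(), buffer: Vec::new(), valid: Vec::new() };
    for item in items {
        match item {
            None => {
                out.views.push(0); // assumption: nulls get an all-zero view
                out.valid.push(false);
            }
            Some(s) => {
                let data = s.as_bytes();
                let mut bytes = [0u8; 16];
                bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
                if data.len() <= 12 {
                    bytes[4..4 + data.len()].copy_from_slice(data);
                } else {
                    bytes[4..8].copy_from_slice(&data[0..4]);
                    // Buffer index stays 0; offset is where the bytes are appended.
                    bytes[12..16].copy_from_slice(&(out.buffer.len() as u32).to_le_bytes());
                    out.buffer.extend_from_slice(data);
                }
                out.views.push(u128::from_le_bytes(bytes));
                out.valid.push(true);
            }
        }
    }
    out
}
```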


Auto Trait Implementations

impl<T> Freeze for GenericByteViewArray<T>
where T: ?Sized,

impl<T> RefUnwindSafe for GenericByteViewArray<T>
where T: RefUnwindSafe + ?Sized,

impl<T> Send for GenericByteViewArray<T>
where T: ?Sized,

impl<T> Sync for GenericByteViewArray<T>
where T: ?Sized,

impl<T> Unpin for GenericByteViewArray<T>
where T: Unpin + ?Sized,

impl<T> UnwindSafe for GenericByteViewArray<T>
where T: UnwindSafe + ?Sized,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<T> Datum for T
where T: Array,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,