Struct RunEndBuffer

pub struct RunEndBuffer<E>where
    E: ArrowNativeType,{
    run_ends: ScalarBuffer<E>,
    logical_length: usize,
    logical_offset: usize,
}

Expand description

A buffer of monotonically increasing, positive integers used to store run-ends.

Used to compactly represent runs of the same value. Values being represented are stored in a separate buffer from this struct. See RunArray for an example of how this is used with a companion array to represent the values.

§Logical vs Physical

Physically, each value in the run_ends buffer is the cumulative length of all runs in the logical representation, up to that physical index. Consider the following example:

          physical                        logical
    ┌─────────┬─────────┐           ┌─────────┬─────────┐
    │    3    │    0    │ ◄──────┬─ │    A    │    0    │
    ├─────────┼─────────┤        │  ├─────────┼─────────┤
    │    4    │    1    │ ◄────┐ ├─ │    A    │    1    │
    ├─────────┼─────────┤      │ │  ├─────────┼─────────┤
    │    6    │    2    │ ◄──┐ │ └─ │    A    │    2    │
    └─────────┴─────────┘    │ │    ├─────────┼─────────┤
     run-ends    index       │ └─── │    B    │    3    │
                             │      ├─────────┼─────────┤
     logical_offset = 0      ├───── │    C    │    4    │
     logical_length = 6      │      ├─────────┼─────────┤
                             └───── │    C    │    5    │
                                    └─────────┴─────────┘
                                      values     index

A RunEndBuffer is physically the buffer and offset with length on the left. In this case, the offset and length represent the whole buffer, so it is essentially unsliced. See the section below on slicing for more details on how this buffer handles slicing.

This means that multiple logical values are represented in the same physical index, and multiple logical indices map to the same physical index. The RunEndBuffer containing [3, 4, 6] is essentially the physical indices [0, 0, 0, 1, 2, 2], and having a separately stored buffer of values such as [A, B, C] can turn this into a representation of [A, A, A, B, C, C].

§Slicing

In order to provide zero-copy slicing, this struct stores a separate logical offset and length. Consider the following example:

          physical                        logical
    ┌─────────┬─────────┐           ┌ ─ ─ ─ ─ ┬ ─ ─ ─ ─ ┐
    │    3    │    0    │ ◄──────┐       A         0
    ├─────────┼─────────┤        │  ├── ─ ─ ─ ┼ ─ ─ ─ ─ ┤
    │    4    │    1    │ ◄────┐ │       A         1
    ├─────────┼─────────┤      │ │  ├─────────┼─────────┤
    │    6    │    2    │ ◄──┐ │ └─ │    A    │    2    │◄─── logical_offset
    └─────────┴─────────┘    │ │    ├─────────┼─────────┤
     run-ends    index       │ └─── │    B    │    3    │
                             │      ├─────────┼─────────┤
     logical_offset = 2      └───── │    C    │    4    │
     logical_length = 3             ├─────────┼─────────┤
                                         C         5     ◄─── logical_offset + logical_length
                                    └ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ┘
                                      values     index

The physical run_ends ScalarBuffer remains unchanged, in order to facilitate zero-copy. However, we now offset into the logical representation with an accompanying length. This allows us to represent values [A, B, C] using physical indices 0, 1, 2 with the same underlying physical buffer, at the cost of two extra usizes to represent the logical slice that was taken.

(A RunEndBuffer is considered unsliced when logical_offset is 0 and logical_length is equal to the last value in run_ends)

Fields§

§run_ends: ScalarBuffer<E>§logical_length: usize§logical_offset: usize

Implementations§

§

impl<E> RunEndBuffer<E>
where E: ArrowNativeType,

pub fn new( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Create a new RunEndBuffer from a ScalarBuffer, logical_offset and logical_length.

§Panics

run_ends does not contain strictly increasing values greater than zero
The last value of run_ends is less than logical_offset + logical_length

pub unsafe fn new_unchecked( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Create a new RunEndBuffer from a ScalarBuffer, logical_offset and logical_length.

§Safety

run_ends must contain strictly increasing values greater than zero
The last value of run_ends must be greater than or equal to logical_offset + logical_len

pub fn offset(&self) -> usize

Returns the logical offset into the run-ends stored by this buffer.

pub fn len(&self) -> usize

Returns the logical length of the run-ends stored by this buffer.

pub fn is_empty(&self) -> bool

Returns true if this buffer is logically empty.

pub fn shrink_to_fit(&mut self)

Free up unused memory.

pub fn values(&self) -> &[E]

Returns the physical (unsliced) run ends of this buffer.

Take care when operating on these values as it doesn’t take into account any logical slicing that may have occurred.

pub fn sliced_values(&self) -> impl Iterator<Item = E>

Returns an iterator yielding run ends adjusted for the logical slice.

Each yielded value is subtracted by the logical_offset and capped at the logical_length.

pub fn max_value(&self) -> usize

Returns the maximum run-end encoded in the underlying buffer; that is, the last physical run of the buffer. This does not take into account any logical slicing that may have occurred.

pub fn get_physical_index(&self, logical_index: usize) -> usize

Performs a binary search to find the physical index for the given logical index.

Useful for extracting the corresponding physical run_ends when this buffer is logically sliced.

The result is arbitrary if logical_index >= self.len().

pub fn get_start_physical_index(&self) -> usize

Returns the physical index at which the logical array starts.

The same as calling get_physical_index(0) but with a fast path if the buffer is not logically sliced, in which case it always returns 0.

pub fn get_end_physical_index(&self) -> usize

Returns the physical index at which the logical array ends.

The same as calling get_physical_index(length - 1) but with a fast path if the buffer is not logically sliced, in which case it returns length - 1.

pub fn slice( &self, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Slices this RunEndBuffer by the provided logical_offset and logical_length.

§Panics

Specified slice (logical_offset + logical_length) exceeds existing logical length

pub fn inner(&self) -> &ScalarBuffer<E>

Returns the inner ScalarBuffer.

pub fn into_inner(self) -> ScalarBuffer<E>

Returns the inner ScalarBuffer, consuming self.

pub fn get_physical_indices( &self, logical_indices: &[I], ) -> Result<Vec<usize>, I>
where I: ArrowNativeType,

Returns the physical indices corresponding to the provided logical indices.

Given a slice of logical indices, this method returns a Vec containing the corresponding physical indices into the run-ends buffer.

This method operates by iterating the logical indices in sorted order, instead of finding the physical index for each logical index using binary search via the function RunEndBuffer::get_physical_index.

Running benchmarks on both approaches showed that the approach used here scaled well for larger inputs.

See https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727 for more details.

§Errors

If any logical index is out of bounds (>= self.len()), returns an error containing the invalid index.

Trait Implementations§

§

impl<E> Clone for RunEndBuffer<E>
where E: Clone + ArrowNativeType,

§

fn clone(&self) -> RunEndBuffer<E>

Returns a duplicate of the value. Read more

1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more

§

impl<E> Debug for RunEndBuffer<E>
where E: Debug + ArrowNativeType,

§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

Struct RunEndBuffer Copy item path

§Logical vs Physical

§Slicing

Fields§

Implementations§

impl<E> RunEndBuffer<E>where E: ArrowNativeType,

pub fn new( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

§Panics

pub unsafe fn new_unchecked( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

§Safety

pub fn offset(&self) -> usize

pub fn len(&self) -> usize

pub fn is_empty(&self) -> bool

pub fn shrink_to_fit(&mut self)

pub fn values(&self) -> &[E]

pub fn sliced_values(&self) -> impl Iterator<Item = E>

pub fn max_value(&self) -> usize

pub fn get_physical_index(&self, logical_index: usize) -> usize

pub fn get_start_physical_index(&self) -> usize

pub fn get_end_physical_index(&self) -> usize

pub fn slice( &self, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

§Panics

pub fn inner(&self) -> &ScalarBuffer<E>

pub fn into_inner(self) -> ScalarBuffer<E>

pub fn get_physical_indices<I>( &self, logical_indices: &[I], ) -> Result<Vec<usize>, I>where I: ArrowNativeType,

§Errors

Trait Implementations§

impl<E> Clone for RunEndBuffer<E>where E: Clone + ArrowNativeType,

fn clone(&self) -> RunEndBuffer<E>

fn clone_from(&mut self, source: &Self)

impl<E> Debug for RunEndBuffer<E>where E: Debug + ArrowNativeType,

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Auto Trait Implementations§

impl<E> Freeze for RunEndBuffer<E>

impl<E> RefUnwindSafe for RunEndBuffer<E>where E: RefUnwindSafe,

impl<E> Send for RunEndBuffer<E>

impl<E> Sync for RunEndBuffer<E>

impl<E> Unpin for RunEndBuffer<E>where E: Unpin,

impl<E> UnsafeUnpin for RunEndBuffer<E>

impl<E> UnwindSafe for RunEndBuffer<E>where E: UnwindSafe,

Blanket Implementations§

impl<T> Any for Twhere T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for Twhere T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for Twhere T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for Twhere T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for Twhere U: From<T>,

fn into(self) -> U

impl<T> ToOwned for Twhere T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for Twhere V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for Twhere T: RefUnwindSafe + Send + Sync,

impl<T> Ungil for Twhere T: Send,

Struct RunEndBuffer

impl<E> RunEndBuffer<E>
where E: ArrowNativeType,

pub fn get_physical_indices<I>( &self, logical_indices: &[I], ) -> Result<Vec<usize>, I>
where I: ArrowNativeType,

impl<E> Clone for RunEndBuffer<E>
where E: Clone + ArrowNativeType,

impl<E> Debug for RunEndBuffer<E>
where E: Debug + ArrowNativeType,

impl<E> RefUnwindSafe for RunEndBuffer<E>
where E: RefUnwindSafe,

impl<E> Unpin for RunEndBuffer<E>
where E: Unpin,

impl<E> UnwindSafe for RunEndBuffer<E>
where E: UnwindSafe,

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> Ungil for T
where T: Send,