RunEndBuffer

Struct RunEndBuffer 

pub struct RunEndBuffer<E>
where E: ArrowNativeType,
{ run_ends: ScalarBuffer<E>, logical_length: usize, logical_offset: usize, }
Expand description

A buffer of monotonically increasing, positive integers used to store run-ends.

Used to compactly represent runs of the same value. Values being represented are stored in a separate buffer from this struct. See RunArray for an example of how this is used with a companion array to represent the values.

§Logical vs Physical

Physically, each value in the run_ends buffer is the cumulative length of all runs in the logical representation, up to that physical index. Consider the following example:

          physical                        logical
    ┌─────────┬─────────┐           ┌─────────┬─────────┐
    │    3    │    0    │ ◄──────┬─ │    A    │    0    │
    ├─────────┼─────────┤        │  ├─────────┼─────────┤
    │    4    │    1    │ ◄────┐ ├─ │    A    │    1    │
    ├─────────┼─────────┤      │ │  ├─────────┼─────────┤
    │    6    │    2    │ ◄──┐ │ └─ │    A    │    2    │
    └─────────┴─────────┘    │ │    ├─────────┼─────────┤
     run-ends    index       │ └─── │    B    │    3    │
                             │      ├─────────┼─────────┤
     logical_offset = 0      ├───── │    C    │    4    │
     logical_length = 6      │      ├─────────┼─────────┤
                             └───── │    C    │    5    │
                                    └─────────┴─────────┘
                                      values     index

A RunEndBuffer is physically the buffer and offset with length on the left. In this case, the offset and length represent the whole buffer, so it is essentially unsliced. See the section below on slicing for more details on how this buffer handles slicing.

This means that multiple logical values are represented in the same physical index, and multiple logical indices map to the same physical index. The RunEndBuffer containing [3, 4, 6] is essentially the physical indices [0, 0, 0, 1, 2, 2], and having a separately stored buffer of values such as [A, B, C] can turn this into a representation of [A, A, A, B, C, C].

§Slicing

In order to provide zero-copy slicing, this struct stores a separate logical offset and length. Consider the following example:

          physical                        logical
    ┌─────────┬─────────┐           ┌ ─ ─ ─ ─ ┬ ─ ─ ─ ─ ┐
    │    3    │    0    │ ◄──────┐       A         0
    ├─────────┼─────────┤        │  ├── ─ ─ ─ ┼ ─ ─ ─ ─ ┤
    │    4    │    1    │ ◄────┐ │       A         1
    ├─────────┼─────────┤      │ │  ├─────────┼─────────┤
    │    6    │    2    │ ◄──┐ │ └─ │    A    │    2    │◄─── logical_offset
    └─────────┴─────────┘    │ │    ├─────────┼─────────┤
     run-ends    index       │ └─── │    B    │    3    │
                             │      ├─────────┼─────────┤
     logical_offset = 2      └───── │    C    │    4    │
     logical_length = 3             ├─────────┼─────────┤
                                         C         5     ◄─── logical_offset + logical_length
                                    └ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ┘
                                      values     index

The physical run_ends ScalarBuffer remains unchanged, in order to facilitate zero-copy. However, we now offset into the logical representation with an accompanying length. This allows us to represent values [A, B, C] using physical indices 0, 1, 2 with the same underlying physical buffer, at the cost of two extra usizes to represent the logical slice that was taken.

(A RunEndBuffer is considered unsliced when logical_offset is 0 and logical_length is equal to the last value in run_ends)

Fields§

§run_ends: ScalarBuffer<E>§logical_length: usize§logical_offset: usize

Implementations§

§

impl<E> RunEndBuffer<E>
where E: ArrowNativeType,

pub fn new( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Create a new RunEndBuffer from a ScalarBuffer, logical_offset and logical_length.

§Panics
  • run_ends does not contain strictly increasing values greater than zero
  • The last value of run_ends is less than logical_offset + logical_length

pub unsafe fn new_unchecked( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Create a new RunEndBuffer from a ScalarBuffer, logical_offset and logical_length.

§Safety
  • run_ends must contain strictly increasing values greater than zero
  • The last value of run_ends must be greater than or equal to logical_offset + logical_len

pub fn offset(&self) -> usize

Returns the logical offset into the run-ends stored by this buffer.

pub fn len(&self) -> usize

Returns the logical length of the run-ends stored by this buffer.

pub fn is_empty(&self) -> bool

Returns true if this buffer is logically empty.

pub fn shrink_to_fit(&mut self)

Free up unused memory.

pub fn values(&self) -> &[E]

Returns the physical (unsliced) run ends of this buffer.

Take care when operating on these values as it doesn’t take into account any logical slicing that may have occurred.

pub fn max_value(&self) -> usize

Returns the maximum run-end encoded in the underlying buffer; that is, the last physical run of the buffer. This does not take into account any logical slicing that may have occurred.

pub fn get_physical_index(&self, logical_index: usize) -> usize

Performs a binary search to find the physical index for the given logical index.

Useful for extracting the corresponding physical run_ends when this buffer is logically sliced.

The result is arbitrary if logical_index >= self.len().

pub fn get_start_physical_index(&self) -> usize

Returns the physical index at which the logical array starts.

The same as calling get_physical_index(0) but with a fast path if the buffer is not logically sliced, in which case it always returns 0.

pub fn get_end_physical_index(&self) -> usize

Returns the physical index at which the logical array ends.

The same as calling get_physical_index(length - 1) but with a fast path if the buffer is not logically sliced, in which case it returns length - 1.

pub fn slice( &self, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>

Slices this RunEndBuffer by the provided logical_offset and logical_length.

§Panics
  • Specified slice (logical_offset + logical_length) exceeds existing logical length

pub fn inner(&self) -> &ScalarBuffer<E>

Returns the inner ScalarBuffer.

pub fn into_inner(self) -> ScalarBuffer<E>

Returns the inner ScalarBuffer, consuming self.

pub fn get_physical_indices<I>( &self, logical_indices: &[I], ) -> Result<Vec<usize>, I>
where I: ArrowNativeType,

Returns the physical indices corresponding to the provided logical indices.

Given a slice of logical indices, this method returns a Vec containing the corresponding physical indices into the run-ends buffer.

This method operates by iterating the logical indices in sorted order, instead of finding the physical index for each logical index using binary search via the function RunEndBuffer::get_physical_index.

Running benchmarks on both approaches showed that the approach used here scaled well for larger inputs.

See https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727 for more details.

§Errors

If any logical index is out of bounds (>= self.len()), returns an error containing the invalid index.

Trait Implementations§

§

impl<E> Clone for RunEndBuffer<E>

§

fn clone(&self) -> RunEndBuffer<E>

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
§

impl<E> Debug for RunEndBuffer<E>

§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<E> Freeze for RunEndBuffer<E>

§

impl<E> RefUnwindSafe for RunEndBuffer<E>
where E: RefUnwindSafe,

§

impl<E> Send for RunEndBuffer<E>

§

impl<E> Sync for RunEndBuffer<E>

§

impl<E> Unpin for RunEndBuffer<E>
where E: Unpin,

§

impl<E> UnwindSafe for RunEndBuffer<E>
where E: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

§

impl<T> Ungil for T
where T: Send,