Struct RunEndBuffer
pub struct RunEndBuffer<E>where
E: ArrowNativeType,{
run_ends: ScalarBuffer<E>,
logical_length: usize,
logical_offset: usize,
}Expand description
A buffer of monotonically increasing, positive integers used to store run-ends.
Used to compactly represent runs of the same value. Values being represented
are stored in a separate buffer from this struct. See RunArray for an example
of how this is used with a companion array to represent the values.
§Logical vs Physical
Physically, each value in the run_ends buffer is the cumulative length of
all runs in the logical representation, up to that physical index. Consider
the following example:
physical logical
┌─────────┬─────────┐ ┌─────────┬─────────┐
│ 3 │ 0 │ ◄──────┬─ │ A │ 0 │
├─────────┼─────────┤ │ ├─────────┼─────────┤
│ 4 │ 1 │ ◄────┐ ├─ │ A │ 1 │
├─────────┼─────────┤ │ │ ├─────────┼─────────┤
│ 6 │ 2 │ ◄──┐ │ └─ │ A │ 2 │
└─────────┴─────────┘ │ │ ├─────────┼─────────┤
run-ends index │ └─── │ B │ 3 │
│ ├─────────┼─────────┤
logical_offset = 0 ├───── │ C │ 4 │
logical_length = 6 │ ├─────────┼─────────┤
└───── │ C │ 5 │
└─────────┴─────────┘
values indexA RunEndBuffer is physically the buffer and offset with length on the left.
In this case, the offset and length represent the whole buffer, so it is essentially
unsliced. See the section below on slicing for more details on how this buffer
handles slicing.
This means that multiple logical values are represented in the same physical index,
and multiple logical indices map to the same physical index. The RunEndBuffer
containing [3, 4, 6] is essentially the physical indices [0, 0, 0, 1, 2, 2],
and having a separately stored buffer of values such as [A, B, C] can turn
this into a representation of [A, A, A, B, C, C].
§Slicing
In order to provide zero-copy slicing, this struct stores a separate logical offset and length. Consider the following example:
physical logical
┌─────────┬─────────┐ ┌ ─ ─ ─ ─ ┬ ─ ─ ─ ─ ┐
│ 3 │ 0 │ ◄──────┐ A 0
├─────────┼─────────┤ │ ├── ─ ─ ─ ┼ ─ ─ ─ ─ ┤
│ 4 │ 1 │ ◄────┐ │ A 1
├─────────┼─────────┤ │ │ ├─────────┼─────────┤
│ 6 │ 2 │ ◄──┐ │ └─ │ A │ 2 │◄─── logical_offset
└─────────┴─────────┘ │ │ ├─────────┼─────────┤
run-ends index │ └─── │ B │ 3 │
│ ├─────────┼─────────┤
logical_offset = 2 └───── │ C │ 4 │
logical_length = 3 ├─────────┼─────────┤
C 5 ◄─── logical_offset + logical_length
└ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ┘
values indexThe physical run_ends ScalarBuffer remains unchanged, in order to facilitate
zero-copy. However, we now offset into the logical representation with an
accompanying length. This allows us to represent values [A, B, C] using physical
indices 0, 1, 2 with the same underlying physical buffer, at the cost of two
extra usizes to represent the logical slice that was taken.
(A RunEndBuffer is considered unsliced when logical_offset is 0 and
logical_length is equal to the last value in run_ends)
Fields§
§run_ends: ScalarBuffer<E>§logical_length: usize§logical_offset: usizeImplementations§
§impl<E> RunEndBuffer<E>where
E: ArrowNativeType,
impl<E> RunEndBuffer<E>where
E: ArrowNativeType,
pub fn new(
run_ends: ScalarBuffer<E>,
logical_offset: usize,
logical_length: usize,
) -> RunEndBuffer<E>
pub fn new( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>
Create a new RunEndBuffer from a ScalarBuffer, logical_offset
and logical_length.
§Panics
run_endsdoes not contain strictly increasing values greater than zero- The last value of
run_endsis less thanlogical_offset + logical_length
pub unsafe fn new_unchecked(
run_ends: ScalarBuffer<E>,
logical_offset: usize,
logical_length: usize,
) -> RunEndBuffer<E>
pub unsafe fn new_unchecked( run_ends: ScalarBuffer<E>, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>
Create a new RunEndBuffer from a ScalarBuffer, logical_offset
and logical_length.
§Safety
run_endsmust contain strictly increasing values greater than zero- The last value of
run_endsmust be greater than or equal tological_offset + logical_len
pub fn shrink_to_fit(&mut self)
pub fn shrink_to_fit(&mut self)
Free up unused memory.
pub fn values(&self) -> &[E]
pub fn values(&self) -> &[E]
Returns the physical (unsliced) run ends of this buffer.
Take care when operating on these values as it doesn’t take into account any logical slicing that may have occurred.
pub fn max_value(&self) -> usize
pub fn max_value(&self) -> usize
Returns the maximum run-end encoded in the underlying buffer; that is, the last physical run of the buffer. This does not take into account any logical slicing that may have occurred.
pub fn get_physical_index(&self, logical_index: usize) -> usize
pub fn get_physical_index(&self, logical_index: usize) -> usize
Performs a binary search to find the physical index for the given logical index.
Useful for extracting the corresponding physical run_ends when this buffer
is logically sliced.
The result is arbitrary if logical_index >= self.len().
pub fn get_start_physical_index(&self) -> usize
pub fn get_start_physical_index(&self) -> usize
Returns the physical index at which the logical array starts.
The same as calling get_physical_index(0) but with a fast path if the
buffer is not logically sliced, in which case it always returns 0.
pub fn get_end_physical_index(&self) -> usize
pub fn get_end_physical_index(&self) -> usize
Returns the physical index at which the logical array ends.
The same as calling get_physical_index(length - 1) but with a fast path
if the buffer is not logically sliced, in which case it returns length - 1.
pub fn slice(
&self,
logical_offset: usize,
logical_length: usize,
) -> RunEndBuffer<E>
pub fn slice( &self, logical_offset: usize, logical_length: usize, ) -> RunEndBuffer<E>
Slices this RunEndBuffer by the provided logical_offset and logical_length.
§Panics
- Specified slice (
logical_offset+logical_length) exceeds existing logical length
pub fn inner(&self) -> &ScalarBuffer<E>
pub fn inner(&self) -> &ScalarBuffer<E>
Returns the inner ScalarBuffer.
pub fn into_inner(self) -> ScalarBuffer<E>
pub fn into_inner(self) -> ScalarBuffer<E>
Returns the inner ScalarBuffer, consuming self.
pub fn get_physical_indices<I>(
&self,
logical_indices: &[I],
) -> Result<Vec<usize>, I>where
I: ArrowNativeType,
pub fn get_physical_indices<I>(
&self,
logical_indices: &[I],
) -> Result<Vec<usize>, I>where
I: ArrowNativeType,
Returns the physical indices corresponding to the provided logical indices.
Given a slice of logical indices, this method returns a Vec containing the
corresponding physical indices into the run-ends buffer.
This method operates by iterating the logical indices in sorted order, instead of
finding the physical index for each logical index using binary search via
the function RunEndBuffer::get_physical_index.
Running benchmarks on both approaches showed that the approach used here scaled well for larger inputs.
See https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727 for more details.
§Errors
If any logical index is out of bounds (>= self.len()), returns an error containing the invalid index.
Trait Implementations§
§impl<E> Clone for RunEndBuffer<E>where
E: Clone + ArrowNativeType,
impl<E> Clone for RunEndBuffer<E>where
E: Clone + ArrowNativeType,
§fn clone(&self) -> RunEndBuffer<E>
fn clone(&self) -> RunEndBuffer<E>
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read more