pub struct RunArray<R: RunEndIndexType> {
    data_type: DataType,
    run_ends: RunEndBuffer<R::Native>,
    values: ArrayRef,
}
An array of run-end encoded values.
This encoding is a variation on run-length encoding (RLE) and is well suited to data in which the same value repeats consecutively.
A RunArray contains a run_ends array and a values array of the same length. The run_ends array stores the logical index at which each run ends (exclusive), and the values array stores the value of each run. The example below illustrates how a logical array is represented as a RunArray:
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
  ┌─────────────────┐  ┌─────────┐       ┌─────────────────┐
│ │        A        │  │    2    │ │     │        A        │
  ├─────────────────┤  ├─────────┤       ├─────────────────┤
│ │        D        │  │    3    │ │     │        A        │    run length of 'A' = run_ends[0] - 0 = 2
  ├─────────────────┤  ├─────────┤       ├─────────────────┤
│ │        B        │  │    6    │ │     │        D        │    run length of 'D' = run_ends[1] - run_ends[0] = 1
  └─────────────────┘  └─────────┘       ├─────────────────┤
│        values          run_ends  │     │        B        │
                                         ├─────────────────┤
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┘     │        B        │
                                         ├─────────────────┤
               RunArray                  │        B        │    run length of 'B' = run_ends[2] - run_ends[1] = 3
              length = 3                 └─────────────────┘

                                            Logical array
                                              Contents
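To make the layout above concrete, the following is a minimal sketch of how a run-end encoded pair expands back into its logical contents. It uses plain slices rather than the arrow_array types, and the function name `decode` is hypothetical; it is an illustration of the encoding, not the crate's implementation.

```rust
/// Hypothetical helper (not part of arrow_array): expands run-end encoded
/// data into the logical sequence it represents.
///
/// `run_ends[i]` is the logical index one past the last element of run `i`,
/// so run `i` has length `run_ends[i] - run_ends[i - 1]` (or `run_ends[0] - 0`
/// for the first run). Assumes `run_ends` is strictly increasing and positive.
fn decode<T: Clone>(run_ends: &[i32], values: &[T]) -> Vec<T> {
    let mut logical = Vec::new();
    let mut start = 0usize; // logical index where the current run begins
    for (end, value) in run_ends.iter().zip(values) {
        let end = *end as usize;
        // Repeat the run's value once per logical slot it covers.
        logical.extend(std::iter::repeat(value.clone()).take(end - start));
        start = end;
    }
    logical
}
```

With the data from the diagram, `decode(&[2, 3, 6], &['A', 'D', 'B'])` yields the six logical elements A, A, D, B, B, B.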
Fields
data_type: DataType
run_ends: RunEndBuffer<R::Native>
values: ArrayRef
Implementations
impl<R: RunEndIndexType> RunArray<R>
pub fn logical_len(run_ends: &PrimitiveArray<R>) -> usize
Calculates the logical length of the array encoded by the given run_ends array.
pub fn try_new(
    run_ends: &PrimitiveArray<R>,
    values: &dyn Array,
) -> Result<Self, ArrowError>
Attempts to create a RunArray from the given run_ends (the index where each run ends) and values (the value of each run). Returns an error if the given data is not compatible with the RunEndEncoded specification.
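The sketch below illustrates the kind of checks involved, based on my reading of the run-end encoded layout in the Arrow columnar format: run ends are strictly increasing positive integers, and there is exactly one run end per value. The functions `validate` and `logical_len` are hypothetical stand-ins over plain slices; the checks that try_new actually performs, and its error messages, may differ.

```rust
/// Hypothetical validator (not arrow_array code) for the invariants assumed
/// above: one run end per value, and strictly increasing, positive run ends.
fn validate(run_ends: &[i32], values_len: usize) -> Result<(), String> {
    if run_ends.len() != values_len {
        return Err(format!(
            "run_ends has {} entries but values has {}",
            run_ends.len(),
            values_len
        ));
    }
    // Starting from 0 also rejects a first run end that is zero or negative.
    let mut prev = 0i32;
    for (i, &end) in run_ends.iter().enumerate() {
        if end <= prev {
            return Err(format!(
                "run_ends[{i}] = {end} is not greater than the previous run end {prev}"
            ));
        }
        prev = end;
    }
    Ok(())
}

/// Hypothetical counterpart of `logical_len` for an unsliced array:
/// the logical length is the last run end, or 0 when there are no runs.
fn logical_len(run_ends: &[i32]) -> usize {
    run_ends.last().map_or(0, |&end| end as usize)
}
```

For the run ends `[2, 3, 6]` with three values, validation passes and the logical length is 6; `[2, 2, 6]` is rejected because the second run would be empty.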
pub fn values(&self) -> &ArrayRef
Returns a reference to the values array.
Note: any slicing of this RunArray is not applied to the returned array and must be handled separately.
pub fn get_start_physical_index(&self) -> usize
Returns the physical index at which the array slice starts.
pub fn get_end_physical_index(&self) -> usize
Returns the physical index at which the array slice ends.
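Because slicing is not applied to the values array, a caller typically needs the start and end physical indices to find the portion of values that backs a slice. The following sketch models that with plain slices. The name `sliced_values` is hypothetical, and it assumes the end index is inclusive (the run containing the last logical element) and that the slice lies within the logical length.

```rust
/// Hypothetical helper (not arrow_array code): returns the part of `values`
/// that backs the logical range `[offset, offset + len)`.
///
/// Assumes `offset + len` does not exceed the logical length.
fn sliced_values<'a, T>(
    run_ends: &[i32],
    values: &'a [T],
    offset: usize,
    len: usize,
) -> &'a [T] {
    if len == 0 {
        return &values[..0];
    }
    // The run containing a logical index is the first run whose end exceeds it.
    let run_of = |i: usize| run_ends.partition_point(|&end| end as usize <= i);
    let start = run_of(offset); // physical index where the slice starts
    let end = run_of(offset + len - 1); // physical index where the slice ends (inclusive)
    &values[start..=end]
}
```

For run ends `[2, 3, 6]` and values A, D, B, a slice with offset 2 and length 2 covers the logical elements D, B and maps to physical indices 1 through 2.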
pub fn downcast<V: 'static>(&self) -> Option<TypedRunArray<'_, R, V>>
Downcast this RunArray to a TypedRunArray
use arrow_array::{Array, ArrayAccessor, RunArray, StringArray, types::Int32Type};
let orig = [Some("a"), Some("b"), None];
let run_array = RunArray::<Int32Type>::from_iter(orig);
let typed = run_array.downcast::<StringArray>().unwrap();
assert_eq!(typed.value(0), "a");
assert_eq!(typed.value(1), "b");
assert!(typed.values().is_null(2));
pub fn get_physical_index(&self, logical_index: usize) -> usize
Returns the index into the physical array for the given index into the logical array.
This function adjusts the input logical index based on ArrayData::offset and performs a binary search on the run_ends array for the adjusted index.
The result is arbitrary if logical_index >= self.len().
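As a rough illustration of the lookup described above, the sketch below adds the slice offset to the logical index and then binary-searches for the first run whose end is greater than the adjusted index. It works on a plain slice of run ends, and `physical_index` is a hypothetical name; the crate's own search may be structured differently.

```rust
/// Hypothetical model (not arrow_array code) of the logical-to-physical lookup.
///
/// `offset` plays the role of the array's slice offset. The physical index is
/// the position of the first run end strictly greater than the adjusted index.
fn physical_index(run_ends: &[i32], offset: usize, logical_index: usize) -> usize {
    let absolute = offset + logical_index;
    // `partition_point` is a binary search: it returns the number of leading
    // run ends that are <= `absolute`, which is the index of the containing run.
    run_ends.partition_point(|&end| end as usize <= absolute)
}
```

In this sketch an out-of-range index simply returns `run_ends.len()`, which is not a valid physical index; that is one way the result can be "arbitrary" when the precondition is violated.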
pub fn get_physical_indices<I>(
    &self,
    logical_indices: &[I],
) -> Result<Vec<usize>, ArrowError>
where
    I: ArrowNativeType,
Returns the physical indices of the input logical indices. Returns an error if any of the logical indices cannot be converted to a physical index. The logical indices are sorted and iterated along with the run_ends array to find the matching physical indices. This approach was chosen over finding the physical index of each logical index by binary search with get_physical_index: benchmarks of both approaches showed that the approach used here scaled well for larger inputs.
See https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727 for more details.
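The sort-and-walk strategy described above can be sketched as follows. This is a simplified model over plain slices that ignores slice offsets; `physical_indices` is a hypothetical name and the `String` error stands in for ArrowError.

```rust
/// Hypothetical model (not arrow_array code) of the batch lookup: visit the
/// logical indices in ascending order while advancing through `run_ends`
/// once, then write each result back at the input's original position.
fn physical_indices(
    run_ends: &[i32],
    logical_indices: &[usize],
) -> Result<Vec<usize>, String> {
    // Positions of the inputs, ordered by the logical index they hold.
    let mut order: Vec<usize> = (0..logical_indices.len()).collect();
    order.sort_unstable_by_key(|&pos| logical_indices[pos]);

    let mut result = vec![0usize; logical_indices.len()];
    let mut physical = 0usize; // run currently being considered
    for pos in order {
        let logical = logical_indices[pos];
        // Skip runs that end at or before this logical index. Since inputs are
        // visited in ascending order, `physical` never needs to move backwards.
        while physical < run_ends.len() && run_ends[physical] as usize <= logical {
            physical += 1;
        }
        if physical == run_ends.len() {
            return Err(format!("logical index {logical} is out of bounds"));
        }
        result[pos] = physical;
    }
    Ok(result)
}
```

For n logical indices and r runs, this sketch costs one sort plus a single pass over both sequences, compared with n separate binary searches over r run ends.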
Trait Implementations
impl<T: RunEndIndexType> Array for RunArray<T>
fn data_type(&self) -> &DataType
Returns a reference to the DataType of this array.
fn slice(&self, offset: usize, length: usize) -> ArrayRef
fn shrink_to_fit(&mut self)
fn offset(&self) -> usize
Returns the offset into the underlying data used by this array; defaults to 0.
fn nulls(&self) -> Option<&NullBuffer>
fn logical_nulls(&self) -> Option<NullBuffer>
Returns a NullBuffer that represents the logical null values of this array, if any.
fn is_nullable(&self) -> bool
Returns false if the array is guaranteed to not contain any logical nulls.
fn get_buffer_memory_size(&self) -> usize
fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this array. This value is always greater than that returned by get_buffer_memory_size() and includes the overhead of the data structures that contain the pointers to the various buffers.
fn null_count(&self) -> usize
fn logical_null_count(&self) -> usize
impl<R: RunEndIndexType> Clone for RunArray<R>
impl<R: RunEndIndexType> Debug for RunArray<R>
impl<'a, T: RunEndIndexType> FromIterator<&'a str> for RunArray<T>
Constructs a RunArray from an iterator of strings.
Example:
use arrow_array::{RunArray, PrimitiveArray, StringArray, types::Int16Type};
let test = vec!["a", "a", "b", "c"];
let array: RunArray<Int16Type> = test.into_iter().collect();
assert_eq!(
    "RunArray {run_ends: [2, 3, 4], values: StringArray\n[\n  \"a\",\n  \"b\",\n  \"c\",\n]}\n",
    format!("{:?}", array)
);
impl<'a, T: RunEndIndexType> FromIterator<Option<&'a str>> for RunArray<T>
Constructs a RunArray from an iterator of optional strings.
Example:
use arrow_array::{RunArray, PrimitiveArray, StringArray, types::Int16Type};
let test = vec!["a", "a", "b", "c", "c"];
let array: RunArray<Int16Type> = test
    .iter()
    .map(|&x| if x == "b" { None } else { Some(x) })
    .collect();
assert_eq!(
    "RunArray {run_ends: [2, 3, 5], values: StringArray\n[\n  \"a\",\n  null,\n  \"c\",\n]}\n",
    format!("{:?}", array)
);
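The run ends in the two examples above follow from collapsing consecutive equal items into a single run. The sketch below shows that encoding step over plain slices; `run_end_encode` is a hypothetical name, and it assumes the input is short enough for every run end to fit in an `i16`. It is not the builder that the FromIterator implementations use internally.

```rust
/// Hypothetical encoder (not arrow_array code): collapses consecutive equal
/// items into runs and records where each run ends.
///
/// Assumes `items.len()` fits in an `i16`; longer inputs would overflow.
fn run_end_encode<T: PartialEq + Clone>(items: &[T]) -> (Vec<i16>, Vec<T>) {
    let mut run_ends: Vec<i16> = Vec::new();
    let mut values: Vec<T> = Vec::new();
    for (i, item) in items.iter().enumerate() {
        if values.last() == Some(item) {
            // Same value as the current run: extend the run by moving its end.
            *run_ends.last_mut().unwrap() = (i + 1) as i16;
        } else {
            // A new value starts a new run that (so far) ends right after it.
            values.push(item.clone());
            run_ends.push((i + 1) as i16);
        }
    }
    (run_ends, values)
}
```

Applied to `["a", "a", "b", "c"]` this gives run ends `[2, 3, 4]`, and applied to the optional sequence a, a, null, c, c it gives `[2, 3, 5]`, matching the Debug output in the examples above.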