Struct RunArray
pub struct RunArray<R>where
R: RunEndIndexType,{
data_type: DataType,
run_ends: RunEndBuffer<<R as ArrowPrimitiveType>::Native>,
values: Arc<dyn Array>,
}
Expand description
An array of run-end encoded values
This encoding is variation on run-length encoding (RLE) and is good for representing data containing same values repeated consecutively.
RunArray
contains run_ends
array and values
array of same length.
The run_ends
array stores the indexes at which the run ends. The values
array
stores the value of each run. Below example illustrates how a logical array is represented in
RunArray
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
┌─────────────────┐ ┌─────────┐ ┌─────────────────┐
│ │ A │ │ 2 │ │ │ A │
├─────────────────┤ ├─────────┤ ├─────────────────┤
│ │ D │ │ 3 │ │ │ A │ run length of 'A' = runs_ends[0] - 0 = 2
├─────────────────┤ ├─────────┤ ├─────────────────┤
│ │ B │ │ 6 │ │ │ D │ run length of 'D' = run_ends[1] - run_ends[0] = 1
└─────────────────┘ └─────────┘ ├─────────────────┤
│ values run_ends │ │ B │
├─────────────────┤
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┘ │ B │
├─────────────────┤
RunArray │ B │ run length of 'B' = run_ends[2] - run_ends[1] = 3
length = 3 └─────────────────┘
Logical array
Contents
Fields§
§data_type: DataType
§run_ends: RunEndBuffer<<R as ArrowPrimitiveType>::Native>
§values: Arc<dyn Array>
Implementations§
§impl<R> RunArray<R>where
R: RunEndIndexType,
impl<R> RunArray<R>where
R: RunEndIndexType,
pub fn logical_len(run_ends: &PrimitiveArray<R>) -> usize
pub fn logical_len(run_ends: &PrimitiveArray<R>) -> usize
Calculates the logical length of the array encoded by the given run_ends array.
pub fn try_new(
run_ends: &PrimitiveArray<R>,
values: &dyn Array,
) -> Result<RunArray<R>, ArrowError>
pub fn try_new( run_ends: &PrimitiveArray<R>, values: &dyn Array, ) -> Result<RunArray<R>, ArrowError>
Attempts to create RunArray using given run_ends (index where a run ends) and the values (value of the run). Returns an error if the given data is not compatible with RunEndEncoded specification.
pub fn run_ends(&self) -> &RunEndBuffer<<R as ArrowPrimitiveType>::Native>
pub fn run_ends(&self) -> &RunEndBuffer<<R as ArrowPrimitiveType>::Native>
Returns a reference to RunEndBuffer
pub fn values(&self) -> &Arc<dyn Array>
pub fn values(&self) -> &Arc<dyn Array>
Returns a reference to values array
Note: any slicing of this RunArray
array is not applied to the returned array
and must be handled separately
pub fn get_start_physical_index(&self) -> usize
pub fn get_start_physical_index(&self) -> usize
Returns the physical index at which the array slice starts.
pub fn get_end_physical_index(&self) -> usize
pub fn get_end_physical_index(&self) -> usize
Returns the physical index at which the array slice ends.
pub fn downcast<V>(&self) -> Option<TypedRunArray<'_, R, V>>where
V: 'static,
pub fn downcast<V>(&self) -> Option<TypedRunArray<'_, R, V>>where
V: 'static,
Downcast this RunArray
to a TypedRunArray
use arrow_array::{Array, ArrayAccessor, RunArray, StringArray, types::Int32Type};
let orig = [Some("a"), Some("b"), None];
let run_array = RunArray::<Int32Type>::from_iter(orig);
let typed = run_array.downcast::<StringArray>().unwrap();
assert_eq!(typed.value(0), "a");
assert_eq!(typed.value(1), "b");
assert!(typed.values().is_null(2));
pub fn get_physical_index(&self, logical_index: usize) -> usize
pub fn get_physical_index(&self, logical_index: usize) -> usize
Returns index to the physical array for the given index to the logical array.
This function adjusts the input logical index based on ArrayData::offset
Performs a binary search on the run_ends array for the input index.
The result is arbitrary if logical_index >= self.len()
pub fn get_physical_indices<I>(
&self,
logical_indices: &[I],
) -> Result<Vec<usize>, ArrowError>where
I: ArrowNativeType,
pub fn get_physical_indices<I>(
&self,
logical_indices: &[I],
) -> Result<Vec<usize>, ArrowError>where
I: ArrowNativeType,
Returns the physical indices of the input logical indices. Returns error if any of the logical
index cannot be converted to physical index. The logical indices are sorted and iterated along
with run_ends array to find matching physical index. The approach used here was chosen over
finding physical index for each logical index using binary search using the function
get_physical_index
. Running benchmarks on both approaches showed that the approach used here
scaled well for larger inputs.
See https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727 for more details.
Trait Implementations§
§impl<T> Array for RunArray<T>where
T: RunEndIndexType,
impl<T> Array for RunArray<T>where
T: RunEndIndexType,
§fn slice(&self, offset: usize, length: usize) -> Arc<dyn Array>
fn slice(&self, offset: usize, length: usize) -> Arc<dyn Array>
§fn shrink_to_fit(&mut self)
fn shrink_to_fit(&mut self)
§fn offset(&self) -> usize
fn offset(&self) -> usize
0
. Read more§fn nulls(&self) -> Option<&NullBuffer>
fn nulls(&self) -> Option<&NullBuffer>
§fn logical_nulls(&self) -> Option<NullBuffer>
fn logical_nulls(&self) -> Option<NullBuffer>
NullBuffer
that represents the logical
null values of this array, if any. Read more§fn is_nullable(&self) -> bool
fn is_nullable(&self) -> bool
false
if the array is guaranteed to not contain any logical nulls Read more§fn get_buffer_memory_size(&self) -> usize
fn get_buffer_memory_size(&self) -> usize
§fn get_array_memory_size(&self) -> usize
fn get_array_memory_size(&self) -> usize
get_buffer_memory_size()
and
includes the overhead of the data structures that contain the pointers to the various buffers.§fn null_count(&self) -> usize
fn null_count(&self) -> usize
§fn logical_null_count(&self) -> usize
fn logical_null_count(&self) -> usize
§impl<R> Clone for RunArray<R>where
R: RunEndIndexType,
impl<R> Clone for RunArray<R>where
R: RunEndIndexType,
§impl<R> Debug for RunArray<R>where
R: RunEndIndexType,
impl<R> Debug for RunArray<R>where
R: RunEndIndexType,
§impl<R> From<ArrayData> for RunArray<R>where
R: RunEndIndexType,
impl<R> From<ArrayData> for RunArray<R>where
R: RunEndIndexType,
§impl<R> From<RunArray<R>> for ArrayDatawhere
R: RunEndIndexType,
impl<R> From<RunArray<R>> for ArrayDatawhere
R: RunEndIndexType,
§impl<'a, T> FromIterator<&'a str> for RunArray<T>where
T: RunEndIndexType,
impl<'a, T> FromIterator<&'a str> for RunArray<T>where
T: RunEndIndexType,
Constructs a RunArray
from an iterator of strings.
§Example:
use arrow_array::{RunArray, PrimitiveArray, StringArray, types::Int16Type};
let test = vec!["a", "a", "b", "c"];
let array: RunArray<Int16Type> = test.into_iter().collect();
assert_eq!(
"RunArray {run_ends: [2, 3, 4], values: StringArray\n[\n \"a\",\n \"b\",\n \"c\",\n]}\n",
format!("{:?}", array)
);
§impl<'a, T> FromIterator<Option<&'a str>> for RunArray<T>where
T: RunEndIndexType,
impl<'a, T> FromIterator<Option<&'a str>> for RunArray<T>where
T: RunEndIndexType,
Constructs a RunArray
from an iterator of optional strings.
§Example:
use arrow_array::{RunArray, PrimitiveArray, StringArray, types::Int16Type};
let test = vec!["a", "a", "b", "c", "c"];
let array: RunArray<Int16Type> = test
.iter()
.map(|&x| if x == "b" { None } else { Some(x) })
.collect();
assert_eq!(
"RunArray {run_ends: [2, 3, 5], values: StringArray\n[\n \"a\",\n null,\n \"c\",\n]}\n",
format!("{:?}", array)
);