arrow_array::array

Trait ArrayAccessor

Source
pub trait ArrayAccessor: Array {
    type Item: Send + Sync;

    // Required methods
    fn value(&self, index: usize) -> Self::Item;
    unsafe fn value_unchecked(&self, index: usize) -> Self::Item;
}
Expand description

A generic trait for accessing the values of an Array

This trait helps write specialized implementations of algorithms for different array types. Specialized implementations allow the compiler to optimize the code for the specific array type, which can lead to significant performance improvements.

§Example

For example, to write three different implementations of a string length function for StringArray, LargeStringArray, and StringViewArray, you can write

/// This function takes a dynamically typed `ArrayRef` and calls
/// calls one of three specialized implementations
fn character_length(arg: ArrayRef) -> Result<ArrayRef, ArrowError> {
    match arg.data_type() {
        DataType::Utf8 => {
            // downcast the ArrayRef to a StringArray and call the specialized implementation
            let string_array = arg.as_string::<i32>();
            character_length_general::<Int32Type, _>(string_array)
        }
        DataType::LargeUtf8 => {
            character_length_general::<Int64Type, _>(arg.as_string::<i64>())
        }
        DataType::Utf8View => {
            character_length_general::<Int32Type, _>(arg.as_string_view())
        }
        _ => Err(ArrowError::InvalidArgumentError("Unsupported data type".to_string())),
    }
}

/// A generic implementation of the character_length function
/// This function uses the `ArrayAccessor` trait to access the values of the array
/// so the compiler can generated specialized implementations for different array types
///
/// Returns a new array with the length of each string in the input array
/// * Int32Array for Utf8 and Utf8View arrays (lengths are 32-bit integers)
/// * Int64Array for LargeUtf8 arrays (lengths are 64-bit integers)
///
/// This is generic on the type of the primitive array (different string arrays have
/// different lengths) and the type of the array accessor (different string arrays
/// have different ways to access the values)
fn character_length_general<'a, T: ArrowPrimitiveType, V: ArrayAccessor<Item = &'a str>>(
    array: V,
) -> Result<ArrayRef, ArrowError>
where
    T::Native: OffsetSizeTrait,
{
    let iter = ArrayIter::new(array);
    // Create a Int32Array / Int64Array with the length of each string
    let result = iter
        .map(|string| {
            string.map(|string: &str| {
                T::Native::from_usize(string.chars().count())
                    .expect("should not fail as string.chars will always return integer")
            })
        })
        .collect::<PrimitiveArray<T>>();

    /// Return the result as a new ArrayRef (dynamically typed)
    Ok(Arc::new(result) as ArrayRef)
}

§Validity

An ArrayAccessor must always return a well-defined value for an index that is within the bounds 0..Array::len, including for null indexes where Array::is_null is true.

The value at null indexes is unspecified, and implementations must not rely on a specific value such as Default::default being returned, however, it must not be undefined

Required Associated Types§

Source

type Item: Send + Sync

The Arrow type of the element being accessed.

Required Methods§

Source

fn value(&self, index: usize) -> Self::Item

Returns the element at index i

§Panics

Panics if the value is outside the bounds of the array

Source

unsafe fn value_unchecked(&self, index: usize) -> Self::Item

Returns the element at index i

§Safety

Caller is responsible for ensuring that the index is within the bounds of the array

Implementors§