Function arrow::array::make_comparator

pub fn make_comparator(
    left: &dyn Array,
    right: &dyn Array,
    opts: SortOptions,
) -> Result<Box<dyn Fn(usize, usize) -> Ordering + Sync + Send>, ArrowError>

Returns a comparison function that takes two indices and compares the value at the first index in left with the value at the second index in right.

For comparing arrays element-wise, see also the vectorised kernels in crate::cmp.

If opts.nulls_first is true, NULL values are considered less than any non-null value; otherwise they are considered greater. SortOptions::default() sets nulls_first to true.

§Basic Usage

let array1 = Int32Array::from(vec![1, 2]);
let array2 = Int32Array::from(vec![3, 4]);

let cmp = make_comparator(&array1, &array2, SortOptions::default()).unwrap();
// 1 (index 0 of array1) is smaller than 4 (index 1 of array2)
assert_eq!(cmp(0, 1), Ordering::Less);

let array1 = Int32Array::from(vec![Some(1), None]);
let array2 = Int32Array::from(vec![None, Some(2)]);
let cmp = make_comparator(&array1, &array2, SortOptions::default()).unwrap();

assert_eq!(cmp(0, 1), Ordering::Less); // Some(1) vs Some(2)
assert_eq!(cmp(1, 1), Ordering::Less); // None vs Some(2)
assert_eq!(cmp(1, 0), Ordering::Equal); // None vs None
assert_eq!(cmp(0, 0), Ordering::Greater); // Some(1) vs None
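
To make the null placement rule concrete, the following is a minimal plain-Rust sketch that models it on Option<i32> slices, without the arrow crate. The name make_option_comparator and its signature are hypothetical and exist only for illustration; this is not how arrow implements make_comparator, and it ignores the descending option entirely.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for `make_comparator`, restricted to `Option<i32>` slices.
/// It models only the `nulls_first` rule; it is not arrow's implementation.
fn make_option_comparator<'a>(
    left: &'a [Option<i32>],
    right: &'a [Option<i32>],
    nulls_first: bool,
) -> impl Fn(usize, usize) -> Ordering + 'a {
    // Where a null on the left sorts relative to a non-null value on the right.
    let null_vs_value = if nulls_first {
        Ordering::Less
    } else {
        Ordering::Greater
    };
    move |i, j| match (left[i], right[j]) {
        (Some(l), Some(r)) => l.cmp(&r),
        // Two nulls always compare equal.
        (None, None) => Ordering::Equal,
        (None, Some(_)) => null_vs_value,
        // Mirror image of the case above.
        (Some(_), None) => null_vs_value.reverse(),
    }
}
```

With nulls_first set to true this model agrees with the four assertions above; with nulls_first set to false, the None vs Some(2) comparison becomes Greater and the Some(1) vs None comparison becomes Less.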

§Postgres-compatible Nested Comparison

Whilst SQL prescribes ternary logic for nulls (that is, comparing a value against a NULL yields a NULL), many systems, including Postgres, instead apply a total ordering when comparing nulls nested inside other values. Nulls within nested types are then either greater than any value (Postgres) or less than any value (Spark).

In particular:

{ a: 1, b: null } == { a: 1, b: null } => true
{ a: 1, b: null } == { a: 1, b: 1 } => false
{ a: 1, b: null } == null => null
null == null => null

This could be implemented as follows:

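use arrow::array::{make_comparator, Array, BooleanArray};
use arrow::buffer::NullBuffer;
use arrow::compute::kernels::cmp;
use arrow::compute::SortOptions;
use arrow::error::ArrowError;

fn eq(a: &dyn Array, b: &dyn Array) -> Result<BooleanArray, ArrowError> {
    if !a.data_type().is_nested() {
        return cmp::eq(&a, &b); // Use faster vectorised kernel
    }

    // Nested types: compare row by row, with nested nulls ordered as values
    let cmp = make_comparator(a, b, SortOptions::default())?;
    let len = a.len().min(b.len());
    let values = (0..len).map(|i| cmp(i, i).is_eq()).collect();
    // A top-level null on either side makes the result null
    let nulls = NullBuffer::union(a.nulls(), b.nulls());
    Ok(BooleanArray::new(values, nulls))
}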
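
The four cases listed above can also be checked against a small plain-Rust model that does not depend on arrow. The Row type and the nested_eq function are hypothetical names introduced only for this sketch: Option<Row> stands in for a nullable struct value, and Option<bool> stands in for a nullable boolean result.

```rust
/// Hypothetical row type standing in for a struct `{ a, b }` whose field `b` may be null.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Row {
    a: i32,
    b: Option<i32>,
}

/// Models the equality described above: a null row on either side yields null (`None`),
/// while nulls nested inside a row are compared like ordinary values.
fn nested_eq(left: Option<Row>, right: Option<Row>) -> Option<bool> {
    match (left, right) {
        // Derived equality treats `None == None` as true, so nested nulls compare equal.
        (Some(l), Some(r)) => Some(l == r),
        // Top-level null on either side: the result is null.
        _ => None,
    }
}
```

Here nested_eq(Some(Row { a: 1, b: None }), Some(Row { a: 1, b: None })) evaluates to Some(true), whereas passing None for either argument evaluates to None, matching the last two cases.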