arrow::compute::kernels::cmp

Function compare_byte_view_unchecked

pub unsafe fn compare_byte_view_unchecked<T>(
    left: &GenericByteViewArray<T>,
    left_idx: usize,
    right: &GenericByteViewArray<T>,
    right_idx: usize,
) -> Ordering
where T: ByteViewType,
Deprecated since 52.2.0: Use GenericByteViewArray::compare_unchecked instead

Compares two GenericByteViewArray elements: the one at index left_idx of left and the one at index right_idx of right.

Comparing two ByteView types is non-trivial. It takes a bit of patience to understand why we don’t just compare two &[u8] slices directly.

ByteView types give us the following two advantages, and we need to be careful not to lose them:

(1) For strings/bytes of at most 12 bytes, the entire data is inlined in the view, so reading one array element requires only one memory access (StringArray requires two: one for the offset buffer and one for the value buffer).

(2) For strings/bytes longer than 12 bytes, we can still be faster than StringArray/ByteArray for certain operations, thanks to the 4 inlined prefix bytes. Consider an equality check: if the first four bytes of the two strings differ, we can return false immediately (with just one memory access).

If we directly compared two &[u8] slices, we would materialize the entire string (i.e., make multiple memory accesses), which might be unnecessary.

  • Most of the time (eq, ord), we only need to look at the first 4 bytes to know the answer, e.g., if the inlined 4 bytes are different, we can directly return unequal without looking at the full string.
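The equality shortcut described above can be sketched on a simplified model. This is only an illustration under stated assumptions: `ViewArray`, `from_values`, `long_value`, and `views_equal` are hypothetical names, not arrow-rs APIs, and the model keeps a single data buffer. It assumes the usual 16-byte view layout: a 4-byte length, followed either by up to 12 zero-padded inline bytes or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset.

```rust
/// Hypothetical, simplified model of a byte-view array (not the arrow-rs type):
/// one 16-byte view per element plus a single data buffer for long values.
struct ViewArray {
    views: Vec<u128>,
    buffer: Vec<u8>,
}

/// Values of at most this many bytes are stored entirely inside the view.
const MAX_INLINE_LEN: usize = 12;

impl ViewArray {
    fn from_values(values: &[&str]) -> Self {
        let mut arr = ViewArray { views: Vec::new(), buffer: Vec::new() };
        for &v in values {
            let data = v.as_bytes();
            let mut view = [0u8; 16];
            // Bytes 0..4: length of the value.
            view[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
            if data.len() <= MAX_INLINE_LEN {
                // Bytes 4..16: the value itself, zero-padded.
                view[4..4 + data.len()].copy_from_slice(data);
            } else {
                // Bytes 4..8: prefix; 8..12: buffer index (always 0 in this
                // sketch); 12..16: offset of the value inside the buffer.
                view[4..8].copy_from_slice(&data[..4]);
                view[12..16].copy_from_slice(&(arr.buffer.len() as u32).to_le_bytes());
                arr.buffer.extend_from_slice(data);
            }
            arr.views.push(u128::from_le_bytes(view));
        }
        arr
    }

    /// Full bytes of a long (non-inlined) value: this is the extra memory access.
    fn long_value(&self, view: u128) -> &[u8] {
        let len = view as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        &self.buffer[offset..offset + len]
    }
}

fn views_equal(left: &ViewArray, li: usize, right: &ViewArray, ri: usize) -> bool {
    let (l, r) = (left.views[li], right.views[ri]);
    // The low 64 bits hold the length and the first 4 data bytes.
    // If they differ, the values differ, and no buffer is touched.
    if l as u64 != r as u64 {
        return false;
    }
    let len = l as u32 as usize;
    if len <= MAX_INLINE_LEN {
        // Both values are fully inlined: comparing the views is enough.
        return l == r;
    }
    // Same length and same prefix: only now read the full data.
    left.long_value(l) == right.long_value(r)
}
```

In this sketch, two long values whose prefixes differ are rejected by the first comparison alone, which is the single-memory-access case mentioned above.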

§Order check flow

(1) If both strings are at most 12 bytes long, we can directly compare the data inlined in the views.

(2) If either string is longer than 12 bytes, we may need to compare the full strings:

  (2.1) If the inlined 4 bytes are different, we can return the result immediately.

  (2.2) Otherwise, we need to compare the full strings.
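The flow above can be sketched as follows. This is a minimal illustration, not the arrow-rs implementation: `ViewArray`, `read_u32`, `len_at`, `value`, and `compare_views` are hypothetical names, views are modelled as plain `[u8; 16]` arrays, and all long values live in one buffer. Inline data is assumed to be zero-padded, which is what makes comparing the first 4 bytes safe even when one value is shorter than 4 bytes.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified model of a byte-view array (not the arrow-rs type).
struct ViewArray {
    views: Vec<[u8; 16]>,
    buffer: Vec<u8>,
}

/// Reads a little-endian u32 from the first 4 bytes of `bytes`.
fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

impl ViewArray {
    fn from_values(values: &[&str]) -> Self {
        let mut arr = ViewArray { views: Vec::new(), buffer: Vec::new() };
        for &v in values {
            let data = v.as_bytes();
            let mut view = [0u8; 16];
            view[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
            if data.len() <= 12 {
                // Short value: stored inline, zero-padded.
                view[4..4 + data.len()].copy_from_slice(data);
            } else {
                // Long value: 4-byte prefix, buffer index 0, then the offset.
                view[4..8].copy_from_slice(&data[..4]);
                view[12..16].copy_from_slice(&(arr.buffer.len() as u32).to_le_bytes());
                arr.buffer.extend_from_slice(data);
            }
            arr.views.push(view);
        }
        arr
    }

    fn len_at(&self, i: usize) -> usize {
        read_u32(&self.views[i][0..4]) as usize
    }

    /// Full bytes of element `i`, inline or from the buffer.
    fn value(&self, i: usize) -> &[u8] {
        let len = self.len_at(i);
        let view = &self.views[i];
        if len <= 12 {
            &view[4..4 + len]
        } else {
            let offset = read_u32(&view[12..16]) as usize;
            &self.buffer[offset..offset + len]
        }
    }
}

fn compare_views(left: &ViewArray, li: usize, right: &ViewArray, ri: usize) -> Ordering {
    let (l_len, r_len) = (left.len_at(li), right.len_at(ri));
    // (1) Both values are inlined: compare the inline data directly.
    if l_len <= 12 && r_len <= 12 {
        return left.views[li][4..4 + l_len].cmp(&right.views[ri][4..4 + r_len]);
    }
    // (2.1) Compare the first 4 data bytes; a difference decides the result.
    match left.views[li][4..8].cmp(&right.views[ri][4..8]) {
        // (2.2) Same first 4 bytes: fall back to comparing the full values.
        Ordering::Equal => left.value(li).cmp(right.value(ri)),
        decided => decided,
    }
}
```

Step (1) here compares byte slices for readability; an optimized implementation could compare the inline bytes as integers instead, but that is outside what this sketch tries to show.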

§Safety

left_idx and right_idx must each be within the bounds of their respective arrays.
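One way a caller might uphold this contract is to bounds-check both indices before entering the unsafe call. The sketch below shows the pattern on plain slices of byte strings rather than on the arrow-rs types; `compare_unchecked` and `compare_checked` are hypothetical stand-ins written for this example.

```rust
use std::cmp::Ordering;

/// Hypothetical unchecked comparison over plain byte strings.
///
/// # Safety
/// `left_idx < left.len()` and `right_idx < right.len()` must hold.
unsafe fn compare_unchecked(
    left: &[Vec<u8>],
    left_idx: usize,
    right: &[Vec<u8>],
    right_idx: usize,
) -> Ordering {
    // SAFETY: the caller guarantees both indices are in bounds.
    let l = unsafe { left.get_unchecked(left_idx) };
    let r = unsafe { right.get_unchecked(right_idx) };
    l.cmp(r)
}

/// Safe wrapper: returns `None` instead of invoking undefined behaviour
/// when either index is out of range.
fn compare_checked(
    left: &[Vec<u8>],
    left_idx: usize,
    right: &[Vec<u8>],
    right_idx: usize,
) -> Option<Ordering> {
    if left_idx >= left.len() || right_idx >= right.len() {
        return None;
    }
    // SAFETY: both indices were bounds-checked just above.
    Some(unsafe { compare_unchecked(left, left_idx, right, right_idx) })
}
```

In a hot loop the check would typically be hoisted, for example by iterating over `0..len` once, so that the unchecked call keeps its performance benefit.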