Utilities

Decimal Numbers

class arrow::BasicDecimal128 : public arrow::GenericBasicDecimal<BasicDecimal128, 128>

Represents a signed 128-bit integer in two’s complement.

This class is also compiled into LLVM IR, so it should not reference C++ facilities such as streams or Boost.

Subclassed by arrow::Decimal128

Public Functions

inline constexpr BasicDecimal128(int64_t high, uint64_t low) noexcept: Create a BasicDecimal128 from the two’s complement representation.

template<typename T, typename = typename std::enable_if<std::is_integral<T>::value && (sizeof(T) <= sizeof(uint64_t)), T>::type> inline constexpr BasicDecimal128(T value) noexcept: Convert any integer value into a BasicDecimal128.
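As a rough illustration of the (high, low) layout these constructors work with, the sketch below widens a 64-bit integer into two words by sign extension. `Words128` and `FromInt64` are hypothetical names made up for this example; they are not part of Arrow, and the actual conversion may be implemented differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two words of a 128-bit two's complement value.
struct Words128 {
  int64_t high;  // most significant 64 bits, carries the sign
  uint64_t low;  // least significant 64 bits
};

// Sign-extend a 64-bit integer to 128 bits: the high word is all ones for
// negative inputs and all zeros otherwise.
constexpr Words128 FromInt64(int64_t value) {
  return Words128{value < 0 ? int64_t{-1} : int64_t{0},
                  static_cast<uint64_t>(value)};
}
```

Under this layout, -1 is stored as two words with every bit set, while a small positive number only occupies the low word.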

BasicDecimal128 &Negate(): Negate the current value (in-place).

BasicDecimal128 &Abs(): Absolute value (in-place).

BasicDecimal128 &operator+=(const BasicDecimal128 &right): Add a number to this one. The result is truncated to 128 bits.

BasicDecimal128 &operator-=(const BasicDecimal128 &right): Subtract a number from this one. The result is truncated to 128 bits.

BasicDecimal128 &operator*=(const BasicDecimal128 &right): Multiply this number by another number. The result is truncated to 128 bits.

DecimalStatus Divide(const BasicDecimal128 &divisor, BasicDecimal128 *result, BasicDecimal128 *remainder) const

Divide this number by divisor and return the quotient and remainder.

This operation is not destructive. The quotient is rounded toward zero. Signs work like:

21 / 5 -> 4, 1
-21 / 5 -> -4, -1
21 / -5 -> -4, 1
-21 / -5 -> 4, -1

Parameters

divisor – [in] the number to divide by
result – [out] the quotient
remainder – [out] the remainder after the division
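The sign rules listed for Divide match truncating integer division, where the remainder takes the sign of the dividend. The sketch below reproduces them with plain 64-bit integers; `DivideTruncating` is a hypothetical helper for illustration only and does not use the Arrow classes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {quotient, remainder}. Built-in integer division in C++11 and later
// truncates toward zero, so the remainder has the same sign as the dividend.
std::pair<int64_t, int64_t> DivideTruncating(int64_t dividend, int64_t divisor) {
  assert(divisor != 0);  // a status-returning API would report this case instead
  return {dividend / divisor, dividend % divisor};
}
```

In every case the identity dividend == quotient * divisor + remainder holds, which is a convenient check when reading the sign table.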

BasicDecimal128 &operator/=(const BasicDecimal128 &right): In-place division.

BasicDecimal128 &operator|=(const BasicDecimal128 &right): Bitwise “or” between two BasicDecimal128.

BasicDecimal128 &operator&=(const BasicDecimal128 &right): Bitwise “and” between two BasicDecimal128.

BasicDecimal128 &operator<<=(uint32_t bits): Shift left by the given number of bits.

BasicDecimal128 &operator>>=(uint32_t bits): Shift right by the given number of bits. Negative values will sign-extend and fill with one bits.

inline constexpr int64_t high_bits() const: Get the high bits of the two’s complement representation of the number.

inline constexpr uint64_t low_bits() const: Get the low bits of the two’s complement representation of the number.

void GetWholeAndFraction(int32_t scale, BasicDecimal128 *whole, BasicDecimal128 *fraction) const: Separate the integer and fractional parts for the given scale.
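To make the role of the scale concrete: a decimal is stored as an unscaled integer, and the scale says how many of its trailing digits are fractional. The sketch below splits a 64-bit unscaled value accordingly. `PowerOfTen` and `SplitWholeAndFraction` are hypothetical helpers, and the use of truncating division for negative values is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <cstdint>

// 10^exponent for small non-negative exponents (fits in int64_t up to 18).
int64_t PowerOfTen(int32_t exponent) {
  assert(exponent >= 0 && exponent <= 18);
  int64_t result = 1;
  for (int32_t i = 0; i < exponent; ++i) result *= 10;
  return result;
}

// Split an unscaled value into its integer and fractional digits.
// Assumption: both parts come from one truncating division.
void SplitWholeAndFraction(int64_t unscaled, int32_t scale, int64_t* whole,
                           int64_t* fraction) {
  const int64_t multiplier = PowerOfTen(scale);
  *whole = unscaled / multiplier;
  *fraction = unscaled % multiplier;
}
```

For example, the unscaled value 12345 with scale 2 stands for 123.45, so the whole part is 123 and the fraction is 45.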

DecimalStatus Rescale(int32_t original_scale, int32_t new_scale, BasicDecimal128 *out) const: Convert BasicDecimal128 from one scale to another.
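One plausible reading of rescaling, sketched with 64-bit integers: increasing the scale multiplies the unscaled value by a power of ten, and decreasing it divides. `RescaleSketch` is a hypothetical function; treating a scale-down that would drop nonzero digits as a failure, and rejecting overflow, are assumptions made for this example rather than behavior documented on this page.

```cpp
#include <cassert>
#include <cstdint>

// Returns false instead of a status code when the value cannot be rescaled.
bool RescaleSketch(int64_t unscaled, int32_t original_scale, int32_t new_scale,
                   int64_t* out) {
  const int32_t delta = new_scale - original_scale;
  const int32_t steps = delta < 0 ? -delta : delta;
  assert(steps <= 18);  // keeps 10^steps inside int64_t
  int64_t multiplier = 1;
  for (int32_t i = 0; i < steps; ++i) multiplier *= 10;

  if (delta >= 0) {
    // Scaling up appends zeros; reject results that overflow 64 bits.
    if (unscaled > INT64_MAX / multiplier || unscaled < INT64_MIN / multiplier) {
      return false;
    }
    *out = unscaled * multiplier;
    return true;
  }
  // Scaling down removes digits; assumed to fail if any of them is nonzero.
  if (unscaled % multiplier != 0) return false;
  *out = unscaled / multiplier;
  return true;
}
```

With these rules, 123.45 at scale 2 becomes 1234500 at scale 4, 123.00 can be reduced to scale 0, and 123.45 cannot.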

BasicDecimal128 IncreaseScaleBy(int32_t increase_by) const: Scale up.

BasicDecimal128 ReduceScaleBy(int32_t reduce_by, bool round = true) const

Scale down.

If ‘round’ is true, the right-most digits are dropped and the result value is rounded away from zero (+1 for positive, -1 for negative) when the dropped digits are at least half of the scale multiplier (>= 10^reduce_by / 2).
If ‘round’ is false, the right-most digits are simply dropped.
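The rounding rule can be sketched with 64-bit integers as follows. `ReduceScaleSketch` is a hypothetical function written from the description above, not the library's code.

```cpp
#include <cassert>
#include <cstdint>

// Drop the `reduce_by` right-most decimal digits of `unscaled`.
// With round == true, round half away from zero based on the dropped digits.
int64_t ReduceScaleSketch(int64_t unscaled, int32_t reduce_by, bool round = true) {
  assert(reduce_by >= 0 && reduce_by <= 18);
  if (reduce_by == 0) return unscaled;
  int64_t divisor = 1;
  for (int32_t i = 0; i < reduce_by; ++i) divisor *= 10;

  int64_t result = unscaled / divisor;  // truncates toward zero
  if (round) {
    const int64_t remainder = unscaled % divisor;
    const int64_t magnitude = remainder < 0 ? -remainder : remainder;
    if (magnitude >= divisor / 2) {
      result += (unscaled < 0) ? -1 : 1;
    }
  }
  return result;
}
```

Reducing 12345 by two digits gives 123 because the dropped digits (45) are below 50, whereas 12350 gives 124 and -12350 gives -124.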

bool FitsInPrecision(int32_t precision) const

Whether this number fits in the given precision.

Return true if the number of significant digits is less than or equal to precision.
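Put differently, a value fits in a precision p when its magnitude is below 10^p. A minimal 64-bit sketch of that check, with the hypothetical name `FitsInPrecisionSketch`:

```cpp
#include <cassert>
#include <cstdint>

// True when |unscaled| has at most `precision` decimal digits.
bool FitsInPrecisionSketch(int64_t unscaled, int32_t precision) {
  assert(precision >= 1 && precision <= 18);  // keeps 10^precision in int64_t
  int64_t limit = 1;
  for (int32_t i = 0; i < precision; ++i) limit *= 10;
  return unscaled > -limit && unscaled < limit;
}
```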

int32_t CountLeadingBinaryZeros() const: Count the number of leading binary zeroes.

inline constexpr GenericBasicDecimal() noexcept: Empty constructor creates a decimal with a value of 0.

inline constexpr GenericBasicDecimal(const WordArray &array) noexcept

Create a decimal from the two’s complement representation.

Input array is assumed to be in native endianness.

inline GenericBasicDecimal(LittleEndianArrayTag, const WordArray &array) noexcept

Create a decimal from the two’s complement representation.

Input array is assumed to be in little endianness, with native endian elements.

inline explicit GenericBasicDecimal(const uint8_t *bytes)

Create a decimal from an array of bytes.

Bytes are assumed to be in native-endian byte order.

Public Static Functions

static BasicDecimal128 Abs(const BasicDecimal128 &left): Absolute value.

static const BasicDecimal128 &GetScaleMultiplier(int32_t scale): Scale multiplier for given scale value.

static const BasicDecimal128 &GetHalfScaleMultiplier(int32_t scale): Half-scale multiplier for given scale value.

static const BasicDecimal128 &GetMaxValue(): Get the maximum valid unscaled decimal value.

static BasicDecimal128 GetMaxValue(int32_t precision): Get the maximum valid unscaled decimal value for the given precision.

static inline constexpr BasicDecimal128 GetMaxSentinel(): Get the maximum decimal value (is not a valid value).

static inline constexpr BasicDecimal128 GetMinSentinel(): Get the minimum decimal value (is not a valid value).

class arrow::Decimal128 : public arrow::BasicDecimal128

Represents a signed 128-bit integer in two’s complement.

Calculations wrap around and overflow is ignored. The max decimal precision that can be safely represented is 38 significant digits.
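The 38-digit figure follows from the width of the integer: every number with d decimal digits fits in a signed integer of w bits as long as 10^d <= 2^(w-1). The sketch below computes the largest such d; `MaxSafeDecimalDigits` is a hypothetical helper used only to show the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Largest d such that every d-digit decimal number fits in a signed two's
// complement integer with `bit_width` bits, i.e. 10^d <= 2^(bit_width - 1).
int MaxSafeDecimalDigits(int bit_width) {
  assert(bit_width >= 2);
  return static_cast<int>(std::floor((bit_width - 1) * std::log10(2.0)));
}
```

For 128 bits this yields 38, and for 256 bits it yields 76, which is the limit quoted for the 256-bit decimal type.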

For a discussion of the algorithms, see Knuth’s The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Section 4.3.1.

Adapted from the Apache ORC C++ implementation.

The implementation is split into two parts:

BasicDecimal128
- can be safely compiled to IR without references to libstdc++.
Decimal128
- has additional functionality on top of BasicDecimal128 to deal with strings and streams.

Public Functions

inline constexpr Decimal128(const BasicDecimal128 &value) noexcept: Create a Decimal128 from a BasicDecimal128.

explicit Decimal128(const std::string &value): Parse the number from a base 10 string representation.

inline constexpr Decimal128() noexcept: Empty constructor creates a Decimal128 with a value of 0.

inline Result<std::pair<Decimal128, Decimal128>> Divide(const Decimal128 &divisor) const

Divide this number by divisor and return the quotient and remainder.

This operation is not destructive. The quotient is rounded toward zero. Signs work like:

21 / 5 -> 4, 1
-21 / 5 -> -4, -1
21 / -5 -> -4, 1
-21 / -5 -> 4, -1

Parameters: divisor – [in] the number to divide by
Returns: the pair of the quotient and the remainder

std::string ToString(int32_t scale) const: Convert the Decimal128 value to a base 10 decimal string with the given scale.

std::string ToIntegerString() const: Convert the value to an integer string.

explicit operator int64_t() const: Cast this value to an int64_t.

inline Result<Decimal128> Rescale(int32_t original_scale, int32_t new_scale) const: Convert Decimal128 from one scale to another.

template<typename T, typename = internal::EnableIfIsOneOf<T, int32_t, int64_t>> inline Result<T> ToInteger() const: Convert to a signed integer.

template<typename T, typename = internal::EnableIfIsOneOf<T, int32_t, int64_t>> inline Status ToInteger(T *out) const: Convert to a signed integer.

float ToFloat(int32_t scale) const: Convert to a floating-point number (scaled).

double ToDouble(int32_t scale) const: Convert to a floating-point number (scaled).

template<typename T> inline T ToReal(int32_t scale) const: Convert to a floating-point number (scaled).

Public Static Functions

static Status FromString(const util::string_view &s, Decimal128 *out, int32_t *precision, int32_t *scale = NULLPTR): Convert a decimal string to a Decimal128 value, optionally including precision and scale if they’re passed in and not null.

static Result<Decimal128> FromBigEndian(const uint8_t *data, int32_t length)

Convert from a big-endian byte representation.

The length must be between 1 and 16.

Returns: error status if the length is an invalid value
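Decoding a variable-length big-endian two's complement value amounts to sign-extending from the first byte and then shifting the remaining bytes in. The sketch below does this for up to 8 bytes into an int64_t; `FromBigEndianSketch` is a hypothetical 64-bit analogue, and returning false stands in for the error status.

```cpp
#include <cassert>
#include <cstdint>

// Decode 1 to 8 big-endian two's complement bytes into an int64_t.
// Returns false when the length is out of range.
bool FromBigEndianSketch(const uint8_t* data, int32_t length, int64_t* out) {
  if (length < 1 || length > 8) return false;
  // Start from all ones when the sign bit of the first byte is set, so that
  // short negative inputs are sign-extended.
  uint64_t value = (data[0] & 0x80) ? ~uint64_t{0} : uint64_t{0};
  for (int32_t i = 0; i < length; ++i) {
    value = (value << 8) | data[i];
  }
  // Reinterpret as signed; this is modular on mainstream compilers and
  // guaranteed since C++20.
  *out = static_cast<int64_t>(value);
  return true;
}
```

The single byte 0xFF therefore decodes to -1, while the two bytes 0x01 0x00 decode to 256.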

template<> struct ToRealConversion<double>

template<> struct ToRealConversion<float>

class arrow::BasicDecimal256 : public arrow::GenericBasicDecimal<BasicDecimal256, 256>

Subclassed by arrow::Decimal256

Public Functions

template<typename T, typename = typename std::enable_if<std::is_integral<T>::value && (sizeof(T) <= sizeof(uint64_t)), T>::type> inline constexpr BasicDecimal256(T value) noexcept: Convert any integer value into a BasicDecimal256.

BasicDecimal256 &Negate(): Negate the current value (in-place).

BasicDecimal256 &Abs(): Absolute value (in-place).

BasicDecimal256 &operator+=(const BasicDecimal256 &right): Add a number to this one. The result is truncated to 256 bits.

BasicDecimal256 &operator-=(const BasicDecimal256 &right): Subtract a number from this one. The result is truncated to 256 bits.

inline uint64_t low_bits() const: Get the lowest bits of the two’s complement representation of the number.

DecimalStatus Rescale(int32_t original_scale, int32_t new_scale, BasicDecimal256 *out) const: Convert BasicDecimal256 from one scale to another.

BasicDecimal256 IncreaseScaleBy(int32_t increase_by) const: Scale up.

BasicDecimal256 ReduceScaleBy(int32_t reduce_by, bool round = true) const

Scale down.

If ‘round’ is true, the right-most digits are dropped and the result value is rounded away from zero (+1 for positive, -1 for negative) when the dropped digits are at least half of the scale multiplier (>= 10^reduce_by / 2).
If ‘round’ is false, the right-most digits are simply dropped.

bool FitsInPrecision(int32_t precision) const

Whether this number fits in the given precision.

Return true if the number of significant digits is less than or equal to precision.

BasicDecimal256 &operator*=(const BasicDecimal256 &right): Multiply this number by another number. The result is truncated to 256 bits.

DecimalStatus Divide(const BasicDecimal256 &divisor, BasicDecimal256 *result, BasicDecimal256 *remainder) const

Divide this number by divisor and return the quotient and remainder.

This operation is not destructive. The quotient is rounded toward zero. Signs work like:

21 / 5 -> 4, 1
-21 / 5 -> -4, -1
21 / -5 -> -4, 1
-21 / -5 -> 4, -1

Parameters

divisor – [in] the number to divide by
result – [out] the quotient
remainder – [out] the remainder after the division

BasicDecimal256 &operator<<=(uint32_t bits): Shift left by the given number of bits.

BasicDecimal256 &operator/=(const BasicDecimal256 &right): In-place division.

inline constexpr GenericBasicDecimal() noexcept: Empty constructor creates a decimal with a value of 0.

inline constexpr GenericBasicDecimal(const WordArray &array) noexcept

Create a decimal from the two’s complement representation.

Input array is assumed to be in native endianness.

inline GenericBasicDecimal(LittleEndianArrayTag, const WordArray &array) noexcept

Create a decimal from the two’s complement representation.

Input array is assumed to be in little endianness, with native endian elements.

inline explicit GenericBasicDecimal(const uint8_t *bytes)

Create a decimal from an array of bytes.

Bytes are assumed to be in native-endian byte order.

Public Static Functions

static BasicDecimal256 Abs(const BasicDecimal256 &left): Absolute value.

static const BasicDecimal256 &GetScaleMultiplier(int32_t scale): Scale multiplier for given scale value.

static const BasicDecimal256 &GetHalfScaleMultiplier(int32_t scale): Half-scale multiplier for given scale value.

static BasicDecimal256 GetMaxValue(int32_t precision): Get the maximum valid unscaled decimal value for the given precision.

static inline constexpr BasicDecimal256 GetMaxSentinel(): Get the maximum decimal value (is not a valid value).

static inline constexpr BasicDecimal256 GetMinSentinel(): Get the minimum decimal value (is not a valid value).

class arrow::Decimal256 : public arrow::BasicDecimal256

Represents a signed 256-bit integer in two’s complement.

The max decimal precision that can be safely represented is 76 significant digits.

The implementation is split into two parts:

BasicDecimal256
- can be safely compiled to IR without references to libstdc++.
Decimal256
- (TODO) has additional functionality on top of BasicDecimal256 to deal with strings and streams.

Public Functions

inline constexpr Decimal256(const BasicDecimal256 &value) noexcept: Create a Decimal256 from a BasicDecimal256.

explicit Decimal256(const std::string &value): Parse the number from a base 10 string representation.

inline constexpr Decimal256() noexcept: Empty constructor creates a Decimal256 with a value of 0.

std::string ToString(int32_t scale) const: Convert the Decimal256 value to a base 10 decimal string with the given scale.

std::string ToIntegerString() const: Convert the value to an integer string.

inline Result<Decimal256> Rescale(int32_t original_scale, int32_t new_scale) const: Convert Decimal256 from one scale to another.

inline Result<std::pair<Decimal256, Decimal256>> Divide(const Decimal256 &divisor) const

Divide this number by divisor and return the quotient and remainder.

This operation is not destructive. The quotient is rounded toward zero. Signs work like:

21 / 5 -> 4, 1
-21 / 5 -> -4, -1
21 / -5 -> -4, 1
-21 / -5 -> 4, -1

Parameters: divisor – [in] the number to divide by
Returns: the pair of the quotient and the remainder

float ToFloat(int32_t scale) const

Convert to a floating-point number (scaled).

May return infinity in case of overflow.

double ToDouble(int32_t scale) const: Convert to a floating-point number (scaled).

template<typename T> inline T ToReal(int32_t scale) const: Convert to a floating-point number (scaled).

Public Static Functions

static Status FromString(const util::string_view &s, Decimal256 *out, int32_t *precision, int32_t *scale = NULLPTR): Convert a decimal string to a Decimal256 value, optionally including precision and scale if they’re passed in and not null.

static Result<Decimal256> FromBigEndian(const uint8_t *data, int32_t length)

Convert from a big-endian byte representation.

The length must be between 1 and 32.

Returns: error status if the length is an invalid value

template<> struct ToRealConversion<double>

template<> struct ToRealConversion<float>

Iterators

template<typename T> class arrow::Iterator

A generic Iterator that can return errors.

Public Functions

template<typename Wrapped> inline explicit Iterator(Wrapped has_next)

Iterator may be constructed from any type which has a member function with signature Result<T> Next(). The end of iteration is signalled by returning IterationTraits<T>::End().

The argument is moved or copied to the heap and kept in a unique_ptr<void>. Only its destructor and its Next method (which are stored in function pointers) are referenced after construction.

This approach is used to avoid MSVC linkage problems (ARROW-6244, ARROW-6558) that arise with an abstract template base class: instead of being inlined, as is usual for a template function, the base’s virtual destructor is exported, leading to multiple-definition errors when linking to any other translation unit where the base is instantiated.
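The type-erasure technique described above can be sketched in a few lines: the wrapped object lives behind a void pointer, and two function pointers remember how to call its Next method and how to destroy it. `ErasedIterator`, `EndMarker`, and `Countdown` are hypothetical names for this example; to stay self-contained, Next here returns T directly rather than Result<T>, and a value-initialized T plays the role of the end marker.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical end-of-sequence marker: a value-initialized T.
template <typename T>
struct EndMarker {
  static T End() { return T(); }
};

template <typename T>
class ErasedIterator {
 public:
  // Move the wrapped object to the heap and remember only how to call its
  // Next() and how to destroy it. No virtual base class is involved.
  template <typename Wrapped>
  explicit ErasedIterator(Wrapped wrapped)
      : ptr_(new Wrapped(std::move(wrapped)), &Delete<Wrapped>),
        next_(&CallNext<Wrapped>) {}

  T Next() { return next_(ptr_.get()); }

 private:
  template <typename Wrapped>
  static void Delete(void* p) {
    delete static_cast<Wrapped*>(p);
  }

  template <typename Wrapped>
  static T CallNext(void* p) {
    return static_cast<Wrapped*>(p)->Next();
  }

  std::unique_ptr<void, void (*)(void*)> ptr_;  // owns the erased object
  T (*next_)(void*);                            // forwards to Wrapped::Next
};

// Example source: yields n, n-1, ..., 1 and then the end marker (0).
struct Countdown {
  int remaining;
  int Next() { return remaining > 0 ? remaining-- : EndMarker<int>::End(); }
};
```

Constructing `ErasedIterator<int>` from `Countdown{3}` and calling Next repeatedly yields 3, 2, 1 and then the end marker. Because the only per-type code is the two static function templates, nothing with a virtual destructor needs to be exported from the library.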

inline Result<T> Next()

Return the next element of the sequence, or IterationTraits<T>::End() when the iteration is completed.

Calling this on a default constructed Iterator will result in undefined behavior.

template<typename Visitor> inline Status Visit(Visitor &&visitor)

Pass each element of the sequence to a visitor.

Will return any error status returned by the visitor, terminating iteration.

inline bool Equals(const Iterator &other) const

Iterators will only compare equal if they are both null.

Equality comparability is required to make an Iterator of Iterators (to check for the end condition).

inline Result<std::vector<T>> ToVector(): Move every element of this iterator into a vector.

class RangeIterator

template<typename T> class VectorIterator: Simple iterator which yields the elements of a std::vector.

Compression

enum arrow::Compression::type

Compression algorithm.

Values:

enumerator UNCOMPRESSED

enumerator SNAPPY

enumerator GZIP

enumerator BROTLI

enumerator ZSTD

enumerator LZ4

enumerator LZ4_FRAME

enumerator LZO

enumerator BZ2

enumerator LZ4_HADOOP

class arrow::util::Codec

Compression codec.

Public Functions

virtual int minimum_compression_level() const = 0: Return the smallest supported compression level.

virtual int maximum_compression_level() const = 0: Return the largest supported compression level.

virtual int default_compression_level() const = 0: Return the default compression level.

virtual Result<int64_t> Decompress(int64_t input_len, const uint8_t *input, int64_t output_buffer_len, uint8_t *output_buffer) = 0

One-shot decompression function.

output_buffer_len must be correct and therefore be obtained in advance. The actual decompressed length is returned.

Note

One-shot decompression is not always compatible with streaming compression. Depending on the codec (e.g. LZ4), different formats may be used.

virtual Result<int64_t> Compress(int64_t input_len, const uint8_t *input, int64_t output_buffer_len, uint8_t *output_buffer) = 0

One-shot compression function.

output_buffer_len must first have been computed using MaxCompressedLen(). The actual compressed length is returned.

Note

One-shot compression is not always compatible with streaming decompression. Depending on the codec (e.g. LZ4), different formats may be used.
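The buffer-sizing contract for one-shot compression (compute an upper bound, allocate, compress, then keep only the bytes actually written) can be sketched with a toy codec. `ToyRleCodec` and `CompressOneShot` are hypothetical and exist only for this illustration; the toy `MaxCompressedLen` signature is an assumption, and the sketch returns -1 where a real codec would return an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length codec used only for illustration: each run becomes a
// (count, byte) pair, so the worst case doubles the input size.
struct ToyRleCodec {
  int64_t MaxCompressedLen(int64_t input_len) const { return 2 * input_len; }

  // Returns the number of bytes written, or -1 if the buffer is too small.
  int64_t Compress(int64_t input_len, const uint8_t* input,
                   int64_t output_buffer_len, uint8_t* output_buffer) const {
    int64_t written = 0;
    for (int64_t i = 0; i < input_len;) {
      int64_t run = 1;
      while (i + run < input_len && input[i + run] == input[i] && run < 255) {
        ++run;
      }
      if (written + 2 > output_buffer_len) return -1;
      output_buffer[written++] = static_cast<uint8_t>(run);
      output_buffer[written++] = input[i];
      i += run;
    }
    return written;
  }
};

// Caller-side pattern: size the buffer from MaxCompressedLen(), compress,
// then shrink to the length actually produced.
std::vector<uint8_t> CompressOneShot(const ToyRleCodec& codec,
                                     const std::vector<uint8_t>& input) {
  const int64_t input_len = static_cast<int64_t>(input.size());
  std::vector<uint8_t> output(
      static_cast<std::size_t>(codec.MaxCompressedLen(input_len)));
  const int64_t actual =
      codec.Compress(input_len, input.data(),
                     static_cast<int64_t>(output.size()), output.data());
  assert(actual >= 0);  // cannot fail when the buffer has the maximum size
  output.resize(static_cast<std::size_t>(actual));
  return output;
}
```

Sizing the buffer from the upper bound means the compression call never has to be retried, at the cost of a temporary over-allocation.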

virtual Result<std::shared_ptr<Compressor>> MakeCompressor() = 0: Create a streaming compressor instance.

virtual Result<std::shared_ptr<Decompressor>> MakeDecompressor() = 0: Create a streaming decompressor instance.

virtual Compression::type compression_type() const = 0: This Codec’s compression type.

inline const std::string &name() const: The name of this Codec’s compression type.

inline virtual int compression_level() const: This Codec’s compression level, if applicable.

Public Static Functions

static int UseDefaultCompressionLevel(): Return special value to indicate that a codec implementation should use its default compression level.

static const std::string &GetCodecAsString(Compression::type t): Return a string name for compression type.

static Result<Compression::type> GetCompressionType(const std::string &name): Return compression type for name (all upper case)

static Result<std::unique_ptr<Codec>> Create(Compression::type codec, int compression_level = kUseDefaultCompressionLevel): Create a codec for the given compression algorithm.

static bool IsAvailable(Compression::type codec): Return true if support for indicated codec has been enabled.

static bool SupportsCompressionLevel(Compression::type codec): Return true if indicated codec supports setting a compression level.

static Result<int> MinimumCompressionLevel(Compression::type codec): Return the smallest supported compression level for the codec. Note: This function creates a temporary Codec instance.

static Result<int> MaximumCompressionLevel(Compression::type codec): Return the largest supported compression level for the codec. Note: This function creates a temporary Codec instance.

static Result<int> DefaultCompressionLevel(Compression::type codec): Return the default compression level for the codec. Note: This function creates a temporary Codec instance.

class arrow::util::Compressor

Streaming compressor interface.

Public Functions

virtual Result<CompressResult> Compress(int64_t input_len, const uint8_t *input, int64_t output_len, uint8_t *output) = 0

Compress some input.

If bytes_read is 0 on return, then a larger output buffer should be supplied.

virtual Result<FlushResult> Flush(int64_t output_len, uint8_t *output) = 0

Flush part of the compressed output.

If should_retry is true on return, Flush() should be called again with a larger buffer.

virtual Result<EndResult> End(int64_t output_len, uint8_t *output) = 0

End compressing, doing whatever is necessary to end the stream.

If should_retry is true on return, End() should be called again with a larger buffer. Otherwise, the Compressor should not be used anymore.

End() implies Flush().
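The retry conventions of the streaming interface suggest a driver loop like the one sketched below: feed input until it is consumed, enlarge the output buffer whenever no input could be read, and repeat End() while it asks for a retry. `ToyPassThroughCompressor` and `CompressAll` are hypothetical, the result structs are simplified stand-ins, and the `bytes_written` field name is an assumption (only `bytes_read` and `should_retry` are named on this page).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the result structs of a streaming compressor.
struct CompressResult { int64_t bytes_read; int64_t bytes_written; };
struct EndResult { int64_t bytes_written; bool should_retry; };

// Toy "compressor" that copies bytes through and writes a one-byte trailer.
struct ToyPassThroughCompressor {
  CompressResult Compress(int64_t input_len, const uint8_t* input,
                          int64_t output_len, uint8_t* output) {
    const int64_t n = std::min(input_len, output_len);
    std::copy(input, input + n, output);
    return {n, n};
  }
  EndResult End(int64_t output_len, uint8_t* output) {
    if (output_len < 1) return {0, true};  // no room for the trailer yet
    output[0] = 0xFF;                      // toy end-of-stream marker
    return {1, false};
  }
};

// Driver loop: grow the scratch buffer whenever the compressor cannot make
// progress, and keep calling End() until it stops asking for a retry.
std::vector<uint8_t> CompressAll(ToyPassThroughCompressor& compressor,
                                 const std::vector<uint8_t>& input,
                                 std::size_t initial_chunk) {
  std::vector<uint8_t> out;
  std::vector<uint8_t> chunk(initial_chunk);
  const int64_t total = static_cast<int64_t>(input.size());
  int64_t pos = 0;
  while (pos < total) {
    const CompressResult r = compressor.Compress(
        total - pos, input.data() + pos,
        static_cast<int64_t>(chunk.size()), chunk.data());
    out.insert(out.end(), chunk.begin(), chunk.begin() + r.bytes_written);
    pos += r.bytes_read;
    if (r.bytes_read == 0) chunk.resize(chunk.size() * 2 + 1);
  }
  for (;;) {
    const EndResult e =
        compressor.End(static_cast<int64_t>(chunk.size()), chunk.data());
    out.insert(out.end(), chunk.begin(), chunk.begin() + e.bytes_written);
    if (!e.should_retry) break;
    chunk.resize(chunk.size() * 2 + 1);
  }
  return out;
}
```

Starting with an empty scratch buffer exercises both retry paths: the first Compress call reads nothing, which triggers a resize, and the same happens for End when the input is empty.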

struct CompressResult

struct EndResult

struct FlushResult

class arrow::util::Decompressor

Streaming decompressor interface.

Public Functions

virtual Result<DecompressResult> Decompress(int64_t input_len, const uint8_t *input, int64_t output_len, uint8_t *output) = 0

Decompress some input.

If need_more_output is true on return, a larger output buffer needs to be supplied.

virtual bool IsFinished() = 0

Return whether the compressed stream is finished.

This is a heuristic. If true is returned, then it is guaranteed that the stream is finished. If false is returned, however, it may simply be that the underlying library isn’t able to provide the information.

virtual Status Reset() = 0: Reinitialize decompressor, making it ready for a new compressed stream.

struct DecompressResult
