Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.
arrow::RleEncoder Class Reference

Class to incrementally build the RLE data.

#include <arrow/util/rle-encoding.h>

Public Member Functions

 RleEncoder (uint8_t *buffer, int buffer_len, int bit_width)
 buffer/buffer_len: preallocated output buffer.
 
bool Put (uint64_t value)
 Encodes value.
 
int Flush ()
 Flushes any pending values to the underlying buffer.
 
void Clear ()
 Resets all the state in the encoder.
 
uint8_t * buffer ()
 Returns pointer to underlying buffer.
 
int32_t len ()
 

Static Public Member Functions

static int MinBufferSize (int bit_width)
 Returns the minimum buffer size needed to use the encoder for 'bit_width'. This is the maximum length of a single run for 'bit_width'.
 
static int MaxBufferSize (int bit_width, int num_values)
 Returns the maximum byte size it could take to encode 'num_values'.
 

Detailed Description

Class to incrementally build the RLE data.

This class does not allocate any memory. The encoding has two modes: repeated runs and literal runs. If a run is sufficiently short, it is more efficient to encode it as a literal run. The class decides by buffering 8 values at a time: if they are not all the same, they are added to the literal run; if they are all the same, they are added to the repeated run. When the encoder switches modes, the previous run is flushed out.
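The grouping rule described above can be sketched in isolation. The helper below is hypothetical and is not part of arrow::RleEncoder; it only labels each full group of 8 values the way the description suggests and does not reproduce the encoder's byte-level output or its handling of runs that span several groups.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): labels each full group of 8 input
// values as 'R' (all equal, candidate for a repeated run) or 'L' (mixed,
// goes to a literal run). Trailing values that do not fill a group are
// ignored in this sketch.
std::string ClassifyGroups(const std::vector<uint64_t>& values) {
  const std::size_t kGroupSize = 8;
  std::string labels;
  for (std::size_t start = 0; start + kGroupSize <= values.size();
       start += kGroupSize) {
    bool all_same = true;
    for (std::size_t i = 1; i < kGroupSize; ++i) {
      if (values[start + i] != values[start]) {
        all_same = false;
        break;
      }
    }
    labels.push_back(all_same ? 'R' : 'L');
  }
  return labels;
}
```

For eight copies of 7 followed by the values 1 through 8, this helper yields the labels "RL".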

Constructor & Destructor Documentation

◆ RleEncoder()

arrow::RleEncoder::RleEncoder ( uint8_t *  buffer, int  buffer_len, int  bit_width )
inline

buffer/buffer_len: preallocated output buffer.

bit_width: max number of bits for value.

TODO: consider adding a min_repeated_run_length so the caller can control when values should be encoded as repeated runs. Currently this is derived from the bit_width, which can determine a storage-optimal choice.

TODO: allow 0 bit_width (and have dict encoder use it).
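Because bit_width bounds every value passed to the encoder, a caller typically needs to derive it from the largest value in the data. The two helpers below are hypothetical illustrations, not Arrow functions; they assume a width of at least 1, since the notes above indicate that a 0 bit_width is not yet allowed.

```cpp
#include <cstdint>

// Hypothetical helper: smallest bit width (at least 1) that can represent
// every value in the range [0, max_value].
inline int BitWidthFor(uint64_t max_value) {
  int width = 1;
  // Stop at 64 so the shift amount never reaches the width of the type.
  while (width < 64 && (max_value >> width) != 0) {
    ++width;
  }
  return width;
}

// Hypothetical helper: true if `value` is representable in `bit_width` bits.
// Assumes bit_width >= 1.
inline bool FitsInBitWidth(uint64_t value, int bit_width) {
  return bit_width >= 64 || (value >> bit_width) == 0;
}
```

With these helpers, a maximum value of 7 needs 3 bits and a maximum value of 8 needs 4.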

Member Function Documentation

◆ buffer()

uint8_t* arrow::RleEncoder::buffer ( )
inline

Returns pointer to underlying buffer.

◆ Clear()

void arrow::RleEncoder::Clear ( )
inline

Resets all the state in the encoder.

◆ Flush()

int arrow::RleEncoder::Flush ( )
inline

Flushes any pending values to the underlying buffer.

Returns the total number of bytes written.

◆ len()

int32_t arrow::RleEncoder::len ( )
inline

◆ MaxBufferSize()

static int arrow::RleEncoder::MaxBufferSize ( int  bit_width, int  num_values )
inline static

Returns the maximum byte size it could take to encode 'num_values'.

◆ MinBufferSize()

static int arrow::RleEncoder::MinBufferSize ( int  bit_width)
inline static

Returns the minimum buffer size needed to use the encoder for 'bit_width'. This is the maximum length of a single run for 'bit_width'.

It is not valid to pass a buffer smaller than this length.

The size is the larger of two bounds:

Longest literal run: 1 indicator byte and MAX_VALUES_PER_LITERAL_RUN 'bit_width' values.

Longest repeated run: up to MAX_VLQ_BYTE_LEN indicator bytes and a single 'bit_width' value.
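The two bounds above can be worked through as arithmetic. The sketch below is a hypothetical re-derivation, not the library's code. The page names MAX_VALUES_PER_LITERAL_RUN and MAX_VLQ_BYTE_LEN without giving their values, so the constants used here (512 and 5) are assumptions chosen for illustration only.

```cpp
#include <algorithm>

// Assumed values for illustration; the reference page does not state them.
const int kMaxValuesPerLiteralRun = 512;  // assumption
const int kMaxVlqByteLen = 5;             // assumption

// Bytes needed to hold `bits` bits, rounded up.
inline int BytesForBits(int bits) { return (bits + 7) / 8; }

// Hypothetical sketch of the bound: the larger of the longest literal run
// and the longest repeated run, both in bytes.
inline int MinBufferSizeSketch(int bit_width) {
  // Longest literal run: 1 indicator byte plus the bit-packed values.
  const int max_literal_run =
      1 + BytesForBits(kMaxValuesPerLiteralRun * bit_width);
  // Longest repeated run: VLQ indicator bytes plus one value.
  const int max_repeated_run = kMaxVlqByteLen + BytesForBits(bit_width);
  return std::max(max_literal_run, max_repeated_run);
}
```

Under these assumed constants, a bit width of 1 gives 65 bytes and a bit width of 8 gives 513 bytes. In real code, call RleEncoder::MinBufferSize rather than recomputing the bound.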

◆ Put()

bool arrow::RleEncoder::Put ( uint64_t  value)
inline

Encodes value.

Returns true if the value fits in the buffer, false otherwise. The value must be representable with bit_width_ bits.

This function buffers input values 8 at a time. After seeing all 8 values, it decides whether they should be encoded as a literal or repeated run.

