struct RecordBatchDecoder<'a> {
batch: RecordBatch<'a>,
schema: SchemaRef,
dictionaries_by_id: &'a HashMap<i64, ArrayRef>,
compression: Option<CompressionCodec>,
version: MetadataVersion,
data: &'a Buffer,
nodes: VectorIter<'a, FieldNode>,
buffers: VectorIter<'a, Buffer>,
projection: Option<&'a [usize]>,
require_alignment: bool,
skip_validation: UnsafeFlag,
}
State for decoding Arrow arrays from an IPC RecordBatch structure to `RecordBatch`
Fields
batch: RecordBatch<'a>
The flatbuffers encoded record batch
schema: SchemaRef
The output schema
dictionaries_by_id: &'a HashMap<i64, ArrayRef>
Decoded dictionaries indexed by dictionary id
compression: Option<CompressionCodec>
Optional compression codec
version: MetadataVersion
The format version
data: &'a Buffer
The raw data buffer
nodes: VectorIter<'a, FieldNode>
The fields comprising this array
buffers: VectorIter<'a, Buffer>
The buffers comprising this array
projection: Option<&'a [usize]>
Projection (subset of columns) to read, if any
See RecordBatchDecoder::with_projection for details
require_alignment: bool
Are buffers required to already be aligned? See RecordBatchDecoder::with_require_alignment for details
skip_validation: UnsafeFlag
Should validation be skipped when reading data? Defaults to false.
See FileDecoder::with_skip_validation for details.
Implementations
impl RecordBatchDecoder<'_>
fn create_array(
    &mut self,
    field: &Field,
    variadic_counts: &mut VecDeque<i64>,
) -> Result<ArrayRef, ArrowError>
Coordinates reading arrays based on data types.
`variadic_counts` encodes the number of buffers to read for variadic types (e.g., Utf8View, BinaryView). When we encounter such a type, we pop from the front of the queue to get the number of buffers to read.
Notes:
- In the IPC format, null buffers are always set, but may be empty. We discard them if an array has 0 nulls
- Numeric values inside list arrays are often stored as 64-bit values regardless of their data type size. We thus:
  - check if the bit width of non-64-bit numbers is 64, and
  - read the buffer as 64-bit (signed integer or float), and
  - cast the 64-bit array to the appropriate data type
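The variadic-count mechanism above can be sketched with plain standard-library types. The `Kind` enum and `data_buffer_count` helper below are illustrative stand-ins, not part of this crate:

```rust
use std::collections::VecDeque;

// Illustrative stand-in for the subset of Arrow data types relevant here.
#[derive(Debug, PartialEq)]
enum Kind {
    Int32,
    Utf8View,
    BinaryView,
}

/// For view (variadic) types the number of data buffers is not fixed by
/// the type, so it is popped from the front of the queue. Fixed-width
/// types use a known buffer count (one data buffer for Int32).
fn data_buffer_count(kind: &Kind, variadic_counts: &mut VecDeque<i64>) -> i64 {
    match kind {
        Kind::Utf8View | Kind::BinaryView => variadic_counts
            .pop_front()
            .expect("missing variadic buffer count"),
        Kind::Int32 => 1,
    }
}

fn main() {
    // Counts arrive in field order; each view column consumes one entry.
    let mut counts = VecDeque::from([3_i64, 1]);
    assert_eq!(data_buffer_count(&Kind::Int32, &mut counts), 1);
    assert_eq!(data_buffer_count(&Kind::Utf8View, &mut counts), 3);
    assert_eq!(data_buffer_count(&Kind::BinaryView, &mut counts), 1);
    assert!(counts.is_empty());
}
```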
fn create_primitive_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on data type and null_count, and creates a primitive array ref
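A minimal sketch of the null-handling note above, using byte vectors in place of Arrow buffers (the `kept_buffers` helper is illustrative): in the IPC message a primitive array carries a validity buffer followed by a data buffer, and the validity buffer is dropped when the node reports zero nulls:

```rust
/// Given the node's null_count and the two raw buffers of a primitive
/// array (validity, data), return the buffers the array actually keeps.
/// An all-valid array discards its (possibly empty) validity buffer.
fn kept_buffers(
    null_count: i64,
    validity: Vec<u8>,
    data: Vec<u8>,
) -> (Option<Vec<u8>>, Vec<u8>) {
    if null_count == 0 {
        (None, data)
    } else {
        (Some(validity), data)
    }
}

fn main() {
    // No nulls: the empty validity buffer from the IPC message is dropped.
    assert_eq!(
        kept_buffers(0, vec![], vec![1, 2, 3, 4]),
        (None, vec![1, 2, 3, 4])
    );
    // With nulls, the validity bitmap is retained alongside the data.
    assert_eq!(
        kept_buffers(1, vec![0b0000_0101], vec![1, 2, 3, 4]),
        (Some(vec![0b0000_0101]), vec![1, 2, 3, 4])
    );
}
```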
fn create_array_from_builder(
    &self,
    builder: ArrayDataBuilder,
) -> Result<ArrayRef, ArrowError>
Update the ArrayDataBuilder based on settings in this decoder
fn create_list_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
    child_array: ArrayRef,
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on list type and null_count, and creates a list array ref
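The offsets-plus-child structure a list array is assembled from can be sketched with plain slices (the `list_slices` helper is illustrative): in the Arrow list layout, entry `i` spans `child[offsets[i]..offsets[i + 1]]`:

```rust
/// Materialize each list entry from an i32 offsets buffer and a child
/// values array: entry i covers child[offsets[i]..offsets[i + 1]].
fn list_slices(offsets: &[i32], child: &[i32]) -> Vec<Vec<i32>> {
    offsets
        .windows(2)
        .map(|w| child[w[0] as usize..w[1] as usize].to_vec())
        .collect()
}

fn main() {
    let offsets = [0, 2, 2, 5]; // three entries; the middle one is empty
    let child = [1, 2, 3, 4, 5];
    assert_eq!(
        list_slices(&offsets, &child),
        vec![vec![1, 2], vec![], vec![3, 4, 5]]
    );
}
```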
fn create_struct_array(
    &self,
    struct_node: &FieldNode,
    null_buffer: Buffer,
    struct_fields: &Fields,
    struct_arrays: Vec<ArrayRef>,
) -> Result<ArrayRef, ArrowError>
fn create_dictionary_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
    value_array: ArrayRef,
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on data type and null_count, and creates a dictionary array ref
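The key/values relationship a dictionary array expresses can be sketched with plain slices (the `decode_dict` helper is illustrative): each key in the batch is an index into a values array that was previously decoded into `dictionaries_by_id`:

```rust
/// Resolve dictionary keys against a decoded values array: each key is
/// an index into the dictionary's values.
fn decode_dict(keys: &[usize], values: &[&str]) -> Vec<String> {
    keys.iter().map(|&k| values[k].to_string()).collect()
}

fn main() {
    // Values would come from a dictionary batch; keys from this batch.
    let values = ["low", "mid", "high"];
    let keys = [0, 2, 2, 1];
    assert_eq!(decode_dict(&keys, &values), vec!["low", "high", "high", "mid"]);
}
```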
impl<'a> RecordBatchDecoder<'a>
fn try_new(
    buf: &'a Buffer,
    batch: RecordBatch<'a>,
    schema: SchemaRef,
    dictionaries_by_id: &'a HashMap<i64, ArrayRef>,
    metadata: &'a MetadataVersion,
) -> Result<Self, ArrowError>
Create a reader for decoding arrays from an encoded `RecordBatch`
pub fn with_projection(self, projection: Option<&'a [usize]>) -> Self
Set the projection (default: None)
If set, the projection is the list of column indices that will be read
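What a projection does can be sketched over plain vectors standing in for Arrow columns (the `project` helper is illustrative): keep only the columns at the listed indices, in the order the projection lists them:

```rust
/// Apply a projection to a column set: with Some(indices), keep only
/// those columns, in that order; with None, keep every column.
fn project<T: Clone>(columns: &[T], projection: Option<&[usize]>) -> Vec<T> {
    match projection {
        Some(indices) => indices.iter().map(|&i| columns[i].clone()).collect(),
        None => columns.to_vec(),
    }
}

fn main() {
    let columns = vec!["a", "b", "c", "d"];
    // Indices select and reorder columns.
    assert_eq!(project(&columns, Some(&[2, 0])), vec!["c", "a"]);
    // No projection: all columns pass through unchanged.
    assert_eq!(project(&columns, None), columns);
}
```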
pub fn with_require_alignment(self, require_alignment: bool) -> Self
Set require_alignment (default: false)
If true, buffers must be aligned appropriately or an error will result. If false, buffers will be copied to aligned buffers if necessary.
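The decision this flag controls can be sketched with a byte offset standing in for a buffer's address (the `check_alignment` helper is illustrative): report whether a copy into an aligned buffer is needed, or an error when copies are disallowed:

```rust
/// Decide how to handle a buffer at the given offset: Ok(false) means it
/// is aligned and usable zero-copy; Ok(true) means it must be copied to
/// an aligned buffer; Err means require_alignment forbids the copy.
fn check_alignment(
    offset: usize,
    align: usize,
    require_alignment: bool,
) -> Result<bool, String> {
    if offset % align == 0 {
        Ok(false) // aligned: usable in place
    } else if require_alignment {
        Err(format!("offset {offset} is not aligned to {align} bytes"))
    } else {
        Ok(true) // misaligned: copy into a freshly allocated aligned buffer
    }
}

fn main() {
    assert_eq!(check_alignment(64, 8, true), Ok(false));
    assert_eq!(check_alignment(65, 8, false), Ok(true));
    assert!(check_alignment(65, 8, true).is_err());
}
```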
pub(crate) fn with_skip_validation(self, skip_validation: UnsafeFlag) -> Self
Specifies if validation should be skipped when reading data (defaults to false)
Note this API is somewhat “funky” as it allows the caller to skip validation without having to use unsafe code. If this is ever made public it should be made clearer that this is potentially unsafe, by using an unsafe function that takes a boolean flag.
Safety
Relies on the caller only passing a flag with a true value if they are certain that the data is valid.
fn read_record_batch(self) -> Result<RecordBatch, ArrowError>
Read the record batch, consuming the reader