struct RecordBatchDecoder<'a> {
batch: RecordBatch<'a>,
schema: SchemaRef,
dictionaries_by_id: &'a HashMap<i64, ArrayRef>,
compression: Option<CompressionCodec>,
version: MetadataVersion,
data: &'a Buffer,
nodes: VectorIter<'a, FieldNode>,
buffers: VectorIter<'a, Buffer>,
projection: Option<&'a [usize]>,
require_alignment: bool,
skip_validation: UnsafeFlag,
}
State for decoding Arrow arrays from an IPC RecordBatch structure to `RecordBatch`
Fields
batch: RecordBatch<'a>
The flatbuffers encoded record batch
schema: SchemaRef
The output schema
dictionaries_by_id: &'a HashMap<i64, ArrayRef>
Decoded dictionaries indexed by dictionary id
compression: Option<CompressionCodec>
Optional compression codec
version: MetadataVersion
The format version
data: &'a Buffer
The raw data buffer
nodes: VectorIter<'a, FieldNode>
The fields comprising this array
buffers: VectorIter<'a, Buffer>
The buffers comprising this array
projection: Option<&'a [usize]>
Projection (subset of columns) to read, if any
See RecordBatchDecoder::with_projection for details
require_alignment: bool
Are buffers required to already be aligned? See RecordBatchDecoder::with_require_alignment for details
skip_validation: UnsafeFlag
Should validation be skipped when reading data? Defaults to false.
See FileDecoder::with_skip_validation for details.
Implementations
impl RecordBatchDecoder<'_>
fn create_array(
    &mut self,
    field: &Field,
    variadic_counts: &mut VecDeque<i64>,
) -> Result<ArrayRef, ArrowError>
Coordinates reading arrays based on data types.
`variadic_counts` encodes the number of buffers to read for variadic types (e.g., Utf8View, BinaryView). When we encounter such a type, we pop from the front of the queue to get the number of buffers to read.
Notes:
- In the IPC format, null buffers are always set, but may be empty. We discard them if an array has 0 nulls
- Numeric values inside list arrays are often stored as 64-bit values regardless of their data type size. We thus:
  - check if the bit width of non-64-bit numbers is 64, and
  - read the buffer as 64-bit (signed integer or float), and
  - cast the 64-bit array to the appropriate data type
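The variadic-count mechanism above can be sketched with plain standard-library types. The `Kind` enum and `data_buffer_count` helper below are illustrative stand-ins, not part of this crate:

```rust
use std::collections::VecDeque;

// Illustrative stand-in for the subset of Arrow data types relevant here.
#[derive(Debug, PartialEq)]
enum Kind {
    Int32,
    Utf8View,
    BinaryView,
}

/// For view (variadic) types the number of data buffers is not fixed by
/// the type, so it is popped from the front of the queue. Fixed-width
/// types use a known buffer count (one data buffer for Int32).
fn data_buffer_count(kind: &Kind, variadic_counts: &mut VecDeque<i64>) -> i64 {
    match kind {
        Kind::Utf8View | Kind::BinaryView => variadic_counts
            .pop_front()
            .expect("missing variadic buffer count"),
        Kind::Int32 => 1,
    }
}

fn main() {
    // Counts arrive in field order; each view column consumes one entry.
    let mut counts = VecDeque::from([3_i64, 1]);
    assert_eq!(data_buffer_count(&Kind::Int32, &mut counts), 1);
    assert_eq!(data_buffer_count(&Kind::Utf8View, &mut counts), 3);
    assert_eq!(data_buffer_count(&Kind::BinaryView, &mut counts), 1);
    assert!(counts.is_empty());
}
```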
fn create_primitive_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on data type and null_count, and creates a primitive array ref
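A minimal sketch of the null-handling note above, using byte vectors in place of Arrow buffers (the `kept_buffers` helper is illustrative): in the IPC message a primitive array carries a validity buffer followed by a data buffer, and the validity buffer is dropped when the node reports zero nulls:

```rust
/// Given the node's null_count and the two raw buffers of a primitive
/// array (validity, data), return the buffers the array actually keeps.
/// An all-valid array discards its (possibly empty) validity buffer.
fn kept_buffers(
    null_count: i64,
    validity: Vec<u8>,
    data: Vec<u8>,
) -> (Option<Vec<u8>>, Vec<u8>) {
    if null_count == 0 {
        (None, data)
    } else {
        (Some(validity), data)
    }
}

fn main() {
    // No nulls: the empty validity buffer from the IPC message is dropped.
    assert_eq!(
        kept_buffers(0, vec![], vec![1, 2, 3, 4]),
        (None, vec![1, 2, 3, 4])
    );
    // With nulls, the validity bitmap is retained alongside the data.
    assert_eq!(
        kept_buffers(1, vec![0b0000_0101], vec![1, 2, 3, 4]),
        (Some(vec![0b0000_0101]), vec![1, 2, 3, 4])
    );
}
```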
fn create_array_from_builder(
    &self,
    builder: ArrayDataBuilder,
) -> Result<ArrayRef, ArrowError>
Update the ArrayDataBuilder based on settings in this decoder
fn create_list_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
    child_array: ArrayRef,
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on list type and null_count, and creates a list array ref
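The offsets-plus-child structure a list array is assembled from can be sketched with plain slices (the `list_slices` helper is illustrative): in the Arrow list layout, entry `i` spans `child[offsets[i]..offsets[i + 1]]`:

```rust
/// Materialize each list entry from an i32 offsets buffer and a child
/// values array: entry i covers child[offsets[i]..offsets[i + 1]].
fn list_slices(offsets: &[i32], child: &[i32]) -> Vec<Vec<i32>> {
    offsets
        .windows(2)
        .map(|w| child[w[0] as usize..w[1] as usize].to_vec())
        .collect()
}

fn main() {
    let offsets = [0, 2, 2, 5]; // three entries; the middle one is empty
    let child = [1, 2, 3, 4, 5];
    assert_eq!(
        list_slices(&offsets, &child),
        vec![vec![1, 2], vec![], vec![3, 4, 5]]
    );
}
```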
fn create_struct_array(
    &self,
    struct_node: &FieldNode,
    null_buffer: Buffer,
    struct_fields: &Fields,
    struct_arrays: Vec<ArrayRef>,
) -> Result<ArrayRef, ArrowError>
fn create_dictionary_array(
    &self,
    field_node: &FieldNode,
    data_type: &DataType,
    buffers: &[Buffer],
    value_array: ArrayRef,
) -> Result<ArrayRef, ArrowError>
Reads the correct number of buffers based on data type and null_count, and creates a dictionary array ref
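The key/values relationship a dictionary array expresses can be sketched with plain slices (the `decode_dict` helper is illustrative): each key in the batch is an index into a values array that was previously decoded into `dictionaries_by_id`:

```rust
/// Resolve dictionary keys against a decoded values array: each key is
/// an index into the dictionary's values.
fn decode_dict(keys: &[usize], values: &[&str]) -> Vec<String> {
    keys.iter().map(|&k| values[k].to_string()).collect()
}

fn main() {
    // Values would come from a dictionary batch; keys from this batch.
    let values = ["low", "mid", "high"];
    let keys = [0, 2, 2, 1];
    assert_eq!(decode_dict(&keys, &values), vec!["low", "high", "high", "mid"]);
}
```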
impl<'a> RecordBatchDecoder<'a>
fn try_new(
    buf: &'a Buffer,
    batch: RecordBatch<'a>,
    schema: SchemaRef,
    dictionaries_by_id: &'a HashMap<i64, ArrayRef>,
    metadata: &'a MetadataVersion,
) -> Result<Self, ArrowError>
Create a reader for decoding arrays from an encoded `RecordBatch`
pub fn with_projection(self, projection: Option<&'a [usize]>) -> Self
Set the projection (default: None)
If set, the projection is the list of column indices that will be read
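What a projection does can be sketched over plain vectors standing in for Arrow columns (the `project` helper is illustrative): keep only the columns at the listed indices, in the order the projection lists them:

```rust
/// Apply a projection to a column set: with Some(indices), keep only
/// those columns, in that order; with None, keep every column.
fn project<T: Clone>(columns: &[T], projection: Option<&[usize]>) -> Vec<T> {
    match projection {
        Some(indices) => indices.iter().map(|&i| columns[i].clone()).collect(),
        None => columns.to_vec(),
    }
}

fn main() {
    let columns = vec!["a", "b", "c", "d"];
    // Indices select and reorder columns.
    assert_eq!(project(&columns, Some(&[2, 0])), vec!["c", "a"]);
    // No projection: all columns pass through unchanged.
    assert_eq!(project(&columns, None), columns);
}
```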
pub fn with_require_alignment(self, require_alignment: bool) -> Self
Set require_alignment (default: false)
If true, buffers must be aligned appropriately or an error will result. If false, buffers will be copied to aligned buffers if necessary.
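The decision this flag controls can be sketched with a byte offset standing in for a buffer's address (the `check_alignment` helper is illustrative): report whether a copy into an aligned buffer is needed, or an error when copies are disallowed:

```rust
/// Decide how to handle a buffer at the given offset: Ok(false) means it
/// is aligned and usable zero-copy; Ok(true) means it must be copied to
/// an aligned buffer; Err means require_alignment forbids the copy.
fn check_alignment(
    offset: usize,
    align: usize,
    require_alignment: bool,
) -> Result<bool, String> {
    if offset % align == 0 {
        Ok(false) // aligned: usable in place
    } else if require_alignment {
        Err(format!("offset {offset} is not aligned to {align} bytes"))
    } else {
        Ok(true) // misaligned: copy into a freshly allocated aligned buffer
    }
}

fn main() {
    assert_eq!(check_alignment(64, 8, true), Ok(false));
    assert_eq!(check_alignment(65, 8, false), Ok(true));
    assert!(check_alignment(65, 8, true).is_err());
}
```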
pub(crate) fn with_skip_validation(self, skip_validation: UnsafeFlag) -> Self
Specifies if validation should be skipped when reading data (defaults to false)
Note this API is somewhat “funky” as it allows the caller to skip validation without having to use unsafe code. If this is ever made public it should be made clearer that this is potentially unsafe, by using an unsafe function that takes a boolean flag.
Safety
Relies on the caller only passing a flag with a true value if they are certain that the data is valid.
fn read_record_batch(self) -> Result<RecordBatch, ArrowError>
Read the record batch, consuming the reader