pub struct RecordBatch {
schema: SchemaRef,
columns: Vec<Arc<dyn Array>>,
row_count: usize,
}
Expand description
A two-dimensional batch of column-oriented data with a defined schema.
A RecordBatch
is a two-dimensional dataset of a number of
contiguous arrays, each the same length.
A record batch has a schema which must match its arrays’
datatypes.
Record batches are a convenient unit of work for various serialization and computation functions, possibly incremental.
Use the record_batch!
macro to create a RecordBatch
from
literal slice of values, useful for rapid prototyping and testing.
Example:
use arrow_array::record_batch;
let batch = record_batch!(
("a", Int32, [1, 2, 3]),
("b", Float64, [Some(4.0), None, Some(5.0)]),
("c", Utf8, ["alpha", "beta", "gamma"])
);
Fields§
§schema: SchemaRef
§columns: Vec<Arc<dyn Array>>
§row_count: usize
The number of rows in this RecordBatch
This is stored separately from the columns to handle the case of no columns
Implementations§
Source§impl RecordBatch
impl RecordBatch
Sourcepub fn try_new(
schema: SchemaRef,
columns: Vec<ArrayRef>,
) -> Result<Self, ArrowError>
pub fn try_new( schema: SchemaRef, columns: Vec<ArrayRef>, ) -> Result<Self, ArrowError>
Creates a RecordBatch
from a schema and columns.
Expects the following:
- the vec of columns to not be empty
- the schema and column data types to have equal lengths and match
- each array in columns to have the same length
If the conditions are not met, an error is returned.
§Example
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(
Arc::new(schema),
vec![Arc::new(id_array)]
).unwrap();
Sourcepub fn try_new_with_options(
schema: SchemaRef,
columns: Vec<ArrayRef>,
options: &RecordBatchOptions,
) -> Result<Self, ArrowError>
pub fn try_new_with_options( schema: SchemaRef, columns: Vec<ArrayRef>, options: &RecordBatchOptions, ) -> Result<Self, ArrowError>
Creates a RecordBatch
from a schema and columns, with additional options,
such as whether to strictly validate field names.
See RecordBatch::try_new
for the expected conditions.
Sourcepub fn new_empty(schema: SchemaRef) -> Self
pub fn new_empty(schema: SchemaRef) -> Self
Creates a new empty RecordBatch
.
Sourcefn try_new_impl(
schema: SchemaRef,
columns: Vec<ArrayRef>,
options: &RecordBatchOptions,
) -> Result<Self, ArrowError>
fn try_new_impl( schema: SchemaRef, columns: Vec<ArrayRef>, options: &RecordBatchOptions, ) -> Result<Self, ArrowError>
Validate the schema and columns using RecordBatchOptions
. Returns an error
if any validation check fails, otherwise returns the created Self
Sourcepub fn with_schema(self, schema: SchemaRef) -> Result<Self, ArrowError>
pub fn with_schema(self, schema: SchemaRef) -> Result<Self, ArrowError>
Override the schema of this RecordBatch
Returns an error if schema
is not a superset of the current schema
as determined by [Schema::contains
]
Sourcepub fn schema_ref(&self) -> &SchemaRef
pub fn schema_ref(&self) -> &SchemaRef
Returns a reference to the [Schema
] of the record batch.
Sourcepub fn project(&self, indices: &[usize]) -> Result<RecordBatch, ArrowError>
pub fn project(&self, indices: &[usize]) -> Result<RecordBatch, ArrowError>
Projects the schema onto the specified columns
Sourcepub fn num_columns(&self) -> usize
pub fn num_columns(&self) -> usize
Returns the number of columns in the record batch.
§Example
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)]).unwrap();
assert_eq!(batch.num_columns(), 1);
Sourcepub fn num_rows(&self) -> usize
pub fn num_rows(&self) -> usize
Returns the number of rows in each column.
§Example
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)]).unwrap();
assert_eq!(batch.num_rows(), 5);
Sourcepub fn column_by_name(&self, name: &str) -> Option<&ArrayRef>
pub fn column_by_name(&self, name: &str) -> Option<&ArrayRef>
Get a reference to a column’s array by name.
Sourcepub fn remove_column(&mut self, index: usize) -> ArrayRef
pub fn remove_column(&mut self, index: usize) -> ArrayRef
Remove column by index and return it.
Return the ArrayRef
if the column is removed.
§Panics
Panics if `index`` out of bounds.
§Example
use std::sync::Arc;
use arrow_array::{BooleanArray, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let bool_array = BooleanArray::from(vec![true, false, false, true, true]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false),
Field::new("bool", DataType::Boolean, false),
]);
let mut batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array), Arc::new(bool_array)]).unwrap();
let removed_column = batch.remove_column(0);
assert_eq!(removed_column.as_any().downcast_ref::<Int32Array>().unwrap(), &Int32Array::from(vec![1, 2, 3, 4, 5]));
assert_eq!(batch.num_columns(), 1);
Sourcepub fn slice(&self, offset: usize, length: usize) -> RecordBatch
pub fn slice(&self, offset: usize, length: usize) -> RecordBatch
Return a new RecordBatch where each column is sliced
according to offset
and length
§Panics
Panics if offset
with length
is greater than column length.
Sourcepub fn try_from_iter<I, F>(value: I) -> Result<Self, ArrowError>
pub fn try_from_iter<I, F>(value: I) -> Result<Self, ArrowError>
Create a RecordBatch
from an iterable list of pairs of the
form (field_name, array)
, with the same requirements on
fields and arrays as RecordBatch::try_new
. This method is
often used to create a single RecordBatch
from arrays,
e.g. for testing.
The resulting schema is marked as nullable for each column if
the array for that column is has any nulls. To explicitly
specify nullibility, use RecordBatch::try_from_iter_with_nullable
Example:
let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec!["a", "b"]));
let record_batch = RecordBatch::try_from_iter(vec![
("a", a),
("b", b),
]);
Another way to quickly create a RecordBatch
is to use the record_batch!
macro,
which is particularly helpful for rapid prototyping and testing.
Example:
use arrow_array::record_batch;
let batch = record_batch!(
("a", Int32, [1, 2, 3]),
("b", Float64, [Some(4.0), None, Some(5.0)]),
("c", Utf8, ["alpha", "beta", "gamma"])
);
Sourcepub fn try_from_iter_with_nullable<I, F>(value: I) -> Result<Self, ArrowError>
pub fn try_from_iter_with_nullable<I, F>(value: I) -> Result<Self, ArrowError>
Create a RecordBatch
from an iterable list of tuples of the
form (field_name, array, nullable)
, with the same requirements on
fields and arrays as RecordBatch::try_new
. This method is often
used to create a single RecordBatch
from arrays, e.g. for
testing.
Example:
let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec![Some("a"), Some("b")]));
// Note neither `a` nor `b` has any actual nulls, but we mark
// b an nullable
let record_batch = RecordBatch::try_from_iter_with_nullable(vec![
("a", a, false),
("b", b, true),
]);
Sourcepub fn get_array_memory_size(&self) -> usize
pub fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this batch.
Note that this does not always correspond to the exact memory usage of a
RecordBatch
(might overestimate), since multiple columns can share the same
buffers or slices thereof, the memory used by the shared buffers might be
counted multiple times.
Trait Implementations§
Source§impl Clone for RecordBatch
impl Clone for RecordBatch
Source§fn clone(&self) -> RecordBatch
fn clone(&self) -> RecordBatch
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read more