VectorSchemaRoot
A VectorSchemaRoot is a container that holds batches of data; batches flow through a VectorSchemaRoot as part of a pipeline. Note that this differs from other implementations (e.g. in C++ and Python, a RecordBatch is a collection of equal-length vector instances and is created anew for each batch).
The recommended usage is to create a single VectorSchemaRoot based on the known schema and populate data into that same VectorSchemaRoot over and over, as a stream of batches, rather than creating a new VectorSchemaRoot instance each time (see Numba or ArrowFileWriter for examples). Thus at any one point a VectorSchemaRoot may or may not hold data (for example, the data may have been transferred downstream or not yet populated).
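The reuse pattern described above can be sketched roughly as follows. This is a minimal illustration, not the canonical implementation: the class name ReuseRootExample, the field names "id" and "name", and the batch sizes are hypothetical, and the point where a real pipeline would hand the batch downstream is only marked with a comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class ReuseRootExample {
  /** Fills one root batchCount times and returns the total number of rows populated. */
  public static int run(int batchCount, int rowsPerBatch) {
    // Hypothetical schema used only for this illustration.
    Schema schema = new Schema(Arrays.asList(
        Field.nullable("id", new ArrowType.Int(32, true)),
        Field.nullable("name", ArrowType.Utf8.INSTANCE)));
    int totalRows = 0;
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      // The vectors are created once, together with the root.
      IntVector id = (IntVector) root.getVector("id");
      VarCharVector name = (VarCharVector) root.getVector("name");
      for (int batch = 0; batch < batchCount; batch++) {
        root.allocateNew();
        for (int i = 0; i < rowsPerBatch; i++) {
          id.setSafe(i, batch * rowsPerBatch + i);
          name.setSafe(i, ("row" + i).getBytes(StandardCharsets.UTF_8));
        }
        root.setRowCount(rowsPerBatch);
        totalRows += root.getRowCount();
        // A real pipeline would hand the populated batch downstream here.
        // Release the buffers; the root and its vectors stay usable for the next batch.
        root.clear();
      }
    }
    return totalRows;
  }
}
```

Calling ReuseRootExample.run(3, 4) populates the same root three times with four rows each. Between batches the root holds no data, which matches the "may or may not hold data" behavior described above.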
Here is an example of building a VectorSchemaRoot:
BitVector bitVector = new BitVector("boolean", allocator);
VarCharVector varCharVector = new VarCharVector("varchar", allocator);
bitVector.allocateNew();
varCharVector.allocateNew();
for (int i = 0; i < 10; i++) {
  bitVector.setSafe(i, i % 2 == 0 ? 0 : 1);
  varCharVector.setSafe(i, ("test" + i).getBytes(StandardCharsets.UTF_8));
}
bitVector.setValueCount(10);
varCharVector.setValueCount(10);
List<Field> fields = Arrays.asList(bitVector.getField(), varCharVector.getField());
List<FieldVector> vectors = Arrays.asList(bitVector, varCharVector);
VectorSchemaRoot vectorSchemaRoot = new VectorSchemaRoot(fields, vectors);
The vectors within a VectorSchemaRoot can be loaded and unloaded via VectorLoader and VectorUnloader. VectorLoader and VectorUnloader handle converting between a VectorSchemaRoot and an ArrowRecordBatch (the representation of a RecordBatch IPC message). For example:
// create a VectorSchemaRoot root1 and convert its data into recordBatch
VectorSchemaRoot root1 = new VectorSchemaRoot(fields, vectors);
VectorUnloader unloader = new VectorUnloader(root1);
ArrowRecordBatch recordBatch = unloader.getRecordBatch();
// create a VectorSchemaRoot root2 and load the recordBatch
VectorSchemaRoot root2 = VectorSchemaRoot.create(root1.getSchema(), allocator);
VectorLoader loader = new VectorLoader(root2);
loader.load(recordBatch);
A new VectorSchemaRoot can be sliced from an existing instance with zero-copy:
// 0 is the start index (inclusive) and 5 is the length of the slice (number of rows).
VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);
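To make the meaning of the two slice arguments concrete, here is a small sketch with a non-zero start index. The class name SliceExample, the vector name "values", and the sample data are hypothetical and exist only for illustration.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class SliceExample {
  /** Returns {row count of the slice, first value in the slice}. */
  public static int[] run() {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      // Ten rows holding 0, 10, 20, ..., 90.
      IntVector values = new IntVector("values", allocator);
      values.allocateNew(10);
      for (int i = 0; i < 10; i++) {
        values.setSafe(i, i * 10);
      }
      values.setValueCount(10);
      // Closing the root also closes the vector it wraps.
      try (VectorSchemaRoot root = VectorSchemaRoot.of(values);
           VectorSchemaRoot slice = root.slice(2, 5)) {
        // The slice covers source rows 2 through 6 of the original root.
        IntVector sliced = (IntVector) slice.getVector("values");
        return new int[] {slice.getRowCount(), sliced.get(0)};
      }
    }
  }
}
```

Here slice(2, 5) starts at row 2 and spans 5 rows, so the slice has 5 rows and its first value is the one stored at row 2 of the source. The slice is closed separately from the root it was taken from, since it is a VectorSchemaRoot in its own right.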