Class VectorSchemaRoot

java.lang.Object
org.apache.arrow.vector.VectorSchemaRoot
All Implemented Interfaces:
AutoCloseable

public class VectorSchemaRoot extends Object implements AutoCloseable
Holder for a set of vectors to be loaded/unloaded. A VectorSchemaRoot is a container that can hold batches; batches flow through a VectorSchemaRoot as part of a pipeline. Note that this differs from other implementations (e.g. in C++ and Python, a RecordBatch is a collection of equal-length vector instances and is created anew for each batch). The recommended usage is to create a single VectorSchemaRoot based on the known schema and to populate data into that same VectorSchemaRoot over and over in a stream of batches, rather than creating a new VectorSchemaRoot instance each time (see Flight or ArrowFileWriter for better understanding). Thus at any one point a VectorSchemaRoot may or may not hold data (for example, because the data was transferred downstream or has not yet been populated).
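The reuse pattern described above can be sketched as follows. This is a minimal illustration rather than code from the Arrow sources: the class name `ReuseRootExample`, the single `id` column, and the batch sizes are made up for the example, and it assumes `arrow-vector` plus an allocator implementation (for example `arrow-memory-netty`) are on the classpath.

```java
import java.util.Arrays;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class ReuseRootExample {
  /** Fills the same root with several batches and returns the total number of rows seen. */
  static int run(int batches, int rowsPerBatch) {
    // Hypothetical one-column schema used only for this sketch.
    Schema schema = new Schema(Arrays.asList(Field.nullable("id", new ArrowType.Int(32, true))));
    int total = 0;
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      IntVector id = (IntVector) root.getVector("id");
      for (int b = 0; b < batches; b++) {
        root.allocateNew();                  // (re)allocate buffers for this batch
        for (int i = 0; i < rowsPerBatch; i++) {
          id.setSafe(i, b * rowsPerBatch + i);
        }
        root.setRowCount(rowsPerBatch);      // record how many rows the batch holds
        total += root.getRowCount();         // a real pipeline would hand the batch downstream here
        root.clear();                        // release buffers; the vectors stay in the root
      }
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(run(3, 4));
  }
}
```

The root and its vectors are created once; only the buffers are released and reallocated between batches.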
  • Constructor Details

    • VectorSchemaRoot

      public VectorSchemaRoot(Iterable<FieldVector> vectors)
      Constructs a new instance containing each of the vectors.
    • VectorSchemaRoot

      public VectorSchemaRoot(FieldVector parent)
      Constructs a new instance containing the children of parent but not the parent itself.
    • VectorSchemaRoot

      public VectorSchemaRoot(List<Field> fields, List<FieldVector> fieldVectors)
      Constructs a new instance.
      Parameters:
      fields - The types of each vector.
      fieldVectors - The data vectors (must be equal in size to fields).
    • VectorSchemaRoot

      public VectorSchemaRoot(List<Field> fields, List<FieldVector> fieldVectors, int rowCount)
      Constructs a new instance.
      Parameters:
      fields - The types of each vector.
      fieldVectors - The data vectors (must be equal in size to fields).
      rowCount - The number of rows contained.
    • VectorSchemaRoot

      public VectorSchemaRoot(Schema schema, List<FieldVector> fieldVectors, int rowCount)
      Constructs a new instance.
      Parameters:
      schema - The schema for the vectors.
      fieldVectors - The data vectors.
      rowCount - The number of rows contained.
  • Method Details

    • create

      public static VectorSchemaRoot create(Schema schema, BufferAllocator allocator)
      Creates a new set of empty vectors corresponding to the given schema.
    • of

      public static VectorSchemaRoot of(FieldVector... vectors)
      Constructs a new instance from vectors.
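      As an illustration of `of`, the sketch below wraps an already populated vector. The class name `RootOfExample` and the column name are hypothetical. It relies on the assumption that, in current Arrow Java releases, a root built from vectors takes its row count from the value count of the first vector.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class RootOfExample {
  /** Builds a root from one populated vector and returns the root's row count. */
  static int run() {
    try (BufferAllocator allocator = new RootAllocator();
         IntVector id = new IntVector("id", allocator)) {
      id.allocateNew(3);
      id.set(0, 1);
      id.set(1, 2);
      id.set(2, 3);
      id.setValueCount(3);
      // The root wraps the existing vector; closing the vector (via try-with-resources)
      // releases the memory, so the root is not closed separately here.
      VectorSchemaRoot root = VectorSchemaRoot.of(id);
      return root.getRowCount();
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```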
    • allocateNew

      public void allocateNew()
      Performs an adaptive allocation of each vector. Sizes are based on the previously defined initial allocation for each vector (and on any size learned subsequently).
    • clear

      public void clear()
      Release all the memory for each vector held in this root. This DOES NOT remove vectors from the container.
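      A small sketch of the behavior described for `clear`: the buffers are released, but the vector is still part of the root afterwards. The class name `ClearExample` and the sample values are made up for the illustration.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class ClearExample {
  /** Returns {number of vectors after clear, value count of the vector after clear}. */
  static int[] run() {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.of(new IntVector("id", allocator))) {
      IntVector id = (IntVector) root.getVector(0);
      root.allocateNew();
      id.setSafe(0, 7);
      id.setSafe(1, 8);
      root.setRowCount(2);

      root.clear(); // releases the buffers of every vector in the root

      // The vector is still registered in the root, but it no longer holds values.
      return new int[] {root.getFieldVectors().size(), id.getValueCount()};
    }
  }

  public static void main(String[] args) {
    int[] result = run();
    System.out.println(result[0] + " vector(s), " + result[1] + " value(s)");
  }
}
```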
    • getFieldVectors

      public List<FieldVector> getFieldVectors()
    • getVector

      public FieldVector getVector(String name)
      Gets a vector by name. If the name occurs multiple times, this returns the first inserted entry for that name.
    • getVector

      public FieldVector getVector(Field field)
    • getVector

      public FieldVector getVector(int index)
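      The lookups by name and by index can be sketched as below. The class name `GetVectorExample` and the two-column schema are hypothetical; both lookups are expected to return the same vector instance held by the root.

```java
import java.util.Arrays;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class GetVectorExample {
  /** Returns true if the name lookup and the index lookup yield the same vector instance. */
  static boolean run() {
    Schema schema = new Schema(Arrays.asList(
        Field.nullable("id", new ArrowType.Int(32, true)),
        Field.nullable("name", new ArrowType.Utf8())));
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      FieldVector byName = root.getVector("name"); // lookup by field name
      FieldVector byIndex = root.getVector(1);     // lookup by position in the schema
      return byName == byIndex;
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```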
    • addVector

      public VectorSchemaRoot addVector(int index, FieldVector vector)
      Add vector to the record batch, producing a new VectorSchemaRoot.
      Parameters:
      index - field index
      vector - vector to be added.
      Returns:
      a new VectorSchemaRoot with the vector added
    • removeVector

      public VectorSchemaRoot removeVector(int index)
      Remove vector from the record batch, producing a new VectorSchemaRoot.
      Parameters:
      index - field index
      Returns:
      a new VectorSchemaRoot with the vector removed
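      Because `addVector` and `removeVector` produce a new VectorSchemaRoot, the root they are called on keeps its original set of vectors. A minimal sketch, with the hypothetical class name `AddRemoveExample` and made-up column names:

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class AddRemoveExample {
  /** Returns the vector counts of the original, widened, and narrowed roots. */
  static int[] run() {
    // The vectors own the memory here, so they are the resources that get closed.
    try (BufferAllocator allocator = new RootAllocator();
         IntVector a = new IntVector("a", allocator);
         IntVector b = new IntVector("b", allocator)) {
      VectorSchemaRoot root = VectorSchemaRoot.of(a);
      VectorSchemaRoot wider = root.addVector(1, b);      // new root: [a, b]
      VectorSchemaRoot narrower = wider.removeVector(0);  // new root: [b]
      return new int[] {
        root.getFieldVectors().size(),
        wider.getFieldVectors().size(),
        narrower.getFieldVectors().size()
      };
    }
  }

  public static void main(String[] args) {
    int[] sizes = run();
    System.out.println(sizes[0] + ", " + sizes[1] + ", " + sizes[2]);
  }
}
```

      The new roots reference the same vector instances as the original, so the sketch closes the vectors once rather than closing each root.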
    • getSchema

      public Schema getSchema()
    • getRowCount

      public int getRowCount()
    • setRowCount

      public void setRowCount(int rowCount)
      Sets the row count of this container. Also sets the value count of each root-level FieldVector it contains.
      Parameters:
      rowCount - Number of records.
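      The effect of `setRowCount` on the contained vectors can be sketched like this. The class name `SetRowCountExample` and the sample data are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class SetRowCountExample {
  /** Returns {root row count, value count of "id", value count of "name"}. */
  static int[] run() {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.of(
             new IntVector("id", allocator), new VarCharVector("name", allocator))) {
      IntVector id = (IntVector) root.getVector("id");
      VarCharVector name = (VarCharVector) root.getVector("name");
      root.allocateNew();
      id.setSafe(0, 1);
      name.setSafe(0, "ann".getBytes(StandardCharsets.UTF_8));
      id.setSafe(1, 2);
      name.setSafe(1, "bob".getBytes(StandardCharsets.UTF_8));

      // One call sets the row count on the root and the value count on both vectors.
      root.setRowCount(2);
      return new int[] {root.getRowCount(), id.getValueCount(), name.getValueCount()};
    }
  }

  public static void main(String[] args) {
    int[] counts = run();
    System.out.println(counts[0] + ", " + counts[1] + ", " + counts[2]);
  }
}
```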
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • contentToTSVString

      public String contentToTSVString()
      Returns a tab-separated-values (TSV) string of the vectors' contents (based on their Java object representation).
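      A short sketch of calling `contentToTSVString` on a small root. The class name `TsvExample` and the data are made up. The exact layout of the returned string (a header line of field names followed by one line per row) is an assumption based on current Arrow Java behavior.

```java
import java.nio.charset.StandardCharsets;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class TsvExample {
  /** Builds a two-row root and returns its TSV rendering. */
  static String run() {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.of(
             new IntVector("id", allocator), new VarCharVector("name", allocator))) {
      IntVector id = (IntVector) root.getVector("id");
      VarCharVector name = (VarCharVector) root.getVector("name");
      root.allocateNew();
      id.setSafe(0, 1);
      name.setSafe(0, "ann".getBytes(StandardCharsets.UTF_8));
      id.setSafe(1, 2);
      name.setSafe(1, "bob".getBytes(StandardCharsets.UTF_8));
      root.setRowCount(2);
      // Each value is rendered through its Java object representation.
      return root.contentToTSVString();
    }
  }

  public static void main(String[] args) {
    System.out.print(run());
  }
}
```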
    • syncSchema

      public boolean syncSchema()
      Synchronizes the schema from the current vectors. In some cases, the schema and the actual vector structure may differ. This can be caused by a promoted writer (for details, see PromotableWriter). For example, writing different types of data to a ListVector may lead to such a case. When this happens, call this method to bring the schema and the vector structure back into a synchronized state.
      Returns:
      true if the schema is updated, false otherwise.
    • slice

      public VectorSchemaRoot slice(int index)
      Slices this root starting at the desired index.
      Parameters:
      index - start position of the slice
      Returns:
      the sliced root
    • slice

      public VectorSchemaRoot slice(int index, int length)
      Slices this root at the desired index with the desired length.
      Parameters:
      index - start position of the slice
      length - length of the slice
      Returns:
      the sliced root
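      The two-argument `slice` can be sketched as follows. The class name `SliceExample` and the values are hypothetical. The sketch closes the sliced root separately from the original, on the assumption that the slice holds its own references to the underlying memory.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class SliceExample {
  /** Returns {row count of the slice, first value of the slice, last value of the slice}. */
  static int[] run() {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.of(new IntVector("id", allocator))) {
      IntVector id = (IntVector) root.getVector(0);
      root.allocateNew();
      for (int i = 0; i < 5; i++) {
        id.setSafe(i, (i + 1) * 10); // 10, 20, 30, 40, 50
      }
      root.setRowCount(5);

      // Rows 1..3 of the original root (start index 1, length 3).
      try (VectorSchemaRoot sliced = root.slice(1, 3)) {
        IntVector slicedId = (IntVector) sliced.getVector(0);
        return new int[] {sliced.getRowCount(), slicedId.get(0), slicedId.get(2)};
      }
    }
  }

  public static void main(String[] args) {
    int[] r = run();
    System.out.println(r[0] + " rows, first=" + r[1] + ", last=" + r[2]);
  }
}
```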
    • equals

      public boolean equals(VectorSchemaRoot other)
      Determine if two VectorSchemaRoots are exactly equal.
    • approxEquals

      public boolean approxEquals(VectorSchemaRoot other, VectorValueEqualizer<Float4Vector> floatDiffFunction, VectorValueEqualizer<Float8Vector> doubleDiffFunction)
      Determine if two VectorSchemaRoots are approximately equal, using the given functions to calculate the difference between float/double values. Note that the approximate comparison applies only to floating-point values; all other values are compared for exact equality.
      Parameters:
      floatDiffFunction - function to calculate difference between float values.
      doubleDiffFunction - function to calculate difference between double values.
    • approxEquals

      public boolean approxEquals(VectorSchemaRoot other)
      Determine if two VectorSchemaRoots are approximately equal using default functions to calculate difference between float/double values.
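      The difference between `equals` and the one-argument `approxEquals` can be sketched as below. The class name `ApproxEqualsExample` is hypothetical, and the sketch assumes that the default tolerance for doubles is much larger than the 1e-9 difference used here.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class ApproxEqualsExample {
  /** Returns {result of equals, result of approxEquals} for two nearly identical roots. */
  static boolean[] run() {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot left = VectorSchemaRoot.of(new Float8Vector("x", allocator));
         VectorSchemaRoot right = VectorSchemaRoot.of(new Float8Vector("x", allocator))) {
      left.allocateNew();
      right.allocateNew();
      ((Float8Vector) left.getVector(0)).setSafe(0, 1.0);
      ((Float8Vector) right.getVector(0)).setSafe(0, 1.0 + 1e-9); // tiny difference
      left.setRowCount(1);
      right.setRowCount(1);

      // equals compares values exactly; approxEquals tolerates small floating-point differences.
      return new boolean[] {left.equals(right), left.approxEquals(right)};
    }
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("equals=" + r[0] + ", approxEquals=" + r[1]);
  }
}
```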