Interface ValueVector

All Superinterfaces:
AutoCloseable, Closeable, Iterable<ValueVector>
All Known Subinterfaces:
BaseIntVector, BaseListVector, ElementAddressableVector, FieldVector, FixedWidthVector, FloatingPointVector, RepeatedValueVector, ValueIterableVector<T>, VariableWidthFieldVector, VariableWidthVector
All Known Implementing Classes:
AbstractContainerVector, AbstractStructVector, BaseFixedWidthVector, BaseLargeRepeatedValueViewVector, BaseLargeVariableWidthVector, BaseRepeatedValueVector, BaseRepeatedValueViewVector, BaseValueVector, BaseVariableWidthVector, BaseVariableWidthViewVector, BigIntVector, BitVector, DateDayVector, DateMilliVector, Decimal256Vector, DecimalVector, DenseUnionVector, DurationVector, ExtensionTypeVector, FixedSizeBinaryVector, FixedSizeListVector, Float2Vector, Float4Vector, Float8Vector, IntervalDayVector, IntervalMonthDayNanoVector, IntervalYearVector, IntVector, LargeListVector, LargeListViewVector, LargeVarBinaryVector, LargeVarCharVector, ListVector, ListViewVector, MapVector, NonNullableStructVector, NullVector, OpaqueVector, RunEndEncodedVector, SmallIntVector, StructVector, TimeMicroVector, TimeMilliVector, TimeNanoVector, TimeSecVector, TimeStampMicroTZVector, TimeStampMicroVector, TimeStampMilliTZVector, TimeStampMilliVector, TimeStampNanoTZVector, TimeStampNanoVector, TimeStampSecTZVector, TimeStampSecVector, TimeStampVector, TinyIntVector, UInt1Vector, UInt2Vector, UInt4Vector, UInt8Vector, UnionVector, VarBinaryVector, VarCharVector, ViewVarBinaryVector, ViewVarCharVector, ZeroVector

public interface ValueVector extends Closeable, Iterable<ValueVector>
An abstraction that is used to store a sequence of values in an individual column.

A value vector stores its underlying data in memory in a compact, efficient columnar layout. The column whose data is stored is described by getField().

A vector must be allocated before any attempt to read from it or write to it.

There are a few "rules" around vectors:

  • values need to be written in order (e.g. index 0, 1, 2, 5)
  • null vectors start with all values as null before writing anything
  • for variable width types, the offset vector should be all zeros before writing
  • you must call setValueCount before a vector can be read
  • you should never write to a vector once it has been read.

Please note that the current implementation doesn't enforce these rules, so a few places deviate from them (e.g. offset vectors in variable-length and repeated vectors).

This interface "should" strive to guarantee this order of operation:

allocate > mutate > setValueCount > access > clear (or allocate to start the process over).
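Because the rules above are not enforced, callers have to keep to the order themselves. The sketch below is a toy model in plain Java, not Arrow code: LifecycleGuardSketch and its State enum are hypothetical names, and the int[] merely stands in for a vector's buffers. It only illustrates how the allocate > mutate > setValueCount > access > clear sequence could be checked.

```java
/** Toy model of the documented order of operations; hypothetical, not part of Arrow. */
final class LifecycleGuardSketch {
    enum State { UNALLOCATED, WRITABLE, READABLE }

    private State state = State.UNALLOCATED;
    private int[] values = new int[0]; // stands in for the vector's buffers
    private int valueCount;

    /** allocate: (re)starts the cycle with fresh storage. */
    void allocateNew(int capacity) {
        values = new int[capacity];
        valueCount = 0;
        state = State.WRITABLE;
    }

    /** mutate: only allowed between allocation and setValueCount. */
    void set(int index, int value) {
        require(State.WRITABLE, "set");
        values[index] = value;
    }

    /** setValueCount: finishes population and makes the values readable. */
    void setValueCount(int count) {
        require(State.WRITABLE, "setValueCount");
        valueCount = count;
        state = State.READABLE;
    }

    /** access: only allowed after setValueCount. */
    int get(int index) {
        require(State.READABLE, "get");
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index + ", valueCount " + valueCount);
        }
        return values[index];
    }

    /** clear: drops the storage and returns to the initial state. */
    void clear() {
        values = new int[0];
        valueCount = 0;
        state = State.UNALLOCATED;
    }

    State state() {
        return state;
    }

    private void require(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(op + " requires state " + expected + " but was " + state);
        }
    }
}
```

In this model a write after setValueCount fails with an IllegalStateException, which is the "never write to a vector once it has been read" rule made explicit.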
  • Method Details

    • allocateNew

      void allocateNew() throws OutOfMemoryException
      Allocate new buffers. ValueVector implements logic to determine how much to allocate.
      Throws:
      OutOfMemoryException - Thrown if no memory can be allocated.
    • allocateNewSafe

      boolean allocateNewSafe()
      Allocates new buffers. ValueVector implements logic to determine how much to allocate.
      Returns:
      true if allocation was successful, false otherwise.
    • reAlloc

      void reAlloc()
      Allocate a new buffer with double the current capacity and copy the existing data into it. The vector's buffer is then replaced with the new buffer and the old one is released.
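The double-copy-replace behaviour described for reAlloc() can be pictured with a small plain-Java model. This is a sketch under assumptions, not Arrow's implementation: DoublingBufferSketch is a hypothetical class, a byte[] stands in for the ArrowBuf, and "releasing" the old buffer is simply leaving the old array to the garbage collector.

```java
import java.util.Arrays;

/** Toy model of grow-by-doubling; hypothetical, not part of Arrow. */
final class DoublingBufferSketch {
    private byte[] data; // stands in for the vector's data buffer

    DoublingBufferSketch(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        this.data = new byte[initialCapacity];
    }

    int capacity() {
        return data.length;
    }

    /** Double the capacity, copy the old contents, and swap in the new buffer. */
    void reAlloc() {
        int newCapacity = Math.multiplyExact(data.length, 2); // throws on int overflow
        data = Arrays.copyOf(data, newCapacity); // old bytes copied, the rest zero-filled
    }

    /** Write that grows the buffer first when the index does not fit yet. */
    void setSafe(int index, byte value) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("negative index " + index);
        }
        while (index >= data.length) {
            reAlloc();
        }
        data[index] = value;
    }

    byte get(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        DoublingBufferSketch buf = new DoublingBufferSketch(4);
        buf.setSafe(0, (byte) 7);
        buf.setSafe(9, (byte) 42); // grows 4 -> 8 -> 16 before writing
        System.out.println(buf.capacity() + " " + buf.get(0) + " " + buf.get(9)); // 16 7 42
    }
}
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of up to half the buffer being unused.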
    • getAllocator

      BufferAllocator getAllocator()
      Get the allocator associated with the vector. CAVEAT: Some ValueVector implementations (e.g. NullVector) do not require an allocator for data storage and may return null.
      Returns:
      the allocator, which may be null.
    • setInitialCapacity

      void setInitialCapacity(int numRecords)
      Set the initial record capacity.
      Parameters:
      numRecords - the initial record capacity.
    • getValueCapacity

      int getValueCapacity()
      Returns the maximum number of values that can be stored in this vector instance.
      Returns:
      the maximum number of values that can be stored in this vector instance.
    • close

      void close()
      Alternative to clear(). Allows use as an AutoCloseable in try-with-resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
    • clear

      void clear()
      Release any owned ArrowBuf and reset the ValueVector to the initial state. If the vector has any child vectors, they will also be cleared.
    • reset

      void reset()
      Reset the ValueVector to the initial state without releasing any owned ArrowBuf. Buffer capacities will remain unchanged and any previous data will be zeroed out. This includes buffers for data, validity, offset, etc. If the vector has any child vectors, they will also be reset.
    • getField

      Field getField()
      Get information about how this field is materialized.
      Returns:
      the field corresponding to this vector
    • getMinorType

      Types.MinorType getMinorType()
    • getTransferPair

      TransferPair getTransferPair(BufferAllocator allocator)
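      Creates a transfer pair for handing this vector's buffers, and with them the quota (memory accounting) responsibility, over to a new vector backed by the target allocator.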
      Parameters:
      allocator - the target allocator
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      TransferPair getTransferPair(String ref, BufferAllocator allocator)
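      Creates a transfer pair for handing this vector's buffers, and with them the quota (memory accounting) responsibility, over to a new vector backed by the target allocator.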
      Parameters:
      ref - the name of the vector
      allocator - the target allocator
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      TransferPair getTransferPair(Field field, BufferAllocator allocator)
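      Creates a transfer pair for handing this vector's buffers, and with them the quota (memory accounting) responsibility, over to a new vector backed by the target allocator.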
      Parameters:
      field - the Field object used by the target vector
      allocator - the target allocator
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      TransferPair getTransferPair(String ref, BufferAllocator allocator, CallBack callBack)
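      Creates a transfer pair for handing this vector's buffers, and with them the quota (memory accounting) responsibility, over to a new vector backed by the target allocator.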
      Parameters:
      ref - the name of the vector
      allocator - the target allocator
      callBack - A schema change callback.
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      TransferPair getTransferPair(Field field, BufferAllocator allocator, CallBack callBack)
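      Creates a transfer pair for handing this vector's buffers, and with them the quota (memory accounting) responsibility, over to a new vector backed by the target allocator.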
      Parameters:
      field - the Field object used by the target vector
      allocator - the target allocator
      callBack - A schema change callback.
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • makeTransferPair

      TransferPair makeTransferPair(ValueVector target)
      Makes a new transfer pair used to transfer underlying buffers.
      Parameters:
      target - the target for the transfer
      Returns:
      a new transfer pair that is used to transfer underlying buffers into the target vector.
    • getReader

      FieldReader getReader()
      Get a reader for this vector.
      Returns:
      a field reader that supports reading values from this vector.
    • getBufferSize

      int getBufferSize()
      Get the number of bytes used by this vector.
      Returns:
      the number of bytes that is used by this vector instance.
    • getBufferSizeFor

      int getBufferSizeFor(int valueCount)
      Returns the number of bytes that this vector would use if it held the given number of values. The result is the same as calling setValueCount() followed by getBufferSize(), but without the closing side effects that setValueCount() has on finishing the population of a vector. Some operations may use this to determine how much memory a vector has used so far, even though it has not finished being populated.
      Parameters:
      valueCount - the number of values to assume this vector contains
      Returns:
      the buffer size if this vector is holding valueCount values
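As a rough illustration of the arithmetic involved, the sketch below estimates the size of a nullable fixed-width column from a value count. It assumes the simplest layout (one validity bit per value plus a fixed number of data bytes per value) and ignores any padding, rounding, or extra buffers a concrete vector may add. BufferSizeSketch and its methods are hypothetical names, not Arrow API.

```java
/** Size arithmetic for a simple fixed-width layout; hypothetical, not part of Arrow. */
final class BufferSizeSketch {

    /** Bytes needed for one validity bit per value, rounded up to whole bytes. */
    static long validityBytes(int valueCount) {
        return (valueCount + 7L) / 8;
    }

    /** Validity bytes plus data bytes for valueCount values of the given width. */
    static long fixedWidthBytes(int valueCount, int typeWidthBytes) {
        if (valueCount < 0 || typeWidthBytes < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // long arithmetic avoids int overflow for large counts
        return validityBytes(valueCount) + (long) valueCount * typeWidthBytes;
    }

    public static void main(String[] args) {
        // 10 four-byte values: 2 validity bytes + 40 data bytes
        System.out.println(fixedWidthBytes(10, 4)); // 42
    }
}
```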
    • getBuffers

      ArrowBuf[] getBuffers(boolean clear)
      Return the underlying buffers associated with this vector. Note that this doesn't affect the reference counts of these buffers, so it should only be used for in-context access. Also note that the buffers change regularly, so external classes shouldn't hold a reference to them (unless they change them).
      Parameters:
      clear - Whether to clear the vector before returning. The buffers will still be refcounted, but the returned array will be the only reference to them.
      Returns:
      The underlying buffers that are used by this vector instance.
    • getValidityBuffer

      ArrowBuf getValidityBuffer()
      Gets the underlying buffer associated with the validity vector.
      Returns:
      buffer
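In the Arrow columnar format the validity buffer is a bitmap with one bit per value, least-significant bit first within each byte, where a set bit marks a non-null value. The plain-Java sketch below models that bitmap with a byte[] to show how a null check and a null count can be derived from it. ValidityBitmapSketch is a hypothetical helper, not the code behind isNull(int) or getNullCount().

```java
/** Toy model of a validity bitmap (bit set = value present); hypothetical, not part of Arrow. */
final class ValidityBitmapSketch {

    /** Marks the value at index as non-null. */
    static void setValid(byte[] bitmap, int index) {
        bitmap[index >> 3] |= (byte) (1 << (index & 7));
    }

    /** A value is null when its bit is 0. */
    static boolean isNull(byte[] bitmap, int index) {
        return ((bitmap[index >> 3] >> (index & 7)) & 1) == 0;
    }

    /** Counts the cleared bits among the first valueCount positions. */
    static int nullCount(byte[] bitmap, int valueCount) {
        int nulls = 0;
        for (int i = 0; i < valueCount; i++) {
            if (isNull(bitmap, i)) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        byte[] bitmap = new byte[2]; // 10 values need 2 bytes; all bits 0, so all null
        setValid(bitmap, 0);
        setValid(bitmap, 2);
        setValid(bitmap, 9);
        System.out.println(isNull(bitmap, 1));     // true
        System.out.println(nullCount(bitmap, 10)); // 7
    }
}
```

A freshly zeroed bitmap reads as all null, which matches the rule that vectors start with every value null before anything is written.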
    • getDataBuffer

      ArrowBuf getDataBuffer()
      Gets the underlying buffer associated with the data vector.
      Returns:
      buffer
    • getOffsetBuffer

      ArrowBuf getOffsetBuffer()
      Gets the underlying buffer associated with the offset vector.
      Returns:
      buffer
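For variable-width types, the Arrow columnar format stores valueCount + 1 offsets, and value i occupies the data bytes from offsets[i] (inclusive) to offsets[i + 1] (exclusive). The sketch below models that with plain arrays. OffsetBufferSketch is a hypothetical helper, and the int[] and byte[] stand in for the offset and data buffers.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of offset-based addressing for variable-width values; hypothetical, not part of Arrow. */
final class OffsetBufferSketch {

    /** Decodes value number index from the data bytes using its two neighbouring offsets. */
    static String valueAt(int[] offsets, byte[] data, int index) {
        int start = offsets[index];
        int end = offsets[index + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three values: "ab", "" (empty) and "cde", stored back to back.
        byte[] data = "abcde".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 2, 2, 5}; // one more entry than there are values
        for (int i = 0; i < offsets.length - 1; i++) {
            System.out.println("[" + valueAt(offsets, data, i) + "]");
        }
    }
}
```

An all-zero offset array describes a column of empty values, which is why the offset vector is expected to be zeroed before writing starts.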
    • getValueCount

      int getValueCount()
      Gets the number of values.
      Returns:
      number of values in the vector
    • setValueCount

      void setValueCount(int valueCount)
      Sets the number of values in the vector.
    • getObject

      Object getObject(int index)
      Get friendly type object from the vector.
      Parameters:
      index - index of object to get
      Returns:
      friendly type object
    • getNullCount

      int getNullCount()
      Returns number of null elements in the vector.
      Returns:
      number of null elements
    • isNull

      boolean isNull(int index)
      Check whether an element in the vector is null.
      Parameters:
      index - index to check for null
      Returns:
      true if element is null
    • hashCode

      int hashCode(int index)
      Returns the hash code of the element at the given index, computed with the default hasher.
    • hashCode

      int hashCode(int index, ArrowBufHasher hasher)
      Returns the hash code of the element at the given index, computed with the given hasher.
    • copyFrom

      void copyFrom(int fromIndex, int thisIndex, ValueVector from)
      Copy a cell value from a particular index in source vector to a particular position in this vector.
      Parameters:
      fromIndex - position to copy from in source vector
      thisIndex - position to copy to in this vector
      from - source vector
    • copyFromSafe

      void copyFromSafe(int fromIndex, int thisIndex, ValueVector from)
      Same as copyFrom(int, int, ValueVector) except that it handles the case when the capacity of the vector needs to be expanded before copy.
      Parameters:
      fromIndex - position to copy from in source vector
      thisIndex - position to copy to in this vector
      from - source vector
    • accept

      <OUT, IN> OUT accept(VectorVisitor<OUT,IN> visitor, IN value)
      Accept a generic VectorVisitor and return the result.
      Type Parameters:
      OUT - the output result type.
      IN - the type of the input data passed along with the visitor.
    • getName

      String getName()
      Gets the name of the vector.
      Returns:
      the name of the vector.
    • validate

      default void validate()
    • validateFull

      default void validateFull()