Class BaseRepeatedValueVector

java.lang.Object
org.apache.arrow.vector.BaseValueVector
org.apache.arrow.vector.complex.BaseRepeatedValueVector
All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<ValueVector>, BaseListVector, RepeatedValueVector, DensityAwareVector, FieldVector, ValueVector
Direct Known Subclasses:
ListVector

public abstract class BaseRepeatedValueVector extends BaseValueVector implements RepeatedValueVector, BaseListVector
Base class for Vectors that contain repeated values.
  • Field Details

    • DEFAULT_DATA_VECTOR

      public static final FieldVector DEFAULT_DATA_VECTOR
    • DATA_VECTOR_NAME

      public static final String DATA_VECTOR_NAME
    • OFFSET_WIDTH

      public static final byte OFFSET_WIDTH
    • offsetBuffer

      protected ArrowBuf offsetBuffer
    • vector

      protected FieldVector vector
    • repeatedCallBack

      protected final CallBack repeatedCallBack
    • valueCount

      protected int valueCount
    • offsetAllocationSizeInBytes

      protected long offsetAllocationSizeInBytes
    • defaultDataVectorName

      protected String defaultDataVectorName
  • Constructor Details

  • Method Details

    • getName

      public String getName()
      Description copied from interface: ValueVector
      Gets the name of the vector.
      Specified by:
      getName in interface ValueVector
      Specified by:
      getName in class BaseValueVector
      Returns:
      the name of the vector.
    • allocateNewSafe

      public boolean allocateNewSafe()
      Description copied from interface: ValueVector
      Allocates new buffers. ValueVector implements logic to determine how much to allocate.
      Specified by:
      allocateNewSafe in interface ValueVector
      Returns:
      true if allocation was successful.
    • allocateOffsetBuffer

      protected ArrowBuf allocateOffsetBuffer(long size)
    • reAlloc

      public void reAlloc()
      Description copied from interface: ValueVector
      Allocates a new buffer with double the capacity and copies the data into it. Replaces the vector's buffer with the new buffer and releases the old one.
      Specified by:
      reAlloc in interface ValueVector
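      The grow-and-copy pattern described for reAlloc can be illustrated with a minimal sketch. It uses java.nio.ByteBuffer as a stand-in for ArrowBuf, so the class and method names below are hypothetical and this is not the Arrow implementation. In particular, the old buffer is simply left to the garbage collector here, whereas Arrow releases it explicitly.

```java
import java.nio.ByteBuffer;

// Illustrative model only: ByteBuffer stands in for ArrowBuf.
final class DoublingReallocSketch {
    /** Returns a buffer with twice the capacity that starts with a copy of old's bytes. */
    static ByteBuffer reAlloc(ByteBuffer old) {
        // Double the capacity; multiplyExact fails loudly on int overflow.
        int newCapacity = Math.max(1, Math.multiplyExact(old.capacity(), 2));
        ByteBuffer grown = ByteBuffer.allocate(newCapacity); // zero-filled

        // Copy the whole old buffer without disturbing its position or limit.
        ByteBuffer src = old.duplicate();
        src.clear(); // position = 0, limit = capacity
        grown.put(src);
        grown.clear(); // make the full new capacity addressable again
        return grown;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(0, 42);
        ByteBuffer grown = reAlloc(buf);
        System.out.println(grown.capacity() + " bytes, first int = " + grown.getInt(0));
    }
}
```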
    • reallocOffsetBuffer

      protected void reallocOffsetBuffer()
    • getOffsetVector

      @Deprecated public UInt4Vector getOffsetVector()
      Deprecated.
      This API will be removed, as the current implementations no longer hold inner offset vectors.
      Get the offset vector.
      Specified by:
      getOffsetVector in interface RepeatedValueVector
      Returns:
      the underlying offset vector or null if none exists.
    • getDataVector

      public FieldVector getDataVector()
      Description copied from interface: RepeatedValueVector
      Get the data vector.
      Specified by:
      getDataVector in interface RepeatedValueVector
      Returns:
      the underlying data vector or null if none exists.
    • setInitialCapacity

      public void setInitialCapacity(int numRecords)
      Description copied from interface: ValueVector
      Set the initial record capacity.
      Specified by:
      setInitialCapacity in interface ValueVector
      Parameters:
      numRecords - the initial record capacity.
    • setInitialCapacity

      public void setInitialCapacity(int numRecords, double density)
      Specialized version of setInitialCapacity() for ListVector. Callers use it when they want explicit, conservative control over the memory allocated for the inner data vector. This is useful when a query runs under memory constraints and a fixed amount of memory is reserved for the record batch. In such cases, reserving memory for a record batch with value count x and calling setInitialCapacity(x) is likely to cause OOM or related problems: the intent is for each vector to allocate only what is necessary rather than the default amount, but the multiplier pushes the memory requirement beyond what was needed.
      Specified by:
      setInitialCapacity in interface DensityAwareVector
      Parameters:
      numRecords - value count
      density - density of ListVector. Density is the average size of list per position in the List vector. For example, a density value of 10 implies each position in the list vector has a list of 10 values. A density value of 0.1 implies out of 10 positions in the list vector, 1 position has a list of size 1 and remaining positions are null (no lists) or empty lists. This helps in tightly controlling the memory we provision for inner data vector.
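      The density arithmetic described above can be sketched as follows. This is a minimal model under stated assumptions, not the Arrow code: the class and method names are hypothetical, offsets are assumed to be 4 bytes wide with one more offset than there are lists, and the floor of one inner slot is an illustrative choice.

```java
// Illustrative model only; it does not use or reproduce the Arrow classes.
final class DensitySizingSketch {
    // Assumption: each offset is a 4-byte int, and n lists need n + 1 offsets.
    static final int OFFSET_WIDTH = 4;

    /** Bytes to reserve for the offset buffer of numRecords lists. */
    static long offsetBytes(int numRecords) {
        return (numRecords + 1L) * OFFSET_WIDTH;
    }

    /** Inner element slots to reserve for numRecords lists at the given density. */
    static int innerValueCapacity(int numRecords, double density) {
        double requested = numRecords * density;
        if (requested >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("requested capacity exceeds int range");
        }
        // Assumption: keep at least one slot so the inner vector is never sized to zero.
        return Math.max((int) requested, 1);
    }

    public static void main(String[] args) {
        // Density 10: every position holds a list of 10 values on average.
        System.out.println(innerValueCapacity(1000, 10.0));
        // Density 0.1: about one position in ten holds a single value.
        System.out.println(innerValueCapacity(1000, 0.1));
        System.out.println(offsetBytes(1000));
    }
}
```

      With a density below 1, the inner data vector is sized for fewer elements than there are records, which is the conservative behavior the description asks for.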
    • setInitialTotalCapacity

      public void setInitialTotalCapacity(int numRecords, int totalNumberOfElements)
      Specialized version of setInitialTotalCapacity() for ListVector. Callers use it when they want explicit, conservative control over the memory allocated for the inner data vector. This is useful when a query runs under memory constraints and a fixed amount of memory is reserved for the record batch. In such cases, reserving memory for a record batch with value count x and calling setInitialCapacity(x) is likely to cause OOM or related problems: the intent is for each vector to allocate only what is necessary rather than the default amount, but the multiplier pushes the memory requirement beyond what was needed.
      Parameters:
      numRecords - value count
      totalNumberOfElements - the total number of elements to allow for in this vector across all records.
    • getValueCapacity

      public int getValueCapacity()
      Description copied from interface: ValueVector
      Returns the maximum number of values that can be stored in this vector instance.
      Specified by:
      getValueCapacity in interface ValueVector
      Returns:
      the maximum number of values that can be stored in this vector instance.
    • getOffsetBufferValueCapacity

      protected int getOffsetBufferValueCapacity()
    • getBufferSize

      public int getBufferSize()
      Description copied from interface: ValueVector
      Get the number of bytes used by this vector.
      Specified by:
      getBufferSize in interface ValueVector
      Returns:
      the number of bytes that is used by this vector instance.
    • getBufferSizeFor

      public int getBufferSizeFor(int valueCount)
      Description copied from interface: ValueVector
      Returns the number of bytes that is used by this vector if it holds the given number of values. The result will be the same as if setValueCount() were called, followed by calling getBufferSize(), but without any of the closing side effects that setValueCount() implies with respect to finishing off the population of a vector. Some operations might wish to use this to determine how much memory has been used by a vector so far, even though it is not finished being populated.
      Specified by:
      getBufferSizeFor in interface ValueVector
      Parameters:
      valueCount - the number of values to assume this vector contains
      Returns:
      the buffer size if this vector is holding valueCount values
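      For an offset-based list layout, the size for a given value count can be estimated without modifying the vector, as the sketch below suggests. It is a simplified model, not the Arrow implementation: the names are hypothetical, offsets and elements are both assumed to be 4-byte ints, and validity buffers are ignored.

```java
// Illustrative model only: a plain int[] of offsets stands in for the offset buffer.
final class BufferSizeSketch {
    static final int OFFSET_WIDTH = 4;  // assumed bytes per offset
    static final int ELEMENT_WIDTH = 4; // assumed bytes per inner element

    /** Bytes needed if the vector held exactly valueCount lists; reads offsets only. */
    static long bufferSizeFor(int[] offsets, int valueCount) {
        if (valueCount == 0) {
            return 0;
        }
        // n lists need n + 1 offsets.
        long offsetBytes = (valueCount + 1L) * OFFSET_WIDTH;
        // The last offset in range is the number of inner elements used so far.
        long dataBytes = (long) offsets[valueCount] * ELEMENT_WIDTH;
        return offsetBytes + dataBytes;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 2, 2, 5}; // three lists of sizes 2, 0, and 3
        System.out.println(bufferSizeFor(offsets, 3));
        System.out.println(bufferSizeFor(offsets, 1));
    }
}
```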
    • iterator

      public Iterator<ValueVector> iterator()
      Specified by:
      iterator in interface Iterable<ValueVector>
      Overrides:
      iterator in class BaseValueVector
    • clear

      public void clear()
      Description copied from interface: ValueVector
      Release any owned ArrowBuf and reset the ValueVector to the initial state. If the vector has any child vectors, they will also be cleared.
      Specified by:
      clear in interface ValueVector
      Overrides:
      clear in class BaseValueVector
    • reset

      public void reset()
      Description copied from interface: ValueVector
      Reset the ValueVector to the initial state without releasing any owned ArrowBuf. Buffer capacities will remain unchanged and any previous data will be zeroed out. This includes buffers for data, validity, offset, etc. If the vector has any child vectors, they will also be reset.
      Specified by:
      reset in interface ValueVector
    • getBuffers

      public ArrowBuf[] getBuffers(boolean clear)
      Returns the underlying buffers associated with this vector. Note that this doesn't impact the reference counts for these buffers, so it should only be used for in-context access. Also note that the buffers change regularly, so external classes shouldn't hold a reference to them (unless they change them).
      Specified by:
      getBuffers in interface ValueVector
      Parameters:
      clear - Whether to clear the vector before returning. The buffers will still be refcounted, but the returned array will be the only reference to them. Also, this won't clear the child buffers.
      Returns:
      The underlying buffers that are used by this vector instance.
    • size

      public int size()
      Gets a value indicating whether the inner vector is set.
      Returns:
      1 if the inner vector is explicitly set via addOrGetVector(FieldType), else 0
    • addOrGetVector

      public <T extends ValueVector> AddOrGetResult<T> addOrGetVector(FieldType fieldType)
      Initializes the data vector (and executes the callback) if that has not already been done, then returns the data vector.
    • replaceDataVector

      protected void replaceDataVector(FieldVector v)
    • getValueCount

      public int getValueCount()
      Description copied from interface: ValueVector
      Gets the number of values.
      Specified by:
      getValueCount in interface ValueVector
      Returns:
      number of values in the vector
    • getInnerValueCount

      public int getInnerValueCount()
    • getInnerValueCountAt

      public int getInnerValueCountAt(int index)
      Returns the value count for inner data vector at a particular index.
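      In an offset-based layout, the inner value count at a position is the difference between two adjacent offsets. The sketch below models the offset buffer as a plain int array; the class and method names are hypothetical and the real class reads the offsets from an ArrowBuf.

```java
// Illustrative model only: an int[] stands in for the 4-byte offset buffer.
final class OffsetLayoutSketch {
    /** Number of inner values in the list at index: offsets[index + 1] - offsets[index]. */
    static int innerValueCountAt(int[] offsets, int index) {
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        // Three lists, [a, b], [], and [c, d, e], flattened into one data vector of 5 values.
        int[] offsets = {0, 2, 2, 5};
        for (int i = 0; i < offsets.length - 1; i++) {
            System.out.println("list " + i + " has " + innerValueCountAt(offsets, i) + " values");
        }
    }
}
```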
    • isEmpty

      public abstract boolean isEmpty(int index)
      Returns whether the value at index is empty.
    • startNewValue

      public int startNewValue(int index)
      Starts a new repeated value.
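      One plausible reading of starting a new repeated value in an offset-based layout is sketched below: the end offset of the new entry is set equal to its start offset, so the entry begins empty, and the start offset is returned as the position where its elements go. This is an assumption-laden model with hypothetical names (including the appendTo helper), not the Arrow implementation.

```java
import java.util.Arrays;

// Illustrative model only: an int[] stands in for the offset buffer.
final class StartNewValueSketch {
    private int[] offsets = new int[2]; // offsets[0] == 0; n lists use n + 1 offsets
    private int valueCount;

    /** Opens an empty list at index and returns the inner position where it begins. */
    int startNewValue(int index) {
        // Grow the offsets by doubling until slot index + 1 exists.
        while (index + 1 >= offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        int start = offsets[index];
        offsets[index + 1] = start; // end == start: the new list is empty so far
        valueCount = index + 1;
        return start;
    }

    /** Hypothetical helper: records one more element in the list opened at index. */
    void appendTo(int index) {
        offsets[index + 1]++;
    }

    int getValueCount() {
        return valueCount;
    }

    int innerValueCountAt(int index) {
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        StartNewValueSketch lists = new StartNewValueSketch();
        lists.startNewValue(0);
        lists.appendTo(0);
        lists.appendTo(0);
        int start = lists.startNewValue(1);
        System.out.println("second list starts at inner position " + start);
    }
}
```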
    • setValueCount

      public void setValueCount(int valueCount)
      Preallocates the number of repeated values.
      Specified by:
      setValueCount in interface ValueVector