Class RunEndEncodedVector

java.lang.Object
org.apache.arrow.vector.BaseValueVector
org.apache.arrow.vector.complex.RunEndEncodedVector
All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<ValueVector>, FieldVector, ValueVector

public class RunEndEncodedVector extends BaseValueVector implements FieldVector
A run-end encoded vector contains exactly two child vectors: a run_ends vector of type int and a values vector of any type. The parent vector has no buffers of its own.
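To make the two-child layout concrete, the following minimal sketch encodes a plain int array into cumulative run ends and one value per run. It is an illustration only and does not use the Arrow API: the class RunEndEncodeSketch, its Encoded holder, and the encode method are hypothetical names, and plain arrays stand in for the child vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of run-end encoding; not part of Arrow.
final class RunEndEncodeSketch {

  /** Stand-in for the two child vectors: cumulative run ends and one value per run. */
  static final class Encoded {
    final int[] runEnds;
    final int[] values;

    private Encoded(int[] runEnds, int[] values) {
      this.runEnds = runEnds;
      this.values = values;
    }
  }

  /** Collapses consecutive equal values into runs. */
  static Encoded encode(int[] logical) {
    List<Integer> ends = new ArrayList<>();
    List<Integer> vals = new ArrayList<>();
    for (int i = 0; i < logical.length; i++) {
      // A run closes at the last element or when the next element differs.
      boolean lastOfRun = i == logical.length - 1 || logical[i] != logical[i + 1];
      if (lastOfRun) {
        ends.add(i + 1); // run end = exclusive logical end position of the run
        vals.add(logical[i]);
      }
    }
    return new Encoded(toArray(ends), toArray(vals));
  }

  private static int[] toArray(List<Integer> list) {
    int[] out = new int[list.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = list.get(i);
    }
    return out;
  }
}
```

For example, the logical values [7, 7, 7, 9, 9, 4] encode to run ends [3, 5, 6] and values [7, 9, 4].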
  • Field Details

    • DEFAULT_VALUE_VECTOR

      public static final FieldVector DEFAULT_VALUE_VECTOR
    • DEFAULT_RUN_END_VECTOR

      public static final FieldVector DEFAULT_RUN_END_VECTOR
    • callBack

      protected final CallBack callBack
    • field

      protected Field field
    • runEndsVector

      protected FieldVector runEndsVector
    • valuesVector

      protected FieldVector valuesVector
    • valueCount

      protected int valueCount
  • Constructor Details

    • RunEndEncodedVector

      public RunEndEncodedVector(String name, BufferAllocator allocator, FieldType fieldType, CallBack callBack)
      Constructs a new instance.
      Parameters:
      name - The name of the instance.
      allocator - The allocator to use for allocating/reallocating buffers.
      fieldType - The type of the array that is run-end encoded.
      callBack - A schema change callback.
    • RunEndEncodedVector

      public RunEndEncodedVector(Field field, BufferAllocator allocator, CallBack callBack)
      Constructs a new instance.
      Parameters:
      field - The field materialized by this vector.
      allocator - The allocator to use for allocating/reallocating buffers.
      callBack - A schema change callback.
    • RunEndEncodedVector

      public RunEndEncodedVector(Field field, BufferAllocator allocator, FieldVector runEndsVector, FieldVector valuesVector, CallBack callBack)
      Constructs a new instance.
      Parameters:
      field - The field materialized by this vector.
      allocator - The allocator to use for allocating/reallocating buffers.
      runEndsVector - The vector that represents the run ends. Only a zero vector or an int vector with a bit width of 16 or 32 is allowed.
      valuesVector - The vector that represents the values.
      callBack - A schema change callback.
  • Method Details

    • empty

      public static RunEndEncodedVector empty(String name, BufferAllocator allocator)
    • allocateNew

      public void allocateNew() throws OutOfMemoryException
      Allocate new buffers. ValueVector implements logic to determine how much to allocate.
      Specified by:
      allocateNew in interface ValueVector
      Throws:
      OutOfMemoryException - Thrown if no memory can be allocated.
    • allocateNewSafe

      public boolean allocateNewSafe()
      Allocates new buffers. ValueVector implements logic to determine how much to allocate.
      Specified by:
      allocateNewSafe in interface ValueVector
      Returns:
      Returns true if allocation was successful.
    • reAlloc

      public void reAlloc()
      Allocates a new buffer with double the capacity and copies the data into it. The vector's buffer is then replaced with the new buffer, and the old one is released.
      Specified by:
      reAlloc in interface ValueVector
    • getAllocator

      public BufferAllocator getAllocator()
      Description copied from interface: ValueVector
      Get the allocator associated with the vector. CAVEAT: Some ValueVector subclasses (e.g. NullVector) do not require an allocator for data storage and may return null.
      Specified by:
      getAllocator in interface ValueVector
      Overrides:
      getAllocator in class BaseValueVector
      Returns:
      Returns nullable allocator.
    • getReaderImpl

      protected FieldReader getReaderImpl()
      Description copied from class: BaseValueVector
      Each vector has a different reader that implements the FieldReader interface. Overridden methods must make sure to return the correct concrete reader implementation.
      Specified by:
      getReaderImpl in class BaseValueVector
      Returns:
      Returns a lambda that initializes a reader when called.
    • setInitialCapacity

      public void setInitialCapacity(int numRecords)
      Set the initial record capacity.
      Specified by:
      setInitialCapacity in interface ValueVector
      Parameters:
      numRecords - the initial record capacity.
    • getValueCapacity

      public int getValueCapacity()
      Returns the maximum number of values that can be stored in this vector instance.
      Specified by:
      getValueCapacity in interface ValueVector
      Returns:
      the maximum number of values that can be stored in this vector instance.
    • close

      public void close()
      Alternative to clear(). Allows use as an AutoCloseable in try-with-resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface ValueVector
      Overrides:
      close in class BaseValueVector
    • clear

      public void clear()
      Release any owned ArrowBuf and reset the ValueVector to the initial state. If the vector has any child vectors, they will also be cleared.
      Specified by:
      clear in interface ValueVector
      Overrides:
      clear in class BaseValueVector
    • reset

      public void reset()
      Reset the ValueVector to the initial state without releasing any owned ArrowBuf. Buffer capacities will remain unchanged and any previous data will be zeroed out. This includes buffers for data, validity, offset, etc. If the vector has any child vectors, they will also be reset.
      Specified by:
      reset in interface ValueVector
    • getField

      public Field getField()
      Get information about how this field is materialized.
      Specified by:
      getField in interface ValueVector
      Returns:
      the field corresponding to this vector
    • getMinorType

      public Types.MinorType getMinorType()
      Specified by:
      getMinorType in interface ValueVector
    • getTransferPair

      public TransferPair getTransferPair(String ref, BufferAllocator allocator)
      Creates a transfer pair for transferring quota responsibility to the target allocator.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      ref - the name of the vector
      allocator - the target allocator
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      public TransferPair getTransferPair(Field field, BufferAllocator allocator)
      Creates a transfer pair for transferring quota responsibility to the target allocator.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      field - the Field object used by the target vector
      allocator - the target allocator
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      public TransferPair getTransferPair(String ref, BufferAllocator allocator, CallBack callBack)
      Creates a transfer pair for transferring quota responsibility to the target allocator.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      ref - the name of the vector
      allocator - the target allocator
      callBack - A schema change callback.
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      public TransferPair getTransferPair(Field field, BufferAllocator allocator, CallBack callBack)
      Creates a transfer pair for transferring quota responsibility to the target allocator.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      field - the Field object used by the target vector
      allocator - the target allocator
      callBack - A schema change callback.
      Returns:
      a transfer pair, creating a new target vector of the same type.
    • makeTransferPair

      public TransferPair makeTransferPair(ValueVector target)
      Makes a new transfer pair used to transfer underlying buffers.
      Specified by:
      makeTransferPair in interface ValueVector
      Parameters:
      target - the target for the transfer
      Returns:
      a new transfer pair that is used to transfer underlying buffers into the target vector.
    • getReader

      public FieldReader getReader()
      Get a reader for this vector.
      Specified by:
      getReader in interface ValueVector
      Overrides:
      getReader in class BaseValueVector
      Returns:
      a field reader that supports reading values from this vector.
    • getWriter

      public FieldWriter getWriter()
      Get a writer for this vector.
      Returns:
      a field writer that supports writing values to this vector.
    • getBufferSize

      public int getBufferSize()
      Get the number of bytes used by this vector.
      Specified by:
      getBufferSize in interface ValueVector
      Returns:
      the number of bytes that is used by this vector instance.
    • getBufferSizeFor

      public int getBufferSizeFor(int valueCount)
      Returns the number of bytes that is used by this vector if it holds the given number of values. The result will be the same as if setValueCount() were called, followed by calling getBufferSize(), but without any of the closing side-effects that setValueCount() implies wrt finishing off the population of a vector. Some operations might wish to use this to determine how much memory has been used by a vector so far, even though it is not finished being populated.
      Specified by:
      getBufferSizeFor in interface ValueVector
      Parameters:
      valueCount - the number of values to assume this vector contains
      Returns:
      the buffer size if this vector is holding valueCount values
    • getBuffers

      public ArrowBuf[] getBuffers(boolean clear)
      Returns the underlying buffers associated with this vector. This does not affect the buffers' reference counts, so it should only be used for in-context access. The buffers change regularly, so external classes should not hold a reference to them (unless they change them).
      Specified by:
      getBuffers in interface ValueVector
      Parameters:
      clear - Whether to clear the vector before returning. The buffers will still be refcounted, but the returned array will be the only reference to them.
      Returns:
      The underlying buffers that are used by this vector instance.
    • getValidityBuffer

      public ArrowBuf getValidityBuffer()
      Gets the underlying buffer associated with validity vector.
      Specified by:
      getValidityBuffer in interface ValueVector
      Returns:
      buffer
    • getDataBuffer

      public ArrowBuf getDataBuffer()
      Gets the underlying buffer associated with data vector.
      Specified by:
      getDataBuffer in interface ValueVector
      Returns:
      buffer
    • getOffsetBuffer

      public ArrowBuf getOffsetBuffer()
      Gets the underlying buffer associated with offset vector.
      Specified by:
      getOffsetBuffer in interface ValueVector
      Returns:
      buffer
    • getValueCount

      public int getValueCount()
      Gets the number of values.
      Specified by:
      getValueCount in interface ValueVector
      Returns:
      number of values in the vector
    • setValueCount

      public void setValueCount(int valueCount)
      Set number of values in the vector.
      Specified by:
      setValueCount in interface ValueVector
    • getObject

      public Object getObject(int index)
      Get friendly type object from the vector.
      Specified by:
      getObject in interface ValueVector
      Parameters:
      index - index of object to get
      Returns:
      friendly type object
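      How this class resolves a logical index is not spelled out here. Conceptually, reading logical index i from run-end encoded data means finding the run that covers i and returning that run's value. The sketch below expands a whole encoded pair back into its logical sequence under that assumption. RunEndDecodeSketch and decode are hypothetical names, and plain int arrays stand in for the run ends and values child vectors.

```java
import java.util.Arrays;

// Hypothetical illustration of run-end decoding; not part of Arrow.
final class RunEndDecodeSketch {

  /** Expands (runEnds, values) into the logical sequence they describe. */
  static int[] decode(int[] runEnds, int[] values) {
    if (runEnds.length != values.length) {
      throw new IllegalArgumentException("runEnds and values must have the same length");
    }
    // The last run end equals the logical length of the encoded data.
    int logicalLength = runEnds.length == 0 ? 0 : runEnds[runEnds.length - 1];
    int[] out = new int[logicalLength];
    int start = 0;
    for (int run = 0; run < runEnds.length; run++) {
      // Run number `run` covers logical positions [start, runEnds[run]).
      Arrays.fill(out, start, runEnds[run], values[run]);
      start = runEnds[run];
    }
    return out;
  }
}
```

      With run ends [3, 5, 6] and values [7, 9, 4], decode yields [7, 7, 7, 9, 9, 4].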
    • getRunEnd

      public int getRunEnd(int index)
      Gets the run end at the given index.
      Parameters:
      index - index of the run end to get
      Returns:
      the run end at the given index
    • getNullCount

      public int getNullCount()
      Returns number of null elements in the vector.
      Specified by:
      getNullCount in interface ValueVector
      Returns:
      number of null elements
    • isNull

      public boolean isNull(int index)
      Check whether an element in the vector is null.
      Specified by:
      isNull in interface ValueVector
      Parameters:
      index - index to check for null
      Returns:
      true if element is null
    • hashCode

      public int hashCode(int index)
      Returns the hash code of the element at the given index, using the default hasher.
      Specified by:
      hashCode in interface ValueVector
    • hashCode

      public int hashCode(int index, ArrowBufHasher hasher)
      Returns the hash code of the element at the given index, using the given hasher.
      Specified by:
      hashCode in interface ValueVector
    • accept

      public <OUT, IN> OUT accept(VectorVisitor<OUT,IN> visitor, IN value)
      Accept a generic VectorVisitor and return the result.
      Specified by:
      accept in interface ValueVector
      Type Parameters:
      OUT - the output result type.
      IN - the input data together with visitor.
    • getName

      public String getName()
      Gets the name of the vector.
      Specified by:
      getName in interface ValueVector
      Specified by:
      getName in class BaseValueVector
      Returns:
      the name of the vector.
    • iterator

      public Iterator<ValueVector> iterator()
      Specified by:
      iterator in interface Iterable<ValueVector>
      Overrides:
      iterator in class BaseValueVector
    • initializeChildrenFromFields

      public void initializeChildrenFromFields(List<Field> children)
      Initializes the child vectors to be later loaded with loadBuffers.
      Specified by:
      initializeChildrenFromFields in interface FieldVector
      Parameters:
      children - the schema containing the run_ends column first and the values column second
    • getChildrenFromFields

      public List<FieldVector> getChildrenFromFields()
      The returned list is the same size as the list passed to initializeChildrenFromFields.
      Specified by:
      getChildrenFromFields in interface FieldVector
      Returns:
      the children according to schema (empty for primitive types)
    • loadFieldBuffers

      public void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers)
      Loads data in the vectors. (ownBuffers must be the same size as getFieldVectors())
      Specified by:
      loadFieldBuffers in interface FieldVector
      Parameters:
      fieldNode - the fieldNode
      ownBuffers - the buffers for this Field (own buffers only, children not included)
    • getFieldBuffers

      public List<ArrowBuf> getFieldBuffers()
      Gets the buffers of the fields (same size as getFieldVectors(), since it is their content).
      Specified by:
      getFieldBuffers in interface FieldVector
      Returns:
      the buffers containing the data for this vector (ready for reading)
    • getFieldInnerVectors

      @Deprecated public List<BufferBacked> getFieldInnerVectors()
      Deprecated.
      This API will be removed as the current implementations no longer support inner vectors.
      Get the inner vectors.
      Specified by:
      getFieldInnerVectors in interface FieldVector
      Returns:
      the inner vectors for this field as defined by the TypeLayout
    • getValidityBufferAddress

      public long getValidityBufferAddress()
      Gets the starting address of the underlying buffer associated with validity vector.
      Specified by:
      getValidityBufferAddress in interface FieldVector
      Returns:
      buffer address
    • getDataBufferAddress

      public long getDataBufferAddress()
      Gets the starting address of the underlying buffer associated with data vector.
      Specified by:
      getDataBufferAddress in interface FieldVector
      Returns:
      buffer address
    • getOffsetBufferAddress

      public long getOffsetBufferAddress()
      Gets the starting address of the underlying buffer associated with offset vector.
      Specified by:
      getOffsetBufferAddress in interface FieldVector
      Returns:
      buffer address
    • setNull

      public void setNull(int index)
      Set the element at the given index to null.
      Specified by:
      setNull in interface FieldVector
      Parameters:
      index - the value to change
    • getRunEndsVector

      public FieldVector getRunEndsVector()
    • getValuesVector

      public FieldVector getValuesVector()
    • getPhysicalIndex

      public int getPhysicalIndex(int logicalIndex)
      The physical index is the index of the first run end that is larger than the logical index. For example, if run_ends is [1, 3, 6], the physical indices for logical indices 0 through 5 are [0, 1, 1, 2, 2, 2].
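      The lookup rule can be sketched as a binary search for the first run end that is strictly greater than the logical index. This is a minimal illustration over a plain int array, not the class's actual implementation, which may differ. PhysicalIndexSketch and physicalIndex are hypothetical names.

```java
// Hypothetical illustration of the logical-to-physical lookup; not part of Arrow.
final class PhysicalIndexSketch {

  /** Returns the index of the first run end strictly greater than logicalIndex. */
  static int physicalIndex(int[] runEnds, int logicalIndex) {
    if (runEnds.length == 0
        || logicalIndex < 0
        || logicalIndex >= runEnds[runEnds.length - 1]) {
      throw new IndexOutOfBoundsException("logical index out of range: " + logicalIndex);
    }
    int low = 0;
    int high = runEnds.length - 1;
    // Invariant: the answer lies in [low, high]; runEnds is strictly increasing.
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (runEnds[mid] > logicalIndex) {
        high = mid; // mid may be the first run end past logicalIndex
      } else {
        low = mid + 1; // run mid ends at or before logicalIndex
      }
    }
    return low;
  }
}
```

      For run ends [1, 3, 6], calling physicalIndex for logical indices 0 through 5 gives 0, 1, 1, 2, 2, 2. The search takes O(log n) comparisons in the number of runs.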