Class BaseLargeVariableWidthVector

java.lang.Object
org.apache.arrow.vector.BaseValueVector
org.apache.arrow.vector.BaseLargeVariableWidthVector
All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<ValueVector>, DensityAwareVector, ElementAddressableVector, FieldVector, ValueVector, VariableWidthFieldVector, VariableWidthVector, VectorDefinitionSetter
Direct Known Subclasses:
LargeVarBinaryVector, LargeVarCharVector

public abstract class BaseLargeVariableWidthVector extends BaseValueVector implements VariableWidthFieldVector
BaseLargeVariableWidthVector is a base class providing functionality for large string and large binary types, which use 8-byte (64-bit) offsets.
  • Field Details

    • OFFSET_WIDTH

      public static final int OFFSET_WIDTH
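      Width in bytes of each entry in the offset buffer (8, since large variable-width vectors use 64-bit offsets).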
    • emptyByteArray

      protected static final byte[] emptyByteArray
    • validityBuffer

      protected ArrowBuf validityBuffer
    • valueBuffer

      protected ArrowBuf valueBuffer
    • offsetBuffer

      protected ArrowBuf offsetBuffer
    • valueCount

      protected int valueCount
    • lastSet

      protected int lastSet
    • field

      protected final Field field
  • Constructor Details

    • BaseLargeVariableWidthVector

      public BaseLargeVariableWidthVector(Field field, BufferAllocator allocator)
      Constructs a new instance.
      Parameters:
      field - The field materialized by this vector.
      allocator - The allocator to use for creating/resizing buffers
  • Method Details

    • getName

      public String getName()
      Description copied from interface: ValueVector
      Gets the name of the vector.
      Specified by:
      getName in interface ValueVector
      Specified by:
      getName in class BaseValueVector
      Returns:
      the name of the vector.
    • getValidityBuffer

      public ArrowBuf getValidityBuffer()
      Get the buffer that manages the validity (NULL or NON-NULL nature) of elements in the vector. Consider it a buffer for an internal bit vector data structure.
      Specified by:
      getValidityBuffer in interface ValueVector
      Returns:
      buffer
    • getDataBuffer

      public ArrowBuf getDataBuffer()
      Get the buffer that stores the data for elements in the vector.
      Specified by:
      getDataBuffer in interface ValueVector
      Returns:
      buffer
    • getOffsetBuffer

      public ArrowBuf getOffsetBuffer()
      Get the buffer that stores the offsets for elements in the vector. This operation is not supported for fixed-width vectors.
      Specified by:
      getOffsetBuffer in interface ValueVector
      Returns:
      buffer
    • getOffsetBufferAddress

      public long getOffsetBufferAddress()
      Get the memory address of the buffer that stores the offsets for elements in the vector.
      Specified by:
      getOffsetBufferAddress in interface FieldVector
      Returns:
      starting address of the buffer
    • getValidityBufferAddress

      public long getValidityBufferAddress()
      Get the memory address of the buffer that manages the validity (NULL or NON-NULL nature) of elements in the vector.
      Specified by:
      getValidityBufferAddress in interface FieldVector
      Returns:
      starting address of the buffer
    • getDataBufferAddress

      public long getDataBufferAddress()
      Get the memory address of the buffer that stores the data for elements in the vector.
      Specified by:
      getDataBufferAddress in interface FieldVector
      Returns:
      starting address of the buffer
    • setInitialCapacity

      public void setInitialCapacity(int valueCount)
      Sets the desired value capacity for the vector. This function doesn't allocate any memory for the vector.
      Specified by:
      setInitialCapacity in interface ValueVector
      Parameters:
      valueCount - desired number of elements in the vector
    • setInitialCapacity

      public void setInitialCapacity(int valueCount, double density)
      Sets the desired value capacity for the vector. This function doesn't allocate any memory for the vector.
      Specified by:
      setInitialCapacity in interface DensityAwareVector
      Parameters:
      valueCount - desired number of elements in the vector
      density - average number of bytes per variable width element
    • getDensity

      public double getDensity()
      Get the density of this vector, i.e. the average number of bytes per element.
      Returns:
      density
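Density is the average number of data bytes per element. The plain-Java sketch below assumes it is computed as (last offset - first offset) / valueCount, and that setInitialCapacity(int, double) sizes the data buffer at roughly valueCount * density bytes (at least 1). DensitySketch and its methods are hypothetical names used only for illustration, not part of Arrow.

```java
// Hypothetical sketch of density arithmetic for a variable-width vector (not Arrow's code).
final class DensitySketch {
  // offsets has valueCount + 1 entries; density = total data bytes / valueCount.
  static double density(long[] offsets) {
    int valueCount = offsets.length - 1;
    if (valueCount <= 0) {
      return 0.0; // an empty vector has no meaningful density
    }
    return (double) (offsets[valueCount] - offsets[0]) / valueCount;
  }

  // Estimated data-buffer size for a requested capacity, never less than 1 byte.
  static long estimatedDataBytes(int valueCount, double density) {
    return Math.max((long) (valueCount * density), 1L);
  }
}
```

For offsets {0, 4, 4, 10, 12} the four elements hold 12 bytes in total, so the density is 3.0.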
    • getValueCapacity

      public int getValueCapacity()
      Get the current value capacity, which is bounded by both the validity buffer and the offset buffer. Note: this capacity has no relationship with the size of the value (data) buffer.
      Specified by:
      getValueCapacity in interface ValueVector
      Returns:
      number of elements that vector can hold.
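Each element needs one validity bit, and n elements need n + 1 offset entries of OFFSET_WIDTH (8) bytes, so the value capacity is the smaller of the two limits. A sketch of that arithmetic, assuming buffer capacities are given in bytes; ValueCapacitySketch is a hypothetical name and this is not Arrow's implementation.

```java
// Hypothetical sketch: value capacity derived from validity and offset buffer sizes.
final class ValueCapacitySketch {
  static final int OFFSET_WIDTH = 8; // bytes per 64-bit offset

  static int valueCapacity(long validityBytes, long offsetBytes) {
    long byValidity = validityBytes * 8;                          // one bit per element
    long byOffsets = Math.max(offsetBytes / OFFSET_WIDTH - 1, 0); // n elements need n + 1 offsets
    return (int) Math.min(Integer.MAX_VALUE, Math.min(byValidity, byOffsets));
  }
}
```

The data buffer size never enters the calculation, consistent with the note that getValueCapacity has no relationship with the value buffer.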
    • zeroVector

      public void zeroVector()
      Zero out the vector and the data in its associated buffers.
    • reset

      public void reset()
      Reset the vector to its initial state. Same as zeroVector(). Note that this method doesn't release any memory.
      Specified by:
      reset in interface ValueVector
    • close

      public void close()
      Close the vector and release the associated buffers.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface ValueVector
      Overrides:
      close in class BaseValueVector
    • clear

      public void clear()
      Same as close().
      Specified by:
      clear in interface ValueVector
      Overrides:
      clear in class BaseValueVector
    • getFieldInnerVectors

      @Deprecated public List<BufferBacked> getFieldInnerVectors()
      Deprecated.
      This API will be removed as the current implementations no longer support inner vectors.
      Get the inner vectors.
      Specified by:
      getFieldInnerVectors in interface FieldVector
      Returns:
      the inner vectors for this field as defined by the TypeLayout
    • initializeChildrenFromFields

      public void initializeChildrenFromFields(List<Field> children)
      Initialize the children in the schema for this Field. This operation is a no-op for scalar types since they don't have any children.
      Specified by:
      initializeChildrenFromFields in interface FieldVector
      Parameters:
      children - the schema
      Throws:
      IllegalArgumentException - if children is a non-empty list for scalar types.
    • getChildrenFromFields

      public List<FieldVector> getChildrenFromFields()
      Get the inner child vectors.
      Specified by:
      getChildrenFromFields in interface FieldVector
      Returns:
      list of child vectors for complex types, empty list for scalar vector types
    • loadFieldBuffers

      public void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers)
      Load the buffers of this vector with provided source buffers. The caller manages the source buffers and populates them before invoking this method.
      Specified by:
      loadFieldBuffers in interface FieldVector
      Parameters:
      fieldNode - the fieldNode indicating the value count
      ownBuffers - the buffers for this Field (own buffers only, children not included)
    • getFieldBuffers

      public List<ArrowBuf> getFieldBuffers()
      Get the buffers belonging to this vector.
      Specified by:
      getFieldBuffers in interface FieldVector
      Returns:
      the inner buffers.
    • exportCDataBuffers

      public void exportCDataBuffers(List<ArrowBuf> buffers, ArrowBuf buffersPtr, long nullValue)
      Export the buffers of the fields for the C Data Interface. This method traverses the buffers and exports each buffer and its memory address into a list of buffers and a pointer to the list of buffers.
      Specified by:
      exportCDataBuffers in interface FieldVector
    • allocateNew

      public void allocateNew()
      Specified by:
      allocateNew in interface ValueVector
    • allocateNewSafe

      public boolean allocateNewSafe()
      Allocate memory for the vector. We internally use a default value count of 4096 to allocate memory for at least that many elements in the vector. See allocateNew(long, int) for allocating memory for a specific number of elements in the vector.
      Specified by:
      allocateNewSafe in interface ValueVector
      Returns:
      false if memory allocation fails, true otherwise.
    • allocateNew

      public void allocateNew(long totalBytes, int valueCount)
      Allocate memory for the vector to support storing at least the provided number of elements in the vector. This method must be called prior to using the ValueVector.
      Specified by:
      allocateNew in interface VariableWidthVector
      Parameters:
      totalBytes - desired total memory capacity
      valueCount - the desired number of elements in the vector
      Throws:
      OutOfMemoryException - if memory allocation fails
    • allocateNew

      public void allocateNew(int valueCount)
      Description copied from interface: VariableWidthVector
      Allocate a new memory space for this vector. Must be called prior to using the ValueVector. The initial size in bytes is either the default or reused from the previous allocation.
      Specified by:
      allocateNew in interface VariableWidthVector
      Parameters:
      valueCount - Number of values in the vector.
    • reAlloc

      public void reAlloc()
      Resize the vector to increase the capacity. The internal behavior is to double the current value capacity.
      Specified by:
      reAlloc in interface ValueVector
    • reallocDataBuffer

      public void reallocDataBuffer()
      Reallocate the data buffer. The data buffer stores the actual data for LARGEVARCHAR or LARGEVARBINARY elements in the vector. The buffer size is doubled.
      Throws:
      OversizedAllocationException - if the desired new size is more than max allowed
      OutOfMemoryException - if the internal memory allocation fails
    • reallocValidityAndOffsetBuffers

      public void reallocValidityAndOffsetBuffers()
      Reallocate the validity and offset buffers for this vector. The validity buffer tracks the NULL or NON-NULL nature of elements in the vector, and the offset buffer stores the start and end offsets of the variable width elements (the length of element i is offset[i + 1] - offset[i]).

      Note that the data buffer of a variable length vector is reallocated independently of the companion validity and offset buffers. This is in contrast to fixed width vectors.

      So even though we may have set up an initial capacity of 1024 elements in the vector, it is quite possible that we need to reAlloc() the data buffer when setting the 5th element, simply because the previous variable length elements have exhausted its capacity. However, the validity and offset buffers don't need to be reallocated until we try to set the 1025th element. This is why the safe methods check separately which buffer needs reallocation.

      Throws:
      OversizedAllocationException - if the desired new size is more than max allowed
      OutOfMemoryException - if the internal memory allocation fails
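The separate reallocation checks performed by the safe methods can be modeled as below. This is a hypothetical sketch, not Arrow's handleSafe: ReallocSketch tracks capacities only (no real memory), doubles each one independently, and throws IllegalStateException where Arrow would throw OversizedAllocationException.

```java
// Hypothetical model: data capacity and validity/offset capacity grow independently.
final class ReallocSketch {
  long dataCapacityBytes;   // stands in for the data buffer capacity
  int valueCapacity;        // stands in for the validity/offset value capacity
  final long maxBytes;      // stands in for the maximum allowed allocation
  int dataReallocs;
  int validityOffsetReallocs;

  ReallocSketch(long dataCapacityBytes, int valueCapacity, long maxBytes) {
    this.dataCapacityBytes = dataCapacityBytes;
    this.valueCapacity = valueCapacity;
    this.maxBytes = maxBytes;
  }

  // Called before writing `length` bytes for element `index`, whose data starts at `startOffset`.
  void handleSafe(int index, long startOffset, int length) {
    // Validity/offsets grow only when the index outruns the value capacity.
    while (index >= valueCapacity) {
      valueCapacity = Math.max(1, valueCapacity * 2);
      validityOffsetReallocs++;
    }
    // Data grows only when the new bytes would not fit.
    while (startOffset + length > dataCapacityBytes) {
      long doubled = Math.max(1L, dataCapacityBytes * 2);
      if (doubled > maxBytes) {
        throw new IllegalStateException("desired size " + doubled + " exceeds max " + maxBytes);
      }
      dataCapacityBytes = doubled;
      dataReallocs++;
    }
  }
}
```

With a value capacity of 1024 and 16 data bytes, writing 10 bytes for the 5th element (index 4) at offset 14 reallocates only the data, while the validity and offset capacity grows only once index 1024 is reached.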
    • getByteCapacity

      public int getByteCapacity()
      Get the size (number of bytes) of underlying data buffer.
      Specified by:
      getByteCapacity in interface VariableWidthVector
      Returns:
      number of bytes in the data buffer
    • sizeOfValueBuffer

      public int sizeOfValueBuffer()
      Description copied from interface: VariableWidthVector
      Provide the number of bytes contained in the valueBuffer.
      Specified by:
      sizeOfValueBuffer in interface VariableWidthVector
      Returns:
      the number of bytes in valueBuffer.
    • getBufferSize

      public int getBufferSize()
      Get the size (number of bytes) of underlying buffers used by this vector.
      Specified by:
      getBufferSize in interface ValueVector
      Returns:
      size of underlying buffers.
    • getBufferSizeFor

      public int getBufferSizeFor(int valueCount)
      Get the potential buffer size for a particular number of records.
      Specified by:
      getBufferSizeFor in interface ValueVector
      Parameters:
      valueCount - desired number of elements in the vector
      Returns:
      estimated size of underlying buffers if the vector holds a given number of elements
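For a given value count, the three buffers contribute a validity bitmap of ceil(valueCount / 8) bytes, valueCount + 1 offsets of 8 bytes each, and the data bytes spanned by those offsets. The sketch below adds these up from a plain long[] of offsets. It assumes exactly this layout, ignoring any padding or power-of-two rounding a real allocator may apply, and BufferSizeSketch is a hypothetical name.

```java
// Hypothetical estimate of buffer bytes needed for the first valueCount elements.
final class BufferSizeSketch {
  static final int OFFSET_WIDTH = 8; // bytes per 64-bit offset

  static long bufferSizeFor(long[] offsets, int valueCount) {
    if (valueCount == 0) {
      return 0;
    }
    long validityBytes = (valueCount + 7) / 8;                 // one bit per element, rounded up
    long offsetBytes = (long) (valueCount + 1) * OFFSET_WIDTH; // valueCount + 1 offsets
    long dataBytes = offsets[valueCount] - offsets[0];         // bytes spanned by the data
    return validityBytes + offsetBytes + dataBytes;
  }
}
```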
    • getField

      public Field getField()
      Get information about how this field is materialized.
      Specified by:
      getField in interface ValueVector
      Returns:
      the field corresponding to this vector
    • getBuffers

      public ArrowBuf[] getBuffers(boolean clear)
      Return the underlying buffers associated with this vector. Note that this doesn't affect the reference counts of these buffers, so it should only be used for in-context access. Also note that these buffers change regularly, so external classes shouldn't hold references to them (unless they change them).
      Specified by:
      getBuffers in interface ValueVector
      Parameters:
      clear - Whether to clear vector before returning; the buffers will still be refcounted but the returned array will be the only reference to them
      Returns:
      The underlying buffers that are used by this vector instance.
    • validateScalars

      public void validateScalars()
      Validate the scalar values held by this vector.
    • getTransferPair

      public TransferPair getTransferPair(String ref, BufferAllocator allocator, CallBack callBack)
      Construct a transfer pair of this vector and another vector of same type.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      ref - name of the target vector
      allocator - allocator for the target vector
      callBack - not used
      Returns:
      TransferPair
    • getTransferPair

      public TransferPair getTransferPair(Field field, BufferAllocator allocator, CallBack callBack)
      Construct a transfer pair of this vector and another vector of same type.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      field - The field materialized by this vector
      allocator - allocator for the target vector
      callBack - not used
      Returns:
      TransferPair
    • getTransferPair

      public TransferPair getTransferPair(BufferAllocator allocator)
      Construct a transfer pair of this vector and another vector of same type.
      Specified by:
      getTransferPair in interface ValueVector
      Overrides:
      getTransferPair in class BaseValueVector
      Parameters:
      allocator - allocator for the target vector
      Returns:
      TransferPair
    • getTransferPair

      public abstract TransferPair getTransferPair(String ref, BufferAllocator allocator)
      Construct a transfer pair of this vector and another vector of same type.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      ref - name of the target vector
      allocator - allocator for the target vector
      Returns:
      TransferPair
    • getTransferPair

      public abstract TransferPair getTransferPair(Field field, BufferAllocator allocator)
      Construct a transfer pair of this vector and another vector of same type.
      Specified by:
      getTransferPair in interface ValueVector
      Parameters:
      field - The field materialized by this vector
      allocator - allocator for the target vector
      Returns:
      TransferPair
    • transferTo

      public void transferTo(BaseLargeVariableWidthVector target)
      Transfer this vector's data to another vector. The memory associated with this vector is transferred to the allocator of the target vector for accounting and management purposes.
      Parameters:
      target - destination vector for transfer
    • splitAndTransferTo

      public void splitAndTransferTo(int startIndex, int length, BaseLargeVariableWidthVector target)
      Slice this vector at the desired index and length and transfer the corresponding data to the target vector.
      Parameters:
      startIndex - start position of the split in source vector.
      length - length of the split.
      target - destination vector
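When slicing, the target's offsets must be rebased to start at zero, and the data it needs is the byte range [offset[startIndex], offset[startIndex + length]) of the source. The sketch below shows only that offset arithmetic on a plain long[]. It is an illustration, not Arrow's code (which also handles the data and validity buffers), and SliceOffsetsSketch is a hypothetical name.

```java
// Hypothetical sketch of rebasing offsets for a slice [startIndex, startIndex + length).
final class SliceOffsetsSketch {
  static long[] rebase(long[] offsets, int startIndex, int length) {
    long base = offsets[startIndex];
    long[] sliced = new long[length + 1];
    for (int i = 0; i <= length; i++) {
      sliced[i] = offsets[startIndex + i] - base; // target offsets start at 0
    }
    return sliced;
  }

  // Byte range of the source data covered by the slice, as {start, length}.
  static long[] dataRange(long[] offsets, int startIndex, int length) {
    long start = offsets[startIndex];
    return new long[] {start, offsets[startIndex + length] - start};
  }
}
```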
    • getNullCount

      public int getNullCount()
      Get the number of elements that are null in the vector.
      Specified by:
      getNullCount in interface ValueVector
      Returns:
      the number of null elements.
    • isSafe

      public boolean isSafe(int index)
      Check if the given index is within the current value capacity of the vector.
      Parameters:
      index - position to check
      Returns:
      true if index is within the current value capacity
    • isNull

      public boolean isNull(int index)
      Check if element at given index is null.
      Specified by:
      isNull in interface ValueVector
      Parameters:
      index - position of element
      Returns:
      true if element at given index is null
    • isSet

      public int isSet(int index)
      Inverse of isNull(int), returned as an int.
      Parameters:
      index - position of element
      Returns:
      1 if element at given index is not null, 0 otherwise
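Arrow validity buffers are bitmaps with least-significant-bit ordering: bit (index % 8) of byte (index / 8) is 1 for a non-null element. The plain-Java sketch below models isSet, isNull, setting a bit, and a null count over a byte[] bitmap. It illustrates the layout only, and ValidityBitmapSketch is a hypothetical name.

```java
// Hypothetical model of a validity bitmap; not the Arrow implementation.
final class ValidityBitmapSketch {
  // 1 if the element is non-null, 0 otherwise (the contract of isSet(int)).
  static int isSet(byte[] validity, int index) {
    int byteIndex = index >> 3; // index / 8
    int bitIndex = index & 7;   // index % 8, least significant bit first
    return (validity[byteIndex] >> bitIndex) & 1;
  }

  static boolean isNull(byte[] validity, int index) {
    return isSet(validity, index) == 0;
  }

  // Mark an element as non-null, conceptually what setIndexDefined(int) does.
  static void setBit(byte[] validity, int index) {
    validity[index >> 3] |= (byte) (1 << (index & 7));
  }

  // Count nulls among the first valueCount elements.
  static int nullCount(byte[] validity, int valueCount) {
    int nulls = 0;
    for (int i = 0; i < valueCount; i++) {
      nulls += 1 - isSet(validity, i);
    }
    return nulls;
  }
}
```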
    • getValueCount

      public int getValueCount()
      Get the value count of vector. This will always be zero unless setValueCount(int) has been called prior to calling this.
      Specified by:
      getValueCount in interface ValueVector
      Returns:
      valueCount for the vector
    • setValueCount

      public void setValueCount(int valueCount)
      Sets the value count for the vector.
      Specified by:
      setValueCount in interface ValueVector
      Parameters:
      valueCount - value count
    • fillEmpties

      public void fillEmpties(int index)
      Create holes in the vector up to the given index (exclusive). Holes will be created from the current last set position in the vector.
      Specified by:
      fillEmpties in interface VariableWidthFieldVector
      Parameters:
      index - target index
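A hole is a zero-length slot: its end offset repeats the previous end offset, so later offsets stay monotonic. The sketch below fills the holes between lastSet and index on a plain long[] of offsets. It assumes the holes' validity bits are left untouched (so they remain null), and FillHolesSketch is a hypothetical name, not Arrow's code.

```java
// Hypothetical sketch: give positions lastSet+1 .. index-1 zero-length entries.
final class FillHolesSketch {
  // offsets must have at least index + 1 entries; returns the new lastSet (index - 1).
  static int fillHoles(long[] offsets, int lastSet, int index) {
    for (int i = lastSet + 1; i < index; i++) {
      offsets[i + 1] = offsets[i]; // element i gets length 0
    }
    return index - 1;
  }
}
```

In this model, a stale lastSet would overwrite the end offsets of elements that were actually written. Assuming setValueCount(int) fills holes the same way, this illustrates why setLastSet(int) should be called with the right value first.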
    • setLastSet

      public void setLastSet(int value)
      Set the index of the last non-null element in the vector. It is important to call this method with the appropriate value before calling setValueCount(int).
      Specified by:
      setLastSet in interface VariableWidthFieldVector
      Parameters:
      value - desired index of last non-null element.
    • getLastSet

      public int getLastSet()
      Get the index of the last non-null element in the vector.
      Specified by:
      getLastSet in interface VariableWidthFieldVector
      Returns:
      index of the last non-null element
    • setIndexDefined

      public void setIndexDefined(int index)
      Mark the particular position in the vector as non-null.
      Specified by:
      setIndexDefined in interface VectorDefinitionSetter
      Parameters:
      index - position of the element.
    • setValueLengthSafe

      public void setValueLengthSafe(int index, int length)
      Sets the value length for an element.
      Specified by:
      setValueLengthSafe in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      length - length of the element
    • getValueLength

      public int getValueLength(int index)
      Get the length in bytes of the variable length element at the specified index.
      Specified by:
      getValueLength in interface VariableWidthFieldVector
      Parameters:
      index - position of element to get
      Returns:
      the length in bytes of a non-null element (which may be 0 for an empty value), 0 for a null element
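The relationship between offsets and value lengths can be illustrated with a plain-Java model. This is a sketch, not Arrow's implementation: it assumes the offset buffer holds valueCount + 1 little-endian 64-bit offsets (OFFSET_WIDTH = 8 bytes per entry) and uses a java.nio.ByteBuffer in place of ArrowBuf. LargeOffsetsSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of 64-bit offsets in a large variable-width layout; not the Arrow API.
final class LargeOffsetsSketch {
  static final int OFFSET_WIDTH = 8; // bytes per offset entry

  // Build an offset buffer (valueCount + 1 entries) for the given element lengths.
  static ByteBuffer offsetsFor(int[] lengths) {
    ByteBuffer offsets = ByteBuffer.allocate((lengths.length + 1) * OFFSET_WIDTH)
        .order(ByteOrder.LITTLE_ENDIAN);
    long end = 0;
    offsets.putLong(0, end);
    for (int i = 0; i < lengths.length; i++) {
      end += lengths[i];
      offsets.putLong((i + 1) * OFFSET_WIDTH, end);
    }
    return offsets;
  }

  // The start offset of element i is entry i; its end offset is entry i + 1.
  static long startOffset(ByteBuffer offsets, int index) {
    return offsets.getLong(index * OFFSET_WIDTH);
  }

  static long endOffset(ByteBuffer offsets, int index) {
    return offsets.getLong((index + 1) * OFFSET_WIDTH);
  }

  // Length in bytes; a null element reports 0, and so does a non-null empty value.
  static int valueLength(ByteBuffer offsets, boolean isNull, int index) {
    if (isNull) {
      return 0;
    }
    return (int) (endOffset(offsets, index) - startOffset(offsets, index));
  }
}
```

For element lengths {3, 0, 5} the offsets are {0, 3, 3, 8}. Element 1 is a non-null empty value, so only its validity bit distinguishes it from a null.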
    • set

      public void set(int index, byte[] value)
      Set the variable length element at the specified index to the supplied byte array. This is the same as using set(int, byte[], int, int) with start set to 0 and length set to value.length.
      Specified by:
      set in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - array of bytes to write
    • setSafe

      public void setSafe(int index, byte[] value)
      Same as set(int, byte[]) except that it handles the case where the index and length of the new element are beyond the existing capacity of the vector.
      Specified by:
      setSafe in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - array of bytes to write
    • set

      public void set(int index, byte[] value, int start, int length)
      Set the variable length element at the specified index to the supplied byte array.
      Specified by:
      set in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - array of bytes to write
      start - start index in array of bytes
      length - length of data in array of bytes
    • setSafe

      public void setSafe(int index, byte[] value, int start, int length)
      Same as set(int, byte[], int, int) except that it handles the case where the index and length of the new element are beyond the existing capacity of the vector.
      Specified by:
      setSafe in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - array of bytes to write
      start - start index in array of bytes
      length - length of data in array of bytes
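Writing element i conceptually copies the bytes into the data buffer starting at offset[i], records offset[i + 1] = offset[i] + length, and marks the element non-null; the safe variant grows the buffers first. The sketch below models that sequence with plain arrays. It is not Arrow's implementation, it assumes elements are written in increasing index order without gaps (no hole filling), and VarBinarySketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical append-only model of setSafe(int, byte[], int, int); not the Arrow API.
final class VarBinarySketch {
  byte[] data = new byte[4];
  long[] offsets = new long[2];     // valueCapacity + 1 entries
  boolean[] valid = new boolean[1]; // stands in for the validity bitmap

  // Assumes index == previous index + 1 (no gaps to fill).
  void setSafe(int index, byte[] value, int start, int length) {
    // Grow validity/offsets only when the index is beyond the value capacity.
    while (index >= valid.length) {
      valid = Arrays.copyOf(valid, valid.length * 2);
      offsets = Arrays.copyOf(offsets, valid.length + 1);
    }
    long startOffset = offsets[index];
    // Grow data only when the new bytes would not fit.
    while (startOffset + length > data.length) {
      data = Arrays.copyOf(data, data.length * 2);
    }
    System.arraycopy(value, start, data, (int) startOffset, length);
    offsets[index + 1] = startOffset + length;
    valid[index] = true;
  }

  byte[] get(int index) {
    if (!valid[index]) {
      return null;
    }
    return Arrays.copyOfRange(data, (int) offsets[index], (int) offsets[index + 1]);
  }
}
```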
    • set

      public void set(int index, ByteBuffer value, int start, int length)
      Set the variable length element at the specified index to the content in supplied ByteBuffer.
      Specified by:
      set in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - ByteBuffer with data
      start - start index in ByteBuffer
      length - length of data in ByteBuffer
    • setSafe

      public void setSafe(int index, ByteBuffer value, int start, int length)
      Same as set(int, ByteBuffer, int, int) except that it handles the case where the index and length of the new element are beyond the existing capacity of the vector.
      Specified by:
      setSafe in interface VariableWidthFieldVector
      Parameters:
      index - position of the element to set
      value - ByteBuffer with data
      start - start index in ByteBuffer
      length - length of data in ByteBuffer
    • setNull

      public void setNull(int index)
      Set the element at the given index to null.
      Specified by:
      setNull in interface FieldVector
      Parameters:
      index - position of element
    • set

      public void set(int index, int isSet, long start, long end, ArrowBuf buffer)
      Store the given value at a particular position in the vector. isSet indicates whether the value is NULL or not.
      Parameters:
      index - position of the new value
      isSet - 0 for NULL value, 1 otherwise
      start - start position of data in buffer
      end - end position of data in buffer
      buffer - data buffer containing the variable width element to be stored in the vector
    • setSafe

      public void setSafe(int index, int isSet, long start, long end, ArrowBuf buffer)
      Same as set(int, int, long, long, ArrowBuf) except that it handles the case when index is greater than or equal to the current value capacity of the vector.
      Parameters:
      index - position of the new value
      isSet - 0 for NULL value, 1 otherwise
      start - start position of data in buffer
      end - end position of data in buffer
      buffer - data buffer containing the variable width element to be stored in the vector
    • set

      public void set(int index, long start, int length, ArrowBuf buffer)
      Store the given value at a particular position in the vector and mark that position as non-null.
      Parameters:
      index - position of the new value
      start - start position of data in buffer
      length - length of data in buffer
      buffer - data buffer containing the variable width element to be stored in the vector
    • setSafe

      public void setSafe(int index, long start, int length, ArrowBuf buffer)
      Same as set(int, long, int, ArrowBuf) except that it handles the case when index is greater than or equal to the current value capacity of the vector.
      Parameters:
      index - position of the new value
      start - start position of data in buffer
      length - length of data in buffer
      buffer - data buffer containing the variable width element to be stored in the vector
    • fillHoles

      protected final void fillHoles(int index)
    • setBytes

      protected final void setBytes(int index, byte[] value, int start, int length)
    • getStartOffset

      protected final long getStartOffset(int index)
      Gets the starting offset of a record, given its index.
      Parameters:
      index - index of the record.
      Returns:
      the starting offset of the record.
    • handleSafe

      protected final void handleSafe(int index, int dataLength)
    • get

      public static byte[] get(ArrowBuf data, ArrowBuf offset, int index)
      Method used by Json Writer to read a variable width element from the variable width vector and write to Json.

      This method should not be used externally.

      Parameters:
      data - buffer storing the variable width vector elements
      offset - buffer storing the offsets of variable width vector elements
      index - position of the element in the vector
      Returns:
      array of bytes
    • set

      public static ArrowBuf set(ArrowBuf buffer, BufferAllocator allocator, int valueCount, int index, long value)
      Method used by Json Reader to explicitly set the offsets of the variable width vector data. The method takes care of allocating the memory for offsets if the caller hasn't done so.

      This method should not be used externally.

      Parameters:
      buffer - ArrowBuf to store offsets for variable width elements
      allocator - memory allocator
      valueCount - number of elements
      index - position of the element
      value - offset of the element
      Returns:
      buffer holding the offsets
    • copyFrom

      public void copyFrom(int fromIndex, int thisIndex, ValueVector from)
      Copy a cell value from a particular index in source vector to a particular position in this vector.
      Specified by:
      copyFrom in interface ValueVector
      Overrides:
      copyFrom in class BaseValueVector
      Parameters:
      fromIndex - position to copy from in source vector
      thisIndex - position to copy to in this vector
      from - source vector
    • copyFromSafe

      public void copyFromSafe(int fromIndex, int thisIndex, ValueVector from)
      Same as copyFrom(int, int, ValueVector) except that it handles the case when the capacity of the vector needs to be expanded before copy.
      Specified by:
      copyFromSafe in interface ValueVector
      Overrides:
      copyFromSafe in class BaseValueVector
      Parameters:
      fromIndex - position to copy from in source vector
      thisIndex - position to copy to in this vector
      from - source vector
    • getDataPointer

      public ArrowBufPointer getDataPointer(int index)
      Description copied from interface: ElementAddressableVector
      Gets the pointer for the data at the given index.
      Specified by:
      getDataPointer in interface ElementAddressableVector
      Parameters:
      index - the index for the data.
      Returns:
      the pointer to the data.
    • getDataPointer

      public ArrowBufPointer getDataPointer(int index, ArrowBufPointer reuse)
      Description copied from interface: ElementAddressableVector
      Gets the pointer for the data at the given index.
      Specified by:
      getDataPointer in interface ElementAddressableVector
      Parameters:
      index - the index for the data.
      reuse - the data pointer to fill, this avoids creating a new pointer object.
      Returns:
      the pointer to the data; this is the same object as the reuse parameter
    • hashCode

      public int hashCode(int index)
      Description copied from interface: ValueVector
      Returns the hash code of the element at the given index, computed with the default hasher.
      Specified by:
      hashCode in interface ValueVector
    • hashCode

      public int hashCode(int index, ArrowBufHasher hasher)
      Description copied from interface: ValueVector
      Returns the hash code of the element at the given index, computed with the given hasher.
      Specified by:
      hashCode in interface ValueVector
    • accept

      public <OUT, IN> OUT accept(VectorVisitor<OUT,IN> visitor, IN value)
      Description copied from interface: ValueVector
      Accept a generic VectorVisitor and return the result.
      Specified by:
      accept in interface ValueVector
      Type Parameters:
      OUT - the output result type.
      IN - the input data together with visitor.
    • getEndOffset

      protected final long getEndOffset(int index)