Package org.apache.arrow.vector.complex
Class LargeListVector
java.lang.Object
org.apache.arrow.vector.BaseValueVector
org.apache.arrow.vector.complex.LargeListVector
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<ValueVector>, PromotableVector, RepeatedValueVector, DensityAwareVector, FieldVector, ValueIterableVector<List<?>>, ValueVector
public class LargeListVector
extends BaseValueVector
implements RepeatedValueVector, FieldVector, PromotableVector, ValueIterableVector<List<?>>
A list vector contains lists of a specific type of elements. Its structure contains 3 elements.
- A validity buffer.
- An offset buffer that denotes list boundaries.
- A child data vector that contains the elements of lists.
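The three parts listed above can be pictured with a small plain-Java model. This is only a sketch of the layout idea, not the Arrow implementation: the class name `LargeListLayoutSketch`, its arrays, and the helper `listAt` are hypothetical, and real vectors store these parts in ArrowBuf buffers rather than Java arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the three parts of a large list vector (hypothetical, not Arrow code). */
public class LargeListLayoutSketch {
    // One validity flag per list: false marks a null list.
    static final boolean[] VALIDITY = {true, false, true};
    // valueCount + 1 offsets, 64-bit as in a LargeList: list i spans [OFFSETS[i], OFFSETS[i + 1]).
    static final long[] OFFSETS = {0L, 2L, 2L, 5L};
    // Child data: the elements of all lists, stored back to back.
    static final int[] CHILD = {10, 11, 20, 21, 22};

    /** Returns the list at the given index, or null when its validity flag is unset. */
    static List<Integer> listAt(int index) {
        if (!VALIDITY[index]) {
            return null;
        }
        List<Integer> out = new ArrayList<>();
        for (long i = OFFSETS[index]; i < OFFSETS[index + 1]; i++) {
            out.add(CHILD[Math.toIntExact(i)]);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < VALIDITY.length; i++) {
            System.out.println(i + " -> " + listAt(i));
        }
    }
}
```

In this model the null list at index 1 still occupies an offset slot; it simply spans zero child elements.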
WARNING: Arrow in Java does not currently support 64-bit vectors. This class follows the expected behaviour of a LargeList but cannot actually allocate a 64-bit vector. It has little use until 64-bit vectors are supported and should be used with caution.
TODO: review checkedCastToInt usage in this class. Once int64-indexed vectors are supported, these checks will no longer be needed.
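To illustrate why 64-bit offsets still have to pass through a 32-bit check, here is a minimal sketch of a checked narrowing built on the JDK's `Math.toIntExact`. The class `CheckedCastSketch` and the helpers `narrowToInt` and `fitsInInt` are hypothetical names; whether Arrow's own checkedCastToInt behaves identically in every configuration is an assumption, not something this page states.

```java
/** Sketch of a checked long-to-int narrowing (hypothetical helper, JDK only). */
public class CheckedCastSketch {
    /** Narrows a 64-bit value to int, failing loudly instead of silently wrapping. */
    static int narrowToInt(long value) {
        return Math.toIntExact(value); // throws ArithmeticException on overflow
    }

    /** Returns true when the value can be used as a 32-bit index. */
    static boolean fitsInInt(long value) {
        try {
            narrowToInt(value);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(1_000L));
        System.out.println(fitsInInt(Integer.MAX_VALUE + 1L));
    }
}
```

An offset just past `Integer.MAX_VALUE` is a valid LargeList offset in principle, but it fails the check, which is the limitation the warning describes.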
-
Field Summary
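Modifier and Type, Field
protected final CallBack callBack
static final String DATA_VECTOR_NAME
static final FieldVector DEFAULT_DATA_VECTOR
protected String defaultDataVectorName
static final byte OFFSET_WIDTH
protected long offsetAllocationSizeInBytes
protected ArrowBuf offsetBuffer
protected UnionLargeListReader reader
protected ArrowBuf validityBuffer
protected int valueCount
protected FieldVector vector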
Fields inherited from class org.apache.arrow.vector.BaseValueVector
allocator, fieldReader, INITIAL_VALUE_ALLOCATION, MAX_ALLOCATION_SIZE, MAX_ALLOCATION_SIZE_PROPERTY
Fields inherited from interface org.apache.arrow.vector.complex.RepeatedValueVector
DEFAULT_REPEAT_PER_RECORD
-
Constructor Summary
Constructor, Description
LargeListVector(String name, BufferAllocator allocator, FieldType fieldType, CallBack callBack)
- Constructs a new instance.
LargeListVector(Field field, BufferAllocator allocator, CallBack callBack)
- Creates a new instance.
-
Method Summary
Modifier and Type, Method, Description
<OUT, IN> OUT accept(VectorVisitor<OUT, IN> visitor, IN value)
- Accept a generic VectorVisitor and return the result.
<T extends ValueVector> AddOrGetResult<T> addOrGetVector(FieldType fieldType)
- Initialize the data vector (and execute callback) if it hasn't already been done, returns the data vector.
void allocateNew()
- Same as allocateNewSafe().
boolean allocateNewSafe()
- Allocate memory for the vector.
protected ArrowBuf allocateOffsetBuffer(long size)
void clear()
- Release any owned ArrowBuf and reset the ValueVector to the initial state.
void copyFrom(int inIndex, int outIndex, ValueVector from)
- Copy a cell value from a particular index in source vector to a particular position in this vector.
void copyFromSafe(int inIndex, int outIndex, ValueVector from)
- Same as copyFrom(int, int, ValueVector) except that it handles the case when the capacity of the vector needs to be expanded before copy.
static LargeListVector empty(String name, BufferAllocator allocator)
void endValue(int index, long size)
- End the current value.
void exportCDataBuffers(List<ArrowBuf> buffers, ArrowBuf buffersPtr, long nullValue)
- Export the buffers of the fields for C Data Interface.
ArrowBuf[] getBuffers(boolean clear)
- Return the underlying buffers associated with this vector.
int getBufferSize()
- Get the size (number of bytes) of underlying buffers used by this vector.
int getBufferSizeFor(int valueCount)
- Returns the number of bytes that is used by this vector if it holds the given number of values.
List<FieldVector> getChildrenFromFields()
- The returned list is the same size as the list passed to initializeChildrenFromFields.
ArrowBuf getDataBuffer()
- Gets the underlying buffer associated with data vector.
long getDataBufferAddress()
- Gets the starting address of the underlying buffer associated with data vector.
FieldVector getDataVector()
- Get the inner data vector for this list vector.
double getDensity()
- Get the density of this ListVector.
long getElementEndIndex(int index)
long getElementStartIndex(int index)
Field getField()
- Get information about how this field is materialized.
List<ArrowBuf> getFieldBuffers()
- Get the buffers belonging to this vector.
List<BufferBacked> getFieldInnerVectors()
- Deprecated. This API will be removed as the current implementations no longer support inner vectors.
int getLastSet()
Types.MinorType getMinorType()
String getName()
- Gets the name of the vector.
int getNullCount()
- Get the number of elements that are null in the vector.
List<?> getObject(int index)
- Get the element in the list vector at a particular index.
ArrowBuf getOffsetBuffer()
- Gets the underlying buffer associated with offset vector.
long getOffsetBufferAddress()
- Gets the starting address of the underlying buffer associated with offset vector.
protected int getOffsetBufferValueCapacity()
UInt4Vector getOffsetVector()
- Deprecated. This API will be removed, as the current implementations no longer hold inner offset vectors.
UnionLargeListReader getReader()
- Default implementation to create a reader for the vector.
protected FieldReader getReaderImpl()
- Each vector has a different reader that implements the FieldReader interface.
TransferPair getTransferPair(String ref, BufferAllocator allocator)
- To transfer quota responsibility.
TransferPair getTransferPair(String ref, BufferAllocator allocator, CallBack callBack)
- To transfer quota responsibility.
TransferPair getTransferPair(Field field, BufferAllocator allocator)
- To transfer quota responsibility.
TransferPair getTransferPair(Field field, BufferAllocator allocator, CallBack callBack)
- To transfer quota responsibility.
ArrowBuf getValidityBuffer()
- Gets the underlying buffer associated with validity vector.
long getValidityBufferAddress()
- Gets the starting address of the underlying buffer associated with validity vector.
int getValueCapacity()
- Get the current value capacity for the vector.
int getValueCount()
- Gets the number of values.
UnionLargeListWriter getWriter()
int hashCode(int index)
- Returns hashCode of element in index with the default hasher.
int hashCode(int index, ArrowBufHasher hasher)
- Returns hashCode of element in index with the given hasher.
void initializeChildrenFromFields(List<Field> children)
- Initializes the child vectors to be later loaded with loadBuffers.
protected void invalidateReader()
boolean isEmpty(int index)
- Check if the element at the given index is an empty list.
boolean isNull(int index)
- Check if the element at the given index is null.
int isSet(int index)
- Same check as isNull(int), but reported as an int flag that is 1 when the element is set.
void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers)
- Load the buffers of this vector with provided source buffers.
TransferPair makeTransferPair(ValueVector target)
- Makes a new transfer pair used to transfer underlying buffers.
UnionVector promoteToUnion()
void reAlloc()
- Resize the vector to increase the capacity.
protected void reallocOffsetBuffer()
protected void replaceDataVector(FieldVector v)
void reset()
- Reset the ValueVector to the initial state without releasing any owned ArrowBuf.
void setInitialCapacity(int numRecords)
- Set the initial record capacity.
void setInitialCapacity(int numRecords, double density)
- Specialized version of setInitialCapacity() for ListVector.
void setInitialTotalCapacity(int numRecords, int totalNumberOfElements)
- Specialized version of setInitialTotalCapacity() for ListVector.
void setLastSet(int value)
void setNotNull(int index)
- Sets the list at index to be not-null.
void setNull(int index)
- Sets list at index to be null.
void setValueCount(int valueCount)
- Sets the value count for the vector.
long startNewValue(long index)
- Start a new value in the list vector.
Methods inherited from class org.apache.arrow.vector.BaseValueVector
checkBufRefs, close, getAllocator, getTransferPair, getValidityBufferSizeFromCount, iterator, releaseBuffer, toString, transferBuffer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.arrow.vector.FieldVector
exportBuffer, getExportedCDataBufferCount
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
Methods inherited from interface org.apache.arrow.vector.ValueIterableVector
getValueIterable, getValueIterator
Methods inherited from interface org.apache.arrow.vector.ValueVector
close, getAllocator, getTransferPair, validate, validateFull
-
Field Details
-
DEFAULT_DATA_VECTOR
-
DATA_VECTOR_NAME
-
OFFSET_WIDTH
public static final byte OFFSET_WIDTH
-
offsetBuffer
-
vector
-
callBack
-
valueCount
protected int valueCount -
offsetAllocationSizeInBytes
protected long offsetAllocationSizeInBytes -
defaultDataVectorName
-
validityBuffer
-
reader
-
-
Constructor Details
-
LargeListVector
public LargeListVector(String name, BufferAllocator allocator, FieldType fieldType, CallBack callBack)
Constructs a new instance.
- Parameters:
name - The name of the instance.
allocator - The allocator to use for allocating/reallocating buffers.
fieldType - The type of this list.
callBack - A schema change callback.
-
LargeListVector
public LargeListVector(Field field, BufferAllocator allocator, CallBack callBack)
Creates a new instance.
- Parameters:
field - The field materialized by this vector.
allocator - The allocator to use for creating/reallocating buffers for the vector.
callBack - A schema change callback.
-
-
Method Details
-
empty
-
initializeChildrenFromFields
Description copied from interface:FieldVector
Initializes the child vectors to be later loaded with loadBuffers.- Specified by:
initializeChildrenFromFields
in interfaceFieldVector
- Parameters:
children
- the schema
-
setInitialCapacity
public void setInitialCapacity(int numRecords) Description copied from interface:ValueVector
Set the initial record capacity.- Specified by:
setInitialCapacity
in interfaceValueVector
- Parameters:
numRecords
- the initial record capacity.
-
setInitialCapacity
public void setInitialCapacity(int numRecords, double density)
Specialized version of setInitialCapacity() for ListVector. Some callers use it to explicitly control, and be conservative about, the memory allocated for the inner data vector. This is useful when working under memory constraints for a query with a fixed amount of memory reserved for the record batch. In such cases we are likely to hit OOM or related problems: we reserve memory for a record batch with value count x and call setInitialCapacity(x) so that each vector allocates only what is necessary rather than the default amount, but the multiplier pushes the memory requirement beyond what was needed.
- Specified by:
setInitialCapacity in interface DensityAwareVector
- Parameters:
numRecords - value count
density - density of ListVector. Density is the average size of the list per position in the list vector. For example, a density value of 10 implies each position in the list vector has a list of 10 values. A density value of 0.1 implies that out of 10 positions in the list vector, 1 position has a list of size 1 and the remaining positions are null (no lists) or empty lists. This helps in tightly controlling the memory provisioned for the inner data vector.
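The density arithmetic described above can be sketched in plain Java. This assumes the inner capacity is estimated as the product of record count and density, rounded up with a floor of one element; Arrow's exact rounding may differ, and the class `DensitySketch` and method `estimatedInnerCapacity` are hypothetical names used only for illustration.

```java
/** Sketch of the density arithmetic (hypothetical helper, not Arrow's implementation). */
public class DensitySketch {
    /** Estimated inner-vector element count for numRecords lists at the given density. */
    static long estimatedInnerCapacity(int numRecords, double density) {
        // Assumption: a plain product, rounded up, with a floor of one element.
        return Math.max(1L, (long) Math.ceil(numRecords * density));
    }

    public static void main(String[] args) {
        // Density 10: every position holds a list of about 10 values.
        System.out.println(estimatedInnerCapacity(4096, 10.0));
        // Density 0.1: about one position in ten holds a single value.
        System.out.println(estimatedInnerCapacity(4096, 0.1));
    }
}
```

The point of the estimate is that a sparse vector (density below 1) needs far fewer inner elements than one slot per record, so the caller can reserve less memory.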
-
setInitialTotalCapacity
public void setInitialTotalCapacity(int numRecords, int totalNumberOfElements)
Specialized version of setInitialTotalCapacity() for ListVector. Some callers use it to explicitly control, and be conservative about, the memory allocated for the inner data vector. This is useful when working under memory constraints for a query with a fixed amount of memory reserved for the record batch. In such cases we are likely to hit OOM or related problems: we reserve memory for a record batch with value count x and call setInitialCapacity(x) so that each vector allocates only what is necessary rather than the default amount, but the multiplier pushes the memory requirement beyond what was needed.
- Parameters:
numRecords - value count
totalNumberOfElements - the total number of elements to allow for in this vector across all records.
-
getDensity
public double getDensity()
Get the density of this ListVector.
- Returns:
density
-
getChildrenFromFields
Description copied from interface:FieldVector
The returned list is the same size as the list passed to initializeChildrenFromFields.- Specified by:
getChildrenFromFields
in interfaceFieldVector
- Returns:
- the children according to schema (empty for primitive types)
-
loadFieldBuffers
Load the buffers of this vector with provided source buffers. The caller manages the source buffers and populates them before invoking this method.- Specified by:
loadFieldBuffers
in interfaceFieldVector
- Parameters:
fieldNode
- the fieldNode indicating the value countownBuffers
- the buffers for this Field (own buffers only, children not included)
-
getFieldBuffers
Get the buffers belonging to this vector.- Specified by:
getFieldBuffers
in interfaceFieldVector
- Returns:
- the inner buffers.
-
exportCDataBuffers
Export the buffers of the fields for C Data Interface. This method traverses the buffers and exports each buffer and its memory address into a list of buffers and a pointer to the list of buffers.
- Specified by:
exportCDataBuffers
in interfaceFieldVector
-
getFieldInnerVectors
Deprecated. This API will be removed as the current implementations no longer support inner vectors.
Get the inner vectors.
- Specified by:
getFieldInnerVectors
in interfaceFieldVector
- Returns:
- the inner vectors for this field as defined by the TypeLayout
-
allocateNew
Same as allocateNewSafe().
- Specified by:
allocateNew
in interfaceValueVector
- Throws:
OutOfMemoryException
- Thrown if no memory can be allocated.
-
allocateNewSafe
public boolean allocateNewSafe()
Allocate memory for the vector. Internally, a default value count of 4096 is used, so memory is allocated for at least that many elements.
- Specified by:
allocateNewSafe
in interfaceValueVector
- Returns:
- false if memory allocation fails, true otherwise.
-
allocateOffsetBuffer
-
reAlloc
public void reAlloc()
Resize the vector to increase the capacity. The internal behavior is to double the current value capacity.
- Specified by:
reAlloc
in interfaceValueVector
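As a rough illustration of the doubling behavior, the sketch below counts how many doublings are needed to grow from a starting capacity to a target. It is plain arithmetic under the assumption that each resize exactly doubles the value capacity; the class `DoublingSketch` and method `doublingsNeeded` are hypothetical and do not call Arrow.

```java
/** Sketch of capacity growth by doubling (hypothetical helper, not Arrow code). */
public class DoublingSketch {
    /** Counts how many doublings take a capacity from start to at least target. */
    static int doublingsNeeded(long start, long target) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        int steps = 0;
        long capacity = start;
        while (capacity < target) {
            capacity *= 2; // each resize doubles the current capacity
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        // Starting from a capacity of 4096 values, count the resizes needed for 100,000 values.
        System.out.println(doublingsNeeded(4096, 100_000));
    }
}
```

Because growth is geometric, the number of resizes stays small even for large targets, which is why setting a realistic initial capacity mainly saves copying rather than the resize count itself.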
-
reallocOffsetBuffer
protected void reallocOffsetBuffer() -
copyFromSafe
Same as copyFrom(int, int, ValueVector), except that it handles the case when the capacity of the vector needs to be expanded before the copy.
- Specified by:
copyFromSafe in interface ValueVector
- Overrides:
copyFromSafe in class BaseValueVector
- Parameters:
inIndex - position to copy from in source vector
outIndex - position to copy to in this vector
from - source vector
-
copyFrom
Copy a cell value from a particular index in source vector to a particular position in this vector.
- Specified by:
copyFrom in interface ValueVector
- Overrides:
copyFrom in class BaseValueVector
- Parameters:
inIndex - position to copy from in source vector
outIndex - position to copy to in this vector
from - source vector
-
getOffsetVector
Deprecated. This API will be removed, as the current implementations no longer hold inner offset vectors.
Get the offset vector.
- Specified by:
getOffsetVector
in interfaceRepeatedValueVector
- Returns:
- the underlying offset vector or null if none exists.
-
getDataVector
Get the inner data vector for this list vector.- Specified by:
getDataVector
in interfaceRepeatedValueVector
- Returns:
- data vector
-
getTransferPair
Description copied from interface:ValueVector
To transfer quota responsibility.- Specified by:
getTransferPair
in interfaceValueVector
- Parameters:
ref
- the name of the vectorallocator
- the target allocator- Returns:
- a
transfer pair
, creating a new target vector of the same type.
-
getTransferPair
Description copied from interface:ValueVector
To transfer quota responsibility.- Specified by:
getTransferPair
in interfaceValueVector
- Parameters:
field
- the Field object used by the target vectorallocator
- the target allocator- Returns:
- a
transfer pair
, creating a new target vector of the same type.
-
getTransferPair
Description copied from interface:ValueVector
To transfer quota responsibility.- Specified by:
getTransferPair
in interfaceValueVector
- Parameters:
ref
- the name of the vectorallocator
- the target allocatorcallBack
- A schema change callback.- Returns:
- a
transfer pair
, creating a new target vector of the same type.
-
getTransferPair
Description copied from interface:ValueVector
To transfer quota responsibility.- Specified by:
getTransferPair
in interfaceValueVector
- Parameters:
field
- the Field object used by the target vectorallocator
- the target allocatorcallBack
- A schema change callback.- Returns:
- a
transfer pair
, creating a new target vector of the same type.
-
makeTransferPair
Description copied from interface:ValueVector
Makes a new transfer pair used to transfer underlying buffers.- Specified by:
makeTransferPair
in interfaceValueVector
- Parameters:
target
- the target for the transfer- Returns:
- a new
transfer pair
that is used to transfer underlying buffers into the target vector.
-
getValidityBufferAddress
public long getValidityBufferAddress()Description copied from interface:FieldVector
Gets the starting address of the underlying buffer associated with validity vector.- Specified by:
getValidityBufferAddress
in interfaceFieldVector
- Returns:
- buffer address
-
getDataBufferAddress
public long getDataBufferAddress()Description copied from interface:FieldVector
Gets the starting address of the underlying buffer associated with data vector.- Specified by:
getDataBufferAddress
in interfaceFieldVector
- Returns:
- buffer address
-
getOffsetBufferAddress
public long getOffsetBufferAddress()Description copied from interface:FieldVector
Gets the starting address of the underlying buffer associated with offset vector.- Specified by:
getOffsetBufferAddress
in interfaceFieldVector
- Returns:
- buffer address
-
getValidityBuffer
Description copied from interface:ValueVector
Gets the underlying buffer associated with validity vector.- Specified by:
getValidityBuffer
in interfaceValueVector
- Returns:
- buffer
-
getDataBuffer
Description copied from interface:ValueVector
Gets the underlying buffer associated with data vector.- Specified by:
getDataBuffer
in interfaceValueVector
- Returns:
- buffer
-
getOffsetBuffer
Description copied from interface:ValueVector
Gets the underlying buffer associated with offset vector.- Specified by:
getOffsetBuffer
in interfaceValueVector
- Returns:
- buffer
-
getValueCount
public int getValueCount()Description copied from interface:ValueVector
Gets the number of values.- Specified by:
getValueCount
in interfaceValueVector
- Returns:
- number of values in the vector
-
hashCode
public int hashCode(int index) Description copied from interface:ValueVector
Returns hashCode of element in index with the default hasher.- Specified by:
hashCode
in interfaceValueVector
-
hashCode
Description copied from interface:ValueVector
Returns hashCode of element in index with the given hasher.- Specified by:
hashCode
in interfaceValueVector
-
accept
Description copied from interface:ValueVector
Accept a generic VectorVisitor and return the result.
- Specified by:
accept in interface ValueVector
- Type Parameters:
OUT - the output result type.
IN - the input data together with visitor.
-
getWriter
-
replaceDataVector
-
promoteToUnion
- Specified by:
promoteToUnion
in interfacePromotableVector
-
getReaderImpl
Description copied from class:BaseValueVector
Each vector has a different reader that implements the FieldReader interface. Overridden methods must make sure to return the correct concrete reader implementation.- Specified by:
getReaderImpl
in classBaseValueVector
- Returns:
- Returns a lambda that initializes a reader when called.
-
getReader
Description copied from class:BaseValueVector
Default implementation to create a reader for the vector. Depends on the individual vector class' implementation of BaseValueVector.getReaderImpl() to initialize the reader appropriately.
- Specified by:
getReader
in interfaceValueVector
- Overrides:
getReader
in classBaseValueVector
- Returns:
- Concrete instance of FieldReader by using double-checked locking.
-
addOrGetVector
Initialize the data vector (and execute callback) if it hasn't already been done, returns the data vector.- Specified by:
addOrGetVector
in interfacePromotableVector
-
getBufferSize
public int getBufferSize()Get the size (number of bytes) of underlying buffers used by this vector.- Specified by:
getBufferSize
in interfaceValueVector
- Returns:
- size of underlying buffers.
-
getBufferSizeFor
public int getBufferSizeFor(int valueCount)
Description copied from interface: ValueVector
Returns the number of bytes that is used by this vector if it holds the given number of values. The result is the same as calling setValueCount() followed by getBufferSize(), but without the side effects that setValueCount() has on finishing the population of a vector. Some operations use this to determine how much memory a vector has used so far, even though it has not finished being populated.
- Specified by:
getBufferSizeFor in interface ValueVector
- Parameters:
valueCount - the number of values to assume this vector contains
- Returns:
the buffer size if this vector is holding valueCount values
-
getField
Description copied from interface:ValueVector
Get information about how this field is materialized.- Specified by:
getField
in interfaceValueVector
- Returns:
- the field corresponding to this vector
-
getMinorType
- Specified by:
getMinorType
in interfaceValueVector
-
getName
Description copied from interface:ValueVector
Gets the name of the vector.- Specified by:
getName
in interfaceValueVector
- Specified by:
getName
in classBaseValueVector
- Returns:
- the name of the vector.
-
clear
public void clear()Description copied from interface:ValueVector
Release any owned ArrowBuf and reset the ValueVector to the initial state. If the vector has any child vectors, they will also be cleared.- Specified by:
clear
in interfaceValueVector
- Overrides:
clear
in classBaseValueVector
-
reset
public void reset()Description copied from interface:ValueVector
Reset the ValueVector to the initial state without releasing any owned ArrowBuf. Buffer capacities will remain unchanged and any previous data will be zeroed out. This includes buffers for data, validity, offset, etc. If the vector has any child vectors, they will also be reset.- Specified by:
reset
in interfaceValueVector
-
getBuffers
Return the underlying buffers associated with this vector. Note that this doesn't affect the reference counts of the buffers, so it should only be used for in-context access. Also note that the buffers change regularly, so external classes shouldn't hold a reference to them (unless they change them).
- Specified by:
getBuffers in interface ValueVector
- Parameters:
clear - Whether to clear the vector before returning; the buffers will still be refcounted, but the returned array will be the only reference to them
- Returns:
The underlying buffers that are used by this vector instance.
-
invalidateReader
protected void invalidateReader() -
getObject
Get the element in the list vector at a particular index.- Specified by:
getObject
in interfaceValueVector
- Parameters:
index
- position of the element- Returns:
- Object at given position
-
isNull
public boolean isNull(int index) Check if element at given index is null.- Specified by:
isNull
in interfaceValueVector
- Parameters:
index
- position of element- Returns:
- true if element at given index is null, false otherwise
-
isEmpty
public boolean isEmpty(int index)
Check if the element at the given index is an empty list.
- Parameters:
index - position of element
- Returns:
true if the element at the given index is an empty list or null, false otherwise
-
isSet
public int isSet(int index)
Same check as isNull(int), but reported as an int flag that is 1 when the element is set.
- Parameters:
index - position of element
- Returns:
1 if element at given index is not null, 0 otherwise
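The relationship between isNull, isEmpty and isSet can be sketched with a plain-Java model of a validity bitmap (one bit per list, least-significant bit first, as in the Arrow columnar format) and an offset array. The class `ValiditySketch` and its static helpers are hypothetical and only mirror the documented return values; they are not the Arrow implementation.

```java
/** Sketch of the validity checks over a bitmap and offsets (hypothetical, not Arrow code). */
public class ValiditySketch {
    /** Returns 1 if the validity bit for index is set, 0 otherwise. */
    static int isSet(byte[] validity, int index) {
        return (validity[index >> 3] >> (index & 7)) & 1;
    }

    /** A list is null when its validity bit is not set. */
    static boolean isNull(byte[] validity, int index) {
        return isSet(validity, index) == 0;
    }

    /** A list counts as empty when it is null or spans zero child elements. */
    static boolean isEmpty(byte[] validity, long[] offsets, int index) {
        return isNull(validity, index) || offsets[index] == offsets[index + 1];
    }

    public static void main(String[] args) {
        byte[] validity = {0b0000_0101};   // lists 0 and 2 are set, list 1 is null
        long[] offsets = {0L, 2L, 2L, 2L}; // list 0 has two elements, list 2 has none
        for (int i = 0; i < 3; i++) {
            System.out.println(i + ": isSet=" + isSet(validity, i)
                + " isNull=" + isNull(validity, i)
                + " isEmpty=" + isEmpty(validity, offsets, i));
        }
    }
}
```

Note that list 2 in this model is set but empty, while list 1 is null; both report as empty, matching the documented return value of isEmpty.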
-
getNullCount
public int getNullCount()Get the number of elements that are null in the vector.- Specified by:
getNullCount
in interfaceValueVector
- Returns:
- the number of null elements.
-
getValueCapacity
public int getValueCapacity()Get the current value capacity for the vector.- Specified by:
getValueCapacity
in interfaceValueVector
- Returns:
- number of elements that vector can hold.
-
getOffsetBufferValueCapacity
protected int getOffsetBufferValueCapacity() -
setNotNull
public void setNotNull(int index) Sets the list at index to be not-null. Reallocates validity buffer if index is larger than current capacity. -
setNull
public void setNull(int index) Sets list at index to be null.- Specified by:
setNull
in interfaceFieldVector
- Parameters:
index
- position in vector
-
startNewValue
public long startNewValue(long index)
Start a new value in the list vector.
- Parameters:
index - index of the value to start
-
endValue
public void endValue(int index, long size)
End the current value.
- Parameters:
index - index of the value to end
size - number of elements in the list that was written
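The start/end pairing can be pictured with a simplified plain-Java model of the offset bookkeeping. This is a sketch under stated assumptions, not the Arrow implementation: the class `ListWriteSketch` is hypothetical, it takes an int index for brevity, it assumes lists are written in increasing index order, and it assumes skipped positions are left as null lists with zero length.

```java
import java.util.Arrays;

/** Simplified model of the start/end write protocol (hypothetical, not Arrow code). */
public class ListWriteSketch {
    private final long[] offsets;
    private final boolean[] validity;
    private int lastSet = -1; // index of the last list that was completed

    ListWriteSketch(int capacity) {
        offsets = new long[capacity + 1];
        validity = new boolean[capacity];
    }

    /** Marks list index as present and returns the child position where its elements start. */
    long startNewValue(int index) {
        // Positions skipped since the last write get zero length: carry the end offset forward.
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
        validity[index] = true;
        lastSet = index - 1;
        return offsets[index];
    }

    /** Closes list index after size elements were appended to the child data. */
    void endValue(int index, long size) {
        offsets[index + 1] = offsets[index] + size;
        lastSet = index;
    }

    long[] offsets() {
        return offsets.clone();
    }

    boolean isNull(int index) {
        return !validity[index];
    }

    public static void main(String[] args) {
        ListWriteSketch sketch = new ListWriteSketch(4);
        sketch.startNewValue(0);
        sketch.endValue(0, 2); // list 0 holds two elements
        sketch.startNewValue(2);
        sketch.endValue(2, 3); // list 1 was skipped, list 2 holds three elements
        System.out.println(Arrays.toString(sketch.offsets()));
    }
}
```

The value returned by startNewValue tells the caller where in the child data the new list's elements begin, and the size passed to endValue is what fixes the list's end offset.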
-
setValueCount
public void setValueCount(int valueCount)
Sets the value count for the vector.
Important note: The underlying vector does not support 64-bit allocations yet. This may throw if attempting to hold more than a 32-bit vector can store.
- Specified by:
setValueCount
in interfaceValueVector
- Parameters:
valueCount
- value count
-
setLastSet
public void setLastSet(int value) -
getLastSet
public int getLastSet() -
getElementStartIndex
public long getElementStartIndex(int index) -
getElementEndIndex
public long getElementEndIndex(int index)
-