Module org.apache.arrow.vector
Package org.apache.arrow.vector.complex
Class BaseRepeatedValueVector
java.lang.Object
org.apache.arrow.vector.BaseValueVector
org.apache.arrow.vector.complex.BaseRepeatedValueVector
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<ValueVector>, BaseListVector, RepeatedValueVector, DensityAwareVector, FieldVector, ValueVector
- Direct Known Subclasses:
ListVector
public abstract class BaseRepeatedValueVector
extends BaseValueVector
implements RepeatedValueVector, BaseListVector
Base class for Vectors that contain repeated values.
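As a rough mental model (not Arrow's implementation), a repeated vector keeps one flat data vector plus an offset buffer: list i occupies data positions offsets[i] up to, but not including, offsets[i + 1]. The sketch below models that layout with plain Java collections. The class RepeatedLayoutSketch and its methods are hypothetical, and validity (null) tracking, which subclasses such as ListVector handle separately, is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Arrow code): lists stored as offsets into one flat data array. */
final class RepeatedLayoutSketch {
    // offsets.get(0) is always 0; offsets.size() == number of lists + 1
    final List<Integer> offsets = new ArrayList<>(List.of(0));
    // all inner elements of all lists, stored back to back
    final List<Integer> data = new ArrayList<>();

    /** Appends one list; calling it with no arguments appends an empty list. */
    void append(int... elements) {
        for (int e : elements) {
            data.add(e);
        }
        offsets.add(data.size()); // end of this list == start of the next one
    }

    /** Number of top-level list values. */
    int valueCount() {
        return offsets.size() - 1;
    }

    /** Same idea as getInnerValueCountAt(index): end offset minus start offset. */
    int innerValueCountAt(int index) {
        return offsets.get(index + 1) - offsets.get(index);
    }
}
```

Appending [1, 2], [] and [3, 4, 5] yields offsets [0, 2, 2, 5] over data [1, 2, 3, 4, 5], so innerValueCountAt(1) is 0 and innerValueCountAt(2) is 3.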
-
Field Summary
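Modifier and Type | Field
static final String | DATA_VECTOR_NAME
static final FieldVector | DEFAULT_DATA_VECTOR
protected String | defaultDataVectorName
static final byte | OFFSET_WIDTH
protected long | offsetAllocationSizeInBytes
protected ArrowBuf | offsetBuffer
protected final CallBack | repeatedCallBack
protected int | valueCount
protected FieldVector | vector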
Fields inherited from class org.apache.arrow.vector.BaseValueVector
allocator, fieldReader, INITIAL_VALUE_ALLOCATION, MAX_ALLOCATION_SIZE, MAX_ALLOCATION_SIZE_PROPERTY
Fields inherited from interface org.apache.arrow.vector.complex.RepeatedValueVector
DEFAULT_REPEAT_PER_RECORD
-
Constructor Summary
Modifier | Constructor
protected | BaseRepeatedValueVector(String name, BufferAllocator allocator, FieldVector vector, CallBack callBack)
protected | BaseRepeatedValueVector(String name, BufferAllocator allocator, CallBack callBack)
-
Method Summary
Modifier and Type | Method | Description
<T extends ValueVector> AddOrGetResult<T> | addOrGetVector(FieldType fieldType) | Initializes the data vector (and executes the callback) if that has not already been done, and returns the data vector.
boolean | allocateNewSafe() | Allocates new buffers.
protected ArrowBuf | allocateOffsetBuffer(long size) |
void | clear() | Release any owned ArrowBuf and reset the ValueVector to the initial state.
ArrowBuf[] | getBuffers(boolean clear) | Return the underlying buffers associated with this vector.
int | getBufferSize() | Get the number of bytes used by this vector.
int | getBufferSizeFor(int valueCount) | Returns the number of bytes that would be used by this vector if it held the given number of values.
FieldVector | getDataVector() | Get the data vector.
int | getInnerValueCount() |
int | getInnerValueCountAt(int index) | Returns the value count for the inner data vector at a particular index.
String | getName() | Gets the name of the vector.
protected int | getOffsetBufferValueCapacity() |
UInt4Vector | getOffsetVector() | Deprecated. This API will be removed, as the current implementations no longer hold inner offset vectors.
int | getValueCapacity() | Returns the maximum number of values that can be stored in this vector instance.
int | getValueCount() | Gets the number of values.
abstract boolean | isEmpty(int index) | Returns whether the value at index is empty.
Iterator<ValueVector> | iterator() |
void | reAlloc() | Allocates a new buffer with double the capacity and copies the data into the new buffer.
protected void | reallocOffsetBuffer() |
protected void | replaceDataVector(FieldVector v) |
void | reset() | Reset the ValueVector to the initial state without releasing any owned ArrowBuf.
void | setInitialCapacity(int numRecords) | Set the initial record capacity.
void | setInitialCapacity(int numRecords, double density) | Specialized version of setInitialCapacity() for ListVector.
void | setInitialTotalCapacity(int numRecords, int totalNumberOfElements) | Specialized version of setInitialCapacity() for ListVector that sizes the inner data vector by its total number of elements.
void | setValueCount(int valueCount) | Preallocates the number of repeated values.
int | size() | Gets a value indicating whether the inner vector is set.
int | startNewValue(int index) | Starts a new repeated value.
Methods inherited from class org.apache.arrow.vector.BaseValueVector
checkBufRefs, close, copyFrom, copyFromSafe, getAllocator, getReader, getReaderImpl, getTransferPair, getValidityBufferSizeFromCount, releaseBuffer, toString, transferBuffer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.arrow.vector.complex.BaseListVector
getElementEndIndex, getElementStartIndex
Methods inherited from interface org.apache.arrow.vector.FieldVector
exportBuffer, exportCDataBuffers, getChildrenFromFields, getDataBufferAddress, getExportedCDataBufferCount, getFieldBuffers, getFieldInnerVectors, getOffsetBufferAddress, getValidityBufferAddress, initializeChildrenFromFields, loadFieldBuffers, setNull
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Methods inherited from interface org.apache.arrow.vector.ValueVector
accept, allocateNew, close, copyFrom, copyFromSafe, getAllocator, getDataBuffer, getField, getMinorType, getNullCount, getObject, getOffsetBuffer, getReader, getTransferPair, getTransferPair, getTransferPair, getTransferPair, getTransferPair, getValidityBuffer, hashCode, hashCode, isNull, makeTransferPair, validate, validateFull
-
Field Details
-
DEFAULT_DATA_VECTOR
-
DATA_VECTOR_NAME
- See Also: Constant Field Values
-
OFFSET_WIDTH
public static final byte OFFSET_WIDTH
- See Also: Constant Field Values
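OFFSET_WIDTH is the size in bytes of one entry in the offset buffer. Assuming the usual value of 4 (one 32-bit integer per offset, stored little-endian as is customary for Arrow data), the sketch below uses java.nio.ByteBuffer as a stand-in for ArrowBuf to show the byte arithmetic: offset i lives at byte i * OFFSET_WIDTH, and n lists need n + 1 offsets. OffsetWidthSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical model of offset addressing; ByteBuffer stands in for ArrowBuf. */
final class OffsetWidthSketch {
    static final byte OFFSET_WIDTH = 4; // assumed: one 32-bit int per offset

    /** n lists need n + 1 offsets (the extra one is the end of the last list). */
    static int offsetBufferBytes(int valueCount) {
        return (valueCount + 1) * OFFSET_WIDTH;
    }

    /** Writes the offsets into a little-endian buffer, one int per OFFSET_WIDTH bytes. */
    static ByteBuffer encode(int[] offsets) {
        ByteBuffer buf = ByteBuffer.allocate(offsets.length * OFFSET_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < offsets.length; i++) {
            buf.putInt(i * OFFSET_WIDTH, offsets[i]); // absolute put; buffer position is unchanged
        }
        return buf;
    }

    /** Element count of list `index`: read its start and end offsets at their byte positions. */
    static int innerValueCountAt(ByteBuffer buf, int index) {
        return buf.getInt((index + 1) * OFFSET_WIDTH) - buf.getInt(index * OFFSET_WIDTH);
    }
}
```

For three lists with offsets [0, 2, 2, 5], the buffer takes 16 bytes and list 2 holds 3 elements.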
-
offsetBuffer
-
vector
-
repeatedCallBack
-
valueCount
protected int valueCount
-
offsetAllocationSizeInBytes
protected long offsetAllocationSizeInBytes
-
defaultDataVectorName
-
-
Constructor Details
-
BaseRepeatedValueVector
protected BaseRepeatedValueVector(String name, BufferAllocator allocator, CallBack callBack)
-
BaseRepeatedValueVector
protected BaseRepeatedValueVector(String name, BufferAllocator allocator, FieldVector vector, CallBack callBack)
-
-
Method Details
-
getName
public String getName()
Description copied from interface: ValueVector
Gets the name of the vector.
- Specified by:
getName in interface ValueVector
- Specified by:
getName in class BaseValueVector
- Returns:
- the name of the vector.
-
allocateNewSafe
public boolean allocateNewSafe()
Description copied from interface: ValueVector
Allocates new buffers. ValueVector implements logic to determine how much to allocate.
- Specified by:
allocateNewSafe in interface ValueVector
- Returns:
- true if allocation was successful.
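The "safe" convention means reporting failure through the return value instead of throwing. For a repeated vector, allocation covers at least the offset buffer and the data vector, so a partial success has to be rolled back. The sketch below models this with a hypothetical byte budget in place of a BufferAllocator; SafeAllocationSketch and its methods are illustrative only.

```java
/** Hypothetical model of allocateNewSafe(): all-or-nothing allocation reported as a boolean. */
final class SafeAllocationSketch {
    private final long budgetBytes; // stand-in for an allocator limit
    private long usedBytes;

    SafeAllocationSketch(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    private boolean reserve(long bytes) {
        if (usedBytes + bytes > budgetBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    /** Reserves the offset region, then the data region; releases everything if either fails. */
    boolean allocateNewSafe(long offsetBytes, long dataBytes) {
        boolean ok = reserve(offsetBytes) && reserve(dataBytes);
        if (!ok) {
            clear(); // roll back a partially successful allocation
        }
        return ok;
    }

    /** Throwing variant layered on top of the safe one. */
    void allocateNew(long offsetBytes, long dataBytes) {
        if (!allocateNewSafe(offsetBytes, dataBytes)) {
            throw new IllegalStateException("allocation failed");
        }
    }

    void clear() {
        usedBytes = 0;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```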
-
allocateOffsetBuffer
protected ArrowBuf allocateOffsetBuffer(long size)
-
reAlloc
public void reAlloc()
Description copied from interface: ValueVector
Allocates a new buffer with double the capacity and copies the data into the new buffer. The vector's buffer is then replaced with the new buffer, and the old one is released.
- Specified by:
reAlloc in interface ValueVector
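Doubling keeps the amortized cost of growth constant: over n appends, the total number of element copies stays below 2n. A minimal sketch, with a plain int[] standing in for an ArrowBuf; the ReAllocSketch class and its setSafe helper are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model of reAlloc(): double the capacity and copy the old contents over. */
final class ReAllocSketch {
    private int[] buffer = new int[4]; // small initial capacity for illustration

    int capacity() {
        return buffer.length;
    }

    /** New array of twice the size; Arrays.copyOf copies old data and zero-fills the rest. */
    void reAlloc() {
        buffer = Arrays.copyOf(buffer, Math.max(1, buffer.length * 2));
        // the old array is now unreferenced; an ArrowBuf would be released explicitly instead
    }

    /** Grows as many times as needed so that `index` fits, then writes. */
    void setSafe(int index, int value) {
        while (index >= buffer.length) {
            reAlloc();
        }
        buffer[index] = value;
    }

    int get(int index) {
        return buffer[index];
    }
}
```

Writing at index 9 into a 4-slot buffer triggers two doublings (4 to 8 to 16) and leaves earlier values intact.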
-
reallocOffsetBuffer
protected void reallocOffsetBuffer()
-
getOffsetVector
@Deprecated
public UInt4Vector getOffsetVector()
Deprecated. This API will be removed, as the current implementations no longer hold inner offset vectors.
Get the offset vector.
- Specified by:
getOffsetVector in interface RepeatedValueVector
- Returns:
- the underlying offset vector or null if none exists.
-
getDataVector
public FieldVector getDataVector()
Description copied from interface: RepeatedValueVector
Get the data vector.
- Specified by:
getDataVector in interface RepeatedValueVector
- Returns:
- the underlying data vector or null if none exists.
-
setInitialCapacity
public void setInitialCapacity(int numRecords)
Description copied from interface: ValueVector
Set the initial record capacity.
- Specified by:
setInitialCapacity in interface ValueVector
- Parameters:
numRecords
- the initial record capacity.
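For a list vector, a record capacity translates into two sizes: bytes for the offset buffer and an element capacity for the inner data vector. The sketch below shows that arithmetic under two assumptions: OFFSET_WIDTH is 4, and the inner vector is sized with a default per-record multiplier modelled on RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD, assumed here to be 5. InitialCapacitySketch is hypothetical.

```java
/** Hypothetical sizing model for setInitialCapacity(numRecords) on a list-like vector. */
final class InitialCapacitySketch {
    static final int OFFSET_WIDTH = 4;              // assumed bytes per offset
    static final int DEFAULT_REPEAT_PER_RECORD = 5; // assumed default elements per list

    /** One offset per record plus the trailing end offset; long avoids int overflow. */
    static long offsetBytes(int numRecords) {
        return (numRecords + 1L) * OFFSET_WIDTH;
    }

    /** Inner element capacity guessed from the default multiplier. */
    static long innerCapacity(int numRecords) {
        return (long) numRecords * DEFAULT_REPEAT_PER_RECORD;
    }
}
```

For 1,000 records this gives 4,004 offset bytes and room for 5,000 inner elements, which over-provisions when lists are short; that is the situation the density-aware setInitialCapacity(int, double) is meant to address.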
-
setInitialCapacity
public void setInitialCapacity(int numRecords, double density)
Specialized version of setInitialCapacity() for ListVector. Callers use it when they want to explicitly and conservatively control the memory allocated for the inner data vector. This is useful when a query runs under memory constraints with a fixed amount of memory reserved for the record batch. In such cases, a caller reserves memory for a record batch of x values and calls setInitialCapacity(x) so that each vector allocates only what it needs rather than the default amount, yet the default per-record multiplier for the inner data vector (DEFAULT_REPEAT_PER_RECORD) can still push the requirement beyond what was reserved, causing out-of-memory (OOM) or related failures.
- Specified by:
setInitialCapacity in interface DensityAwareVector
- Parameters:
numRecords
- value count
density
- density of the ListVector, that is, the average list size per position. For example, a density of 10 means each position holds a list of 10 values. A density of 0.1 means that, out of every 10 positions, 1 position holds a list of size 1 and the remaining positions are null (no list) or empty lists. This allows tight control over the memory provisioned for the inner data vector.
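The density parameter replaces the fixed multiplier with a caller-supplied estimate. A sketch of the resulting arithmetic, assuming the inner capacity is numRecords * density truncated to an int with a floor of 1 and OFFSET_WIDTH is 4. DensityCapacitySketch is hypothetical, and its IllegalArgumentException stands in for whatever exception the real method raises on overflow.

```java
/** Hypothetical sizing model for setInitialCapacity(numRecords, density). */
final class DensityCapacitySketch {
    static final int OFFSET_WIDTH = 4; // assumed bytes per offset

    /** Offset sizing is unaffected by density. */
    static long offsetBytes(int numRecords) {
        return (numRecords + 1L) * OFFSET_WIDTH;
    }

    /** Expected inner elements: records times average list length, truncated, at least 1. */
    static int innerCapacity(int numRecords, double density) {
        double requested = numRecords * density;
        if (requested >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("requested inner capacity is too large");
        }
        return Math.max((int) requested, 1);
    }
}
```

With Arrow itself, the corresponding call would be listVector.setInitialCapacity(1000, 0.1) before allocating; under the assumptions above that provisions 100 inner elements instead of a multiplier-based guess.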
-
setInitialTotalCapacity
public void setInitialTotalCapacity(int numRecords, int totalNumberOfElements)
Specialized version of setInitialCapacity() for ListVector that sizes the inner data vector from the total number of elements rather than from a per-record multiplier or a density. Like setInitialCapacity(int, double), it is meant for callers that must conservatively control the memory allocated for the inner data vector, for example when a query has a fixed amount of memory reserved for the record batch and the default multiplier would push the allocation beyond that reservation, causing out-of-memory (OOM) or related failures.
- Parameters:
numRecords
- value count
totalNumberOfElements
- the total number of elements to allow for in this vector across all records.
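When the exact number of inner elements is known up front, no estimate is needed. A sketch assuming OFFSET_WIDTH is 4; TotalCapacitySketch is hypothetical.

```java
/** Hypothetical sizing model for setInitialTotalCapacity(numRecords, totalNumberOfElements). */
final class TotalCapacitySketch {
    static final int OFFSET_WIDTH = 4; // assumed bytes per offset

    final long offsetBytes;  // offsets for numRecords lists plus the end offset
    final int innerCapacity; // exact element count: no multiplier, no density estimate

    TotalCapacitySketch(int numRecords, int totalNumberOfElements) {
        if (numRecords < 0 || totalNumberOfElements < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        this.offsetBytes = (numRecords + 1L) * OFFSET_WIDTH;
        this.innerCapacity = totalNumberOfElements;
    }
}
```

For 1,000 records holding 2,500 elements in total, this reserves the same inner capacity a density of 2.5 would, without relying on an average.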
-
getValueCapacity
public int getValueCapacity()
Description copied from interface: ValueVector
Returns the maximum number of values that can be stored in this vector instance.
- Specified by:
getValueCapacity in interface ValueVector
- Returns:
- the maximum number of values that can be stored in this vector instance.
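For a repeated vector, capacity is bounded by the offset buffer: n lists need n + 1 offsets, so a buffer holding k offsets supports at most k - 1 lists. As we understand it, the result is additionally capped by the data vector's own capacity once a real data vector is present. A sketch under those assumptions, with OFFSET_WIDTH taken as 4; ValueCapacitySketch is hypothetical.

```java
import java.util.OptionalInt;

/** Hypothetical model of deriving getValueCapacity() from the offset buffer size. */
final class ValueCapacitySketch {
    static final int OFFSET_WIDTH = 4; // assumed bytes per offset

    /** Offsets that fit in the buffer (compare getOffsetBufferValueCapacity). */
    static int offsetBufferValueCapacity(long offsetBufferBytes) {
        return (int) (offsetBufferBytes / OFFSET_WIDTH);
    }

    /**
     * One offset slot is needed for the end of the last list, hence the "- 1".
     * dataCapacity is empty while no real data vector has been added.
     */
    static int valueCapacity(long offsetBufferBytes, OptionalInt dataCapacity) {
        int fromOffsets = Math.max(offsetBufferValueCapacity(offsetBufferBytes) - 1, 0);
        return dataCapacity.isPresent() ? Math.min(fromOffsets, dataCapacity.getAsInt()) : fromOffsets;
    }
}
```

A 4,096-byte offset buffer holds 1,024 offsets and therefore at most 1,023 lists.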
-
getOffsetBufferValueCapacity
protected int getOffsetBufferValueCapacity()
-
getBufferSize
public int getBufferSize()
Description copied from interface: ValueVector
Get the number of bytes used by this vector.
- Specified by:
getBufferSize in interface ValueVector
- Returns:
- the number of bytes that are used by this vector instance.
-
getBufferSizeFor
public int getBufferSizeFor(int valueCount)
Description copied from interface: ValueVector
Returns the number of bytes that would be used by this vector if it held the given number of values. The result is the same as calling setValueCount() followed by getBufferSize(), but without the side effects that setValueCount() has with respect to finishing the population of a vector. Some operations use this to determine how much memory a vector has used so far, even though it is not yet fully populated.
- Specified by:
getBufferSizeFor in interface ValueVector
- Parameters:
valueCount
- the number of values to assume this vector contains
- Returns:
- the buffer size if this vector is holding valueCount values
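For a repeated vector the answer plausibly has two parts: the offsets for valueCount lists (valueCount + 1 entries) and whatever the data vector needs for the elements those lists reference, which is the end offset of the last counted list. A sketch assuming OFFSET_WIDTH is 4 and that an empty vector reports 0; BufferSizeForSketch is hypothetical, and the child's sizing rule is passed in as a function because it depends on the child type.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical model of getBufferSizeFor(valueCount) for a list-like vector. */
final class BufferSizeForSketch {
    static final int OFFSET_WIDTH = 4; // assumed bytes per offset

    /**
     * @param offsets           offsets of the populated vector (list i spans offsets[i] to offsets[i + 1])
     * @param valueCount        number of lists to account for
     * @param childSizeForCount bytes the data vector would need for a given element count
     */
    static int bufferSizeFor(int[] offsets, int valueCount, IntUnaryOperator childSizeForCount) {
        if (valueCount == 0) {
            return 0; // assumed: nothing is reported for an empty vector
        }
        int innerCount = offsets[valueCount]; // end offset of the last counted list
        return (valueCount + 1) * OFFSET_WIDTH + childSizeForCount.applyAsInt(innerCount);
    }
}
```

For offsets [0, 2, 2, 5] and a child needing 4 bytes per element, three lists take 4 * 4 + 5 * 4 = 36 bytes, while the first two alone take 3 * 4 + 2 * 4 = 20 bytes.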
-
iterator
public Iterator<ValueVector> iterator()
- Specified by:
iterator in interface Iterable<ValueVector>
- Overrides:
iterator in class BaseValueVector
-
clear
public void clear()
Description copied from interface: ValueVector
Release any owned ArrowBuf and reset the ValueVector to the initial state. If the vector has any child vectors, they will also be cleared.
- Specified by:
clear in interface ValueVector
- Overrides:
clear in class BaseValueVector
-
reset
public void reset()
Description copied from interface: ValueVector
Reset the ValueVector to the initial state without releasing any owned ArrowBuf. Buffer capacities will remain unchanged and any previous data will be zeroed out. This includes buffers for data, validity, offset, etc. If the vector has any child vectors, they will also be reset.
- Specified by:
reset in interface ValueVector
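The difference between reset() and clear() is whether the allocation survives. A minimal model with an int[] standing in for an owned ArrowBuf; ResetClearSketch is hypothetical and omits child vectors.

```java
import java.util.Arrays;

/** Hypothetical model contrasting reset() and clear(). */
final class ResetClearSketch {
    int[] buffer = new int[0]; // stands in for an owned ArrowBuf
    int valueCount;

    void allocate(int capacity) {
        buffer = new int[capacity];
    }

    /** reset(): keep the allocation, zero its contents, forget the values. */
    void reset() {
        Arrays.fill(buffer, 0);
        valueCount = 0;
    }

    /** clear(): release the allocation and return to the initial, empty state. */
    void clear() {
        buffer = new int[0];
        valueCount = 0;
    }
}
```

reset() suits reusing a vector batch after batch at a similar size, while clear() gives the memory back.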
-
getBuffers
public ArrowBuf[] getBuffers(boolean clear)
Return the underlying buffers associated with this vector. Note that this does not change the reference counts of these buffers, so it should only be used for in-context access. Also note that these buffers change regularly, so external classes should not hold a reference to them (unless they change them).
- Specified by:
getBuffers in interface ValueVector
- Parameters:
clear
- whether to clear the vector before returning. The buffers will still be reference-counted, but the returned array will hold the only references to them. This does not clear the child buffers.
- Returns:
- the underlying buffers that are used by this vector instance.
-
size
public int size()
Gets a value indicating whether the inner vector is set.
- Returns:
- 1 if the inner vector has been explicitly set via addOrGetVector, else 0
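size() effectively reports whether a data vector has been attached, which happens lazily through addOrGetVector. A sketch of that behavior with a generic placeholder for the data vector. DataChildSketch and its Result type are hypothetical, loosely mirroring the AddOrGetResult returned by the real addOrGetVector; the callback and type checks are omitted.

```java
import java.util.function.Supplier;

/** Hypothetical model of addOrGetVector() and size(): a lazily created, single data child. */
final class DataChildSketch<V> {
    private V dataVector; // null plays the role of the DEFAULT_DATA_VECTOR placeholder

    /** Hypothetical result pair: the child and whether this call created it. */
    static final class Result<T> {
        final T vector;
        final boolean created;

        Result(T vector, boolean created) {
            this.vector = vector;
            this.created = created;
        }
    }

    /** Creates the child on first use; later calls return the existing one. */
    Result<V> addOrGetVector(Supplier<? extends V> factory) {
        if (dataVector == null) {
            dataVector = factory.get();
            return new Result<>(dataVector, true);
        }
        return new Result<>(dataVector, false);
    }

    /** 1 once a data vector has been attached, else 0. */
    int size() {
        return dataVector == null ? 0 : 1;
    }
}
```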
-
addOrGetVector
public <T extends ValueVector> AddOrGetResult<T> addOrGetVector(FieldType fieldType)
Initializes the data vector (and executes the callback) if that has not already been done, and returns the data vector.
-
replaceDataVector
protected void replaceDataVector(FieldVector v)
-
getValueCount
public int getValueCount()
Description copied from interface: ValueVector
Gets the number of values.
- Specified by:
getValueCount in interface ValueVector
- Returns:
- number of values in the vector
-
getInnerValueCount
public int getInnerValueCount()
-
getInnerValueCountAt
public int getInnerValueCountAt(int index)
Returns the value count for the inner data vector at a particular index.
-
isEmpty
public abstract boolean isEmpty(int index)
Returns whether the value at index is empty.
-
startNewValue
public int startNewValue(int index)
Starts a new repeated value.
-
setValueCount
public void setValueCount(int valueCount)
Preallocates the number of repeated values.
- Specified by:
setValueCount in interface ValueVector
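Writing list values follows from the offset layout. As we understand it, startNewValue(index) returns the position in the data vector where the new list's elements begin and provisionally marks the list as empty, and the value count determines how many inner elements the data vector must expose (the end offset of the last list). The sketch below models this with an int[] of offsets; StartNewValueSketch and its endValue helper are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model of startNewValue(index) and setValueCount(n) over an offset array. */
final class StartNewValueSketch {
    private int[] offsets = new int[2]; // offsets[0] == 0
    private int valueCount;

    /** Grows the offset array until index `slot` exists. */
    private void ensureOffsetSlot(int slot) {
        while (slot >= offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
    }

    /** Starts list `index` as empty and returns the data position its elements start at. */
    int startNewValue(int index) {
        ensureOffsetSlot(index + 1);
        int start = offsets[index];
        offsets[index + 1] = start; // empty until elements are recorded
        valueCount = index + 1;
        return start;
    }

    /** Hypothetical helper: records that `size` elements were written for list `index`. */
    void endValue(int index, int size) {
        offsets[index + 1] = offsets[index] + size;
    }

    /** Sets the number of lists, growing the offsets if needed. */
    void setValueCount(int n) {
        ensureOffsetSlot(n);
        valueCount = n;
    }

    int valueCount() {
        return valueCount;
    }

    int innerValueCountAt(int index) {
        return offsets[index + 1] - offsets[index];
    }

    /** Elements the data vector must hold: the end offset of the last list. */
    int innerValueCount() {
        return valueCount == 0 ? 0 : offsets[valueCount];
    }
}
```

Writing [a, b], [] and [c, d, e] returns start positions 0, 2 and 2 and leaves 5 inner elements; lowering the value count to 2 shrinks that to 2.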
-