```rust
pub struct GenericByteViewBuilder<T: ByteViewType + ?Sized> {
    views_builder: BufferBuilder<u128>,
    null_buffer_builder: NullBufferBuilder,
    completed: Vec<Buffer>,
    in_progress: Vec<u8>,
    block_size: BlockSizeGrowthStrategy,
    string_tracker: Option<(HashTable<usize>, RandomState)>,
    phantom: PhantomData<T>,
}
```
A builder for GenericByteViewArray.

A GenericByteViewArray consists of a list of data blocks containing string (or binary) data, and a list of views into those blocks.

See the examples on StringViewBuilder and BinaryViewBuilder.
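Each view is a 16-byte (`u128`) value. In the Arrow columnar format's variable-size binary view layout, the first 4 bytes hold the value length; a value of 12 bytes or fewer is stored inline in the remaining 12 bytes, while a longer value stores a 4-byte prefix, the index of the data block, and the byte offset within that block. The following is a std-only sketch of that packing, assuming the length occupies the low 32 bits of a little-endian `u128`; the helper names `inline_view`, `buffer_view` and `view_len` are hypothetical, not arrow-rs API.

```rust
/// Hypothetical helper: builds an inline view for a value of at most 12 bytes.
fn inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= 12, "inline views hold at most 12 bytes");
    let mut bytes = [0u8; 16];
    // Bytes 0..4: length; bytes 4..16: the value itself, zero padded.
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: builds a view for a value longer than 12 bytes that is
/// stored at `offset` inside data block `buffer_index`.
fn buffer_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(value.len() > 12, "values of 12 bytes or fewer are inlined");
    let length = value.len() as u32;
    // The first 4 bytes of the value are kept in the view as a prefix.
    let prefix = u32::from_le_bytes(value[0..4].try_into().unwrap());
    u128::from(length)
        | (u128::from(prefix) << 32)
        | (u128::from(buffer_index) << 64)
        | (u128::from(offset) << 96)
}

/// Reads the length stored in the low 32 bits of any view.
fn view_len(view: u128) -> u32 {
    view as u32
}
```

Because the prefix sits inside the view, comparisons on the first 4 bytes can be answered without touching the data blocks.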
This builder can be used in two ways:

Append Values
To avoid bump allocating, this builder allocates data in blocks. By default the block size grows exponentially; a fixed block size can be configured using GenericByteViewBuilder::with_fixed_block_size. GenericByteViewBuilder::append_value writes values longer than 12 bytes to the current in-progress block, while values of 12 bytes or fewer are inlined into the views. If an appended value will not fit in the in-progress block, that block is closed and a new block of sufficient size is allocated.
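The buffering policy described above can be modeled with the standard library alone. This is a simplified sketch, not arrow-rs code: `MiniViewBuilder` and its fields are hypothetical, blocks are plain `Vec<u8>`, nulls are not modeled, and every new block uses a fixed minimum `block_size` instead of the default exponential growth.

```rust
/// Hypothetical, simplified model of the append_value buffering policy.
struct MiniViewBuilder {
    views: Vec<u128>,
    completed: Vec<Vec<u8>>,
    in_progress: Vec<u8>,
    // Capacity reserved for the current in-progress block.
    in_progress_cap: usize,
    // Minimum size of each newly allocated block.
    block_size: usize,
}

impl MiniViewBuilder {
    fn new(block_size: usize) -> Self {
        Self {
            views: Vec::new(),
            completed: Vec::new(),
            in_progress: Vec::new(),
            in_progress_cap: 0,
            block_size,
        }
    }

    fn append_value(&mut self, v: &[u8]) {
        let length = u32::try_from(v.len()).expect("value length exceeds u32::MAX");
        if v.len() <= 12 {
            // Short values are stored entirely inside the 16-byte view.
            let mut bytes = [0u8; 16];
            bytes[0..4].copy_from_slice(&length.to_le_bytes());
            bytes[4..4 + v.len()].copy_from_slice(v);
            self.views.push(u128::from_le_bytes(bytes));
            return;
        }
        // Close the in-progress block if the value does not fit, then start a new
        // block that is at least `block_size` and at least the value's length.
        if self.in_progress.len() + v.len() > self.in_progress_cap {
            self.flush_in_progress();
            self.in_progress_cap = v.len().max(self.block_size);
            self.in_progress.reserve(self.in_progress_cap);
        }
        let offset = self.in_progress.len() as u32;
        self.in_progress.extend_from_slice(v);
        let prefix = u32::from_le_bytes(v[0..4].try_into().unwrap());
        // The in-progress block becomes `completed[completed.len()]` when flushed.
        let buffer_index = self.completed.len() as u32;
        self.views.push(
            u128::from(length)
                | (u128::from(prefix) << 32)
                | (u128::from(buffer_index) << 64)
                | (u128::from(offset) << 96),
        );
    }

    /// Moves a non-empty in-progress block to the completed list.
    fn flush_in_progress(&mut self) {
        if !self.in_progress.is_empty() {
            self.completed.push(std::mem::take(&mut self.in_progress));
        }
    }

    /// Returns the views and data blocks, leaving the builder empty.
    fn finish(&mut self) -> (Vec<u128>, Vec<Vec<u8>>) {
        self.flush_in_progress();
        self.in_progress_cap = 0;
        (std::mem::take(&mut self.views), std::mem::take(&mut self.completed))
    }
}
```

With `block_size = 32`, a 5-byte value is inlined, the first 20-byte value opens block 0, and a second 20-byte value does not fit in the remaining 12 bytes, so block 0 is closed and block 1 is started.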
Append Views

Some use-cases may wish to reuse an existing allocation containing string data, for example when parsing data from a parquet data page. In such a case, entire blocks can be appended using GenericByteViewBuilder::append_block, and views into those blocks can then be appended using GenericByteViewBuilder::try_append_view.
Fields
- views_builder: BufferBuilder<u128>
- null_buffer_builder: NullBufferBuilder
- completed: Vec<Buffer>
- in_progress: Vec<u8>
- block_size: BlockSizeGrowthStrategy
- string_tracker: Option<(HashTable<usize>, RandomState)>: Some if deduplicating strings; maps <string hash> -> <index into the views>
- phantom: PhantomData<T>
Implementations
impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T>
pub fn new() -> Self
Creates a new GenericByteViewBuilder.
pub fn with_capacity(capacity: usize) -> Self
Creates a new GenericByteViewBuilder with space for capacity values (views). Data blocks for values longer than 12 bytes are allocated separately as values are appended.
pub fn with_fixed_block_size(self, block_size: u32) -> Self
Set a fixed buffer size for variable-length strings.
The block size is the size of the buffer used to store values longer than 12 bytes. The builder allocates new buffers when the current buffer is full.
By default the builder balances buffer size and buffer count by growing buffer size exponentially from 8KB up to 2MB. The first buffer allocated is 8KB, then 16KB, then 32KB, etc., up to 2MB.
If this method is used, any new buffers allocated are exactly this size (unless a single value is larger, in which case a block of sufficient size is allocated). This can be useful for advanced users who want to control memory usage and buffer count.
See https://github.com/apache/arrow-rs/issues/6094 for more details on the implications.
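The two sizing policies can be sketched as a small state machine. This is a hypothetical std-only model of the documented behavior (8KB, 16KB, 32KB, ... capped at 2MB, or one fixed size); `BlockSizeModel`, `FIRST_BLOCK_BYTES` and `MAX_BLOCK_BYTES` are illustrative names, not the arrow-rs `BlockSizeGrowthStrategy` internals.

```rust
// Hypothetical constants taken from the documented defaults.
const FIRST_BLOCK_BYTES: u32 = 8 * 1024; // first block: 8KB
const MAX_BLOCK_BYTES: u32 = 2 * 1024 * 1024; // growth cap: 2MB

enum BlockSizeModel {
    /// Default policy: each new block doubles in size until the cap.
    Exponential { next: u32 },
    /// What with_fixed_block_size selects: every new block has the same size.
    Fixed { size: u32 },
}

impl BlockSizeModel {
    fn exponential() -> Self {
        BlockSizeModel::Exponential { next: FIRST_BLOCK_BYTES }
    }

    /// Returns the size to use for the next block and advances the policy.
    fn next_size(&mut self) -> u32 {
        match self {
            BlockSizeModel::Fixed { size } => *size,
            BlockSizeModel::Exponential { next } => {
                let current = *next;
                *next = current.saturating_mul(2).min(MAX_BLOCK_BYTES);
                current
            }
        }
    }
}
```

Under this model the ninth block is the first to reach the 2MB cap; every block after that stays at 2MB.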
pub fn with_block_size(self, block_size: u32) -> Self
Deprecated: Use with_fixed_block_size instead.
Override the size of buffers to allocate for holding string data.
pub fn with_deduplicate_strings(self) -> Self
Deduplicate strings while building the array.
This can decrease memory usage if the array has repeated strings. It also increases the time to build the array, because each string must be hashed. Only values longer than 12 bytes are affected; shorter values are always inlined into their views.
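The trade-off can be illustrated with a std-only sketch. It is a simplified, hypothetical model (`DedupBlock` and `intern` are illustrative names): per the field documentation, the real string_tracker maps a string hash to an index into the views, whereas this model uses a `HashMap` keyed by an owned copy of the bytes for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical model: one data block plus a map from value bytes to the
/// location of the first stored copy.
struct DedupBlock {
    data: Vec<u8>,
    seen: HashMap<Vec<u8>, (u32, u32)>, // value -> (offset, len) in `data`
}

impl DedupBlock {
    fn new() -> Self {
        Self { data: Vec::new(), seen: HashMap::new() }
    }

    /// Returns the (offset, len) at which `value` is stored, copying it into
    /// the block only the first time it is seen.
    fn intern(&mut self, value: &[u8]) -> (u32, u32) {
        // Hashing `value` on every append is the extra build-time cost.
        if let Some(&loc) = self.seen.get(value) {
            return loc; // repeated value: reuse the existing bytes
        }
        let loc = (self.data.len() as u32, value.len() as u32);
        self.data.extend_from_slice(value);
        self.seen.insert(value.to_vec(), loc);
        loc
    }
}
```

Repeated values resolve to the same location, so their views point at a single copy of the bytes.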
pub fn append_block(&mut self, buffer: Buffer) -> u32
Append a new data block, returning its index (the block offset).
Note: this will first flush any in-progress block.
Views into a block added this way can then be appended using Self::try_append_view or Self::append_view_unchecked. See Self::append_value for appending individual values.
```rust
use arrow_array::builder::StringViewBuilder;

let mut builder = StringViewBuilder::new();
// Append one block holding the bytes of all four words; `block` is its index.
let block = builder.append_block(b"helloworldbingobongo".into());
// Each view is (block index, byte offset, length) into that block.
builder.try_append_view(block, 0, 5).unwrap();
builder.try_append_view(block, 5, 5).unwrap();
builder.try_append_view(block, 10, 5).unwrap();
builder.try_append_view(block, 15, 5).unwrap();
// Views may overlap: this one covers "helloworldbingo".
builder.try_append_view(block, 0, 15).unwrap();
let array = builder.finish();
let actual: Vec<_> = array.iter().flatten().collect();
let expected = &["hello", "world", "bingo", "bongo", "helloworldbingo"];
assert_eq!(actual, expected);
```
pub unsafe fn append_view_unchecked(&mut self, block: u32, offset: u32, len: u32)
Append a view of the given block, offset and length.
Safety
(1) The block must have been added using Self::append_block.
(2) The range offset..offset+len must be within the bounds of the block.
(3) The data in that range must be valid for type T (for example, valid UTF-8 for strings).
pub fn try_append_view(&mut self, block: u32, offset: u32, len: u32) -> Result<(), ArrowError>
Try to append a view of the given block, offset and length.
Returns an ArrowError if no block with that index exists, if the range offset..offset+len is out of bounds for the block, or if the referenced data is not valid for type T.
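The Safety conditions of append_view_unchecked are the checks a safe variant has to perform before appending. The function below is a hypothetical std-only sketch of those checks for string data: `check_string_view` is an illustrative name, blocks are plain `Vec<u8>`, errors are `String` rather than `ArrowError`, and validity for `T` is modeled as UTF-8 validation.

```rust
/// Hypothetical sketch: validate a (block, offset, len) view over string data.
fn check_string_view(blocks: &[Vec<u8>], block: u32, offset: u32, len: u32) -> Result<(), String> {
    // (1) The block must exist.
    let b = blocks
        .get(block as usize)
        .ok_or_else(|| format!("no block with index {block}"))?;
    // (2) The range must lie within the block; saturating_add avoids overflow.
    let start = offset as usize;
    let end = start.saturating_add(len as usize);
    let bytes = b
        .get(start..end)
        .ok_or_else(|| format!("range {start}..{end} out of bounds for block of length {}", b.len()))?;
    // (3) The referenced bytes must be valid for the type (UTF-8 here).
    std::str::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {e}"))?;
    Ok(())
}
```

Note that validation applies to the referenced range, not the whole block: a range that starts or ends inside a multi-byte character fails even if the block as a whole is valid UTF-8.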
fn flush_in_progress(&mut self)
Flushes the in-progress block, if any.
fn push_completed(&mut self, block: Buffer)
Append a block to self.completed, checking for overflow.
pub fn get_value(&self, index: usize) -> &[u8]
Returns the value at the given index.
Useful for inspecting a value that has already been inserted into the builder.
The index must be smaller than self.len(); otherwise this method panics.
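Reading a value back means resolving its view: inline values are read from the view itself, longer values from the referenced block. The function below is a hypothetical std-only sketch (`view_value` is an illustrative name), assuming the length sits in the low 32 bits, the block index in bits 64..96 and the offset in bits 96..128 of a little-endian `u128`. Unlike get_value it returns an owned `Vec<u8>`, which avoids reinterpreting the `u128` as bytes in place.

```rust
/// Hypothetical sketch: resolve a view to the bytes it describes.
fn view_value(view: u128, blocks: &[Vec<u8>]) -> Vec<u8> {
    let len = view as u32 as usize;
    if len <= 12 {
        // Inline value: the bytes follow the 4-byte length inside the view.
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        // Out-of-line value: look up the block and slice at the stored offset.
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        blocks[buffer_index][offset..offset + len].to_vec()
    }
}
```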
pub fn append_value(&mut self, value: impl AsRef<T::Native>)
Appends a value into the builder.
Panics
Panics if:
- the string buffer count exceeds u32::MAX
- the string length exceeds u32::MAX
pub fn append_option(&mut self, value: Option<impl AsRef<T::Native>>)
Appends an Option value into the builder: Some(value) is appended as a value, and None is appended as a null.
pub fn append_null(&mut self)
Append a null value into the builder
pub fn finish(&mut self) -> GenericByteViewArray<T>
Builds the GenericByteViewArray and resets this builder.
pub fn finish_cloned(&self) -> GenericByteViewArray<T>
Builds the GenericByteViewArray without resetting the builder.
pub fn validity_slice(&self) -> Option<&[u8]>
Returns the current null buffer as a slice
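The null buffer follows the Arrow validity bitmap convention: bit i, counted least-significant-bit first within each byte, is 1 if slot i holds a value and 0 if it is null. The struct below is a hypothetical std-only model (`ValidityModel` is an illustrative name, not the NullBufferBuilder API) showing how appending Some and None values sets those bits.

```rust
/// Hypothetical model of an Arrow validity bitmap (LSB-first bit order).
struct ValidityModel {
    bits: Vec<u8>,
    len: usize,
}

impl ValidityModel {
    fn new() -> Self {
        Self { bits: Vec::new(), len: 0 }
    }

    /// Mirrors the effect of append_option on validity:
    /// Some(_) marks the slot valid, None marks it null.
    fn append_option<V>(&mut self, value: Option<V>) {
        if self.len % 8 == 0 {
            self.bits.push(0); // start a new byte every 8 slots
        }
        if value.is_some() {
            self.bits[self.len / 8] |= 1u8 << (self.len % 8);
        }
        self.len += 1;
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.bits[i / 8] & (1u8 << (i % 8))) != 0
    }
}
```

Appending Some, None, Some produces the single byte 0b0000_0101: bits 0 and 2 are set and bit 1 marks the null.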
pub fn allocated_size(&self) -> usize
Return the allocated size of this builder in bytes, useful for memory accounting.
Trait Implementations
impl<T: ByteViewType + ?Sized> ArrayBuilder for GenericByteViewBuilder<T>
fn finish_cloned(&self) -> ArrayRef
fn as_any_mut(&mut self) -> &mut dyn Any
Returns the builder as a mutable Any reference.
impl<T: ByteViewType + ?Sized> Debug for GenericByteViewBuilder<T>
impl<T: ByteViewType + ?Sized> Default for GenericByteViewBuilder<T>
impl<T: ByteViewType + ?Sized, V: AsRef<T::Native>> Extend<Option<V>> for GenericByteViewBuilder<T>
fn extend<I: IntoIterator<Item = Option<V>>>(&mut self, iter: I)
fn extend_one(&mut self, item: A)
Nightly-only experimental API (extend_one).
fn extend_reserve(&mut self, additional: usize)
Nightly-only experimental API (extend_one).