Memory (management)

Buffers

class Buffer

Object containing a pointer to a piece of contiguous memory with a particular size.

Buffers have two related notions of length: size and capacity. Size is the number of bytes that might have valid data. Capacity is the number of bytes that were allocated for the buffer in total.

The Buffer base class does not own its memory, but subclasses often do.

The following invariant is always true: Size <= Capacity
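The size/capacity relationship can be illustrated with a standalone sketch. It does not use Arrow; `BufferView` and its members are hypothetical names chosen for this example, and the check mirrors the invariant stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical non-owning view: it records a pointer and two lengths,
// and never frees the memory it points to.
struct BufferView {
  const uint8_t* data;
  int64_t size;      // bytes that may hold valid data
  int64_t capacity;  // bytes allocated in total

  BufferView(const uint8_t* d, int64_t s, int64_t c)
      : data(d), size(s), capacity(c) {
    assert(size >= 0 && size <= capacity);  // invariant: size <= capacity
  }

  // Bytes between size and capacity are padding.
  int64_t padding() const { return capacity - size; }
};
```

For example, a view of 10 valid bytes over a 64-byte allocation reports 54 bytes of padding.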

Subclassed by arrow::cuda::CudaBuffer, arrow::MutableBuffer, arrow::py::NumPyBuffer, arrow::py::PyBuffer, arrow::py::PyForeignBuffer

Public Functions

Buffer(const uint8_t *data, int64_t size)

Construct from buffer and size without copying memory.

Note

The passed memory must be kept alive through some other means.

Parameters
  • [in] data: a memory buffer

  • [in] size: buffer size

Buffer(util::string_view data)

Construct from string_view without copying memory.

Note

The memory viewed by data must not be deallocated in the lifetime of the Buffer; temporary rvalue strings must be stored in an lvalue somewhere

Parameters
  • [in] data: a string_view object

Buffer(const std::shared_ptr<Buffer> &parent, const int64_t offset, const int64_t size)

Construct a view of a slice of data that is owned by another buffer. The new buffer keeps a reference to the parent, so its pointer remains valid even after all other shared_ptr instances referring to the parent buffer have been destroyed.

This constructor makes no assertions about alignment or padding of the buffer, but in general we expect buffers to be aligned and padded to 64 bytes. In the future we might add utility methods to help determine if a buffer satisfies this contract.
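The lifetime guarantee described for the parent/offset constructor can be sketched without Arrow. `OwnedBytes` and `Slice` below are hypothetical types; the only point is that the slice holds a `std::shared_ptr` to its parent, so dropping every other handle does not invalidate the slice.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical owner of some bytes.
struct OwnedBytes {
  std::vector<uint8_t> bytes;
};

// Hypothetical slice: holds a shared_ptr to the parent so the viewed
// memory stays valid for as long as the slice exists.
struct Slice {
  std::shared_ptr<OwnedBytes> parent;
  const uint8_t* data;
  int64_t size;

  Slice(std::shared_ptr<OwnedBytes> p, int64_t offset, int64_t length)
      : parent(std::move(p)), data(nullptr), size(length) {
    assert(offset >= 0 && length >= 0);
    assert(offset + length <= static_cast<int64_t>(parent->bytes.size()));
    data = parent->bytes.data() + offset;  // no bytes are copied
  }
};
```

After the caller resets its own handle to the parent, the slice is the sole remaining owner and its `data` pointer can still be read.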

std::string ToHexString()

Construct a new std::string with a hexadecimal representation of the buffer.

Return

std::string
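As a rough illustration of a hexadecimal rendering of raw bytes, here is a standalone sketch. `BytesToHex` is a hypothetical helper, not the Arrow implementation, and the choice of uppercase digits with no separators is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: render each byte as two hexadecimal digits.
// Uppercase digits are an arbitrary choice for this sketch.
std::string BytesToHex(const uint8_t* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  static const char kDigits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(size * 2);
  for (std::size_t i = 0; i < size; ++i) {
    out.push_back(kDigits[data[i] >> 4]);    // high nibble
    out.push_back(kDigits[data[i] & 0x0F]);  // low nibble
  }
  return out;
}
```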

bool Equals(const Buffer &other, int64_t nbytes) const

Return true if both buffers are the same size and contain the same bytes up to the number of compared bytes.

bool Equals(const Buffer &other) const

Return true if both buffers are the same size and contain the same bytes.

Status Copy(const int64_t start, const int64_t nbytes, MemoryPool *pool, std::shared_ptr<Buffer> *out) const

Copy a section of the buffer into a new Buffer.

Status Copy(const int64_t start, const int64_t nbytes, std::shared_ptr<Buffer> *out) const

Copy a section of the buffer using the default memory pool into a new Buffer.

void ZeroPadding()

Zero bytes in padding, i.e. bytes between size_ and capacity_.
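What zeroing the padding means can be shown on a raw array. The helper below is a hypothetical standalone sketch, with `size` and `capacity` passed explicitly instead of being read from a buffer object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero the bytes in [size, capacity), leaving the
// first `size` bytes untouched.
void ZeroPaddingBytes(uint8_t* data, int64_t size, int64_t capacity) {
  assert(size >= 0 && size <= capacity);
  if (capacity > size) {
    std::memset(data + size, 0, static_cast<std::size_t>(capacity - size));
  }
}
```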

std::string ToString() const

Copy buffer contents into a new std::string.

Return

std::string

Note

Can throw std::bad_alloc if buffer is large

operator util::string_view() const

View buffer contents as a util::string_view.

Return

util::string_view

const uint8_t *data() const

Return a pointer to the buffer’s data.

uint8_t *mutable_data()

Return a writable pointer to the buffer’s data.

The buffer has to be mutable. Otherwise, an assertion may fail or a null pointer may be returned.

int64_t size() const

Return the buffer’s size in bytes.

int64_t capacity() const

Return the buffer’s capacity (number of allocated bytes).

Public Static Functions

static Status FromString(const std::string &data, MemoryPool *pool, std::shared_ptr<Buffer> *out)

Construct a new buffer that owns its memory from a std::string.

Return

Status message

Parameters
  • [in] data: a std::string object

  • [in] pool: a memory pool

  • [out] out: the created buffer

static Status FromString(const std::string &data, std::shared_ptr<Buffer> *out)

Construct a new buffer that owns its memory from a std::string using the default memory pool.

static std::shared_ptr<Buffer> FromString(std::string &&data)

Construct an immutable buffer that takes ownership of the contents of a std::string.

Return

a new Buffer instance

Parameters
  • [in] data: an rvalue reference to a string

template<typename T, typename SizeType = int64_t>
static std::shared_ptr<Buffer> Wrap(const T *data, SizeType length)

Create buffer referencing typed memory with some length without copying.

Return

a new shared_ptr<Buffer>

Parameters
  • [in] data: the typed memory as C array

  • [in] length: the number of values in the array

template<typename T>
static std::shared_ptr<Buffer> Wrap(const std::vector<T> &data)

Create buffer referencing std::vector with some length without copying.

Return

a new shared_ptr<Buffer>

Parameters
  • [in] data: the vector to be referenced. If this vector is changed, the buffer may become invalid

class MutableBuffer : public arrow::Buffer

A Buffer whose contents can be mutated.

May or may not own its data.

Subclassed by arrow::cuda::CudaHostBuffer, arrow::ResizableBuffer

Public Static Functions

template<typename T, typename SizeType = int64_t>
static std::shared_ptr<Buffer> Wrap(T *data, SizeType length)

Create buffer referencing typed memory with some length.

Return

a new shared_ptr<Buffer>

Parameters
  • [in] data: the typed memory as C array

  • [in] length: the number of values in the array

class ResizableBuffer : public arrow::MutableBuffer

A mutable buffer that can be resized.

Public Functions

virtual Status Resize(const int64_t new_size, bool shrink_to_fit = true) = 0

Change buffer reported size to indicated size, allocating memory if necessary.

This will ensure that the capacity of the buffer is a multiple of 64 bytes as defined in Layout.md. Consider using ZeroPadding afterwards, to conform to the Arrow layout specification.

Parameters
  • new_size: The new size for the buffer.

  • shrink_to_fit: Whether to shrink the capacity if new size < current size
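The capacity rule described for Resize can be sketched as plain arithmetic. Both helpers below are hypothetical and only model the behavior as documented here (capacity kept at a multiple of 64 bytes, shrinking only when `shrink_to_fit` is set); the actual implementation may differ in its details.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a byte count up to the next multiple of 64.
int64_t RoundUpTo64(int64_t nbytes) {
  assert(nbytes >= 0);
  return (nbytes + 63) & ~int64_t{63};
}

// Hypothetical model of the capacity after a resize request.
int64_t CapacityAfterResize(int64_t size, int64_t capacity, int64_t new_size,
                            bool shrink_to_fit) {
  if (new_size > capacity) {
    return RoundUpTo64(new_size);  // must grow
  }
  if (shrink_to_fit && new_size < size) {
    return RoundUpTo64(new_size);  // allowed to shrink
  }
  return capacity;                 // keep the existing allocation
}
```

Under this model, shrinking a 100-byte buffer with 128 bytes of capacity down to 10 bytes yields a 64-byte capacity when `shrink_to_fit` is true, and leaves the capacity at 128 bytes when it is false.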

virtual Status Reserve(const int64_t new_capacity) = 0

Ensure that buffer has enough memory allocated to fit the indicated capacity (and meets the 64 byte padding requirement in Layout.md).

It does not change buffer’s reported size and doesn’t zero the padding.

Memory Pools

MemoryPool *arrow::default_memory_pool()

Return the process-wide default memory pool.

class MemoryPool

Base class for memory allocation.

Besides tracking the number of allocated bytes, the allocator also should take care of the required 64-byte alignment.

Subclassed by arrow::LoggingMemoryPool, arrow::ProxyMemoryPool, arrow::STLMemoryPool< Allocator >

Public Functions

virtual Status Allocate(int64_t size, uint8_t **out) = 0

Allocate a new memory region of at least size bytes.

The allocated region shall be 64-byte aligned.
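One way to obtain a 64-byte-aligned region on POSIX systems is `posix_memalign`. The sketch below is an assumption about how an allocator could meet the alignment requirement, not a description of what any particular pool does; `AllocateAligned64` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX only)

// Hypothetical helper: allocate at least `size` bytes aligned to 64 bytes.
// Returns nullptr if the allocation fails.
uint8_t* AllocateAligned64(int64_t size) {
  assert(size >= 0);
  void* out = nullptr;
  // posix_memalign returns 0 on success; the alignment must be a power of
  // two and a multiple of sizeof(void*), which 64 satisfies.
  if (posix_memalign(&out, 64, static_cast<size_t>(size)) != 0) {
    return nullptr;
  }
  return static_cast<uint8_t*>(out);
}
```

Memory obtained this way is released with `free`.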

virtual Status Reallocate(int64_t old_size, int64_t new_size, uint8_t **ptr) = 0

Resize an already allocated memory section.

Because most platform default allocators do not support aligned reallocation, this function may involve a copy of the underlying data.
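The copy mentioned above can be sketched as allocate, copy, free. This is a hypothetical POSIX-only illustration (`ReallocateAligned64` is not an Arrow function) of why an aligned resize may need to move the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdlib.h>  // posix_memalign, free (POSIX only)

// Hypothetical helper: resize a 64-byte-aligned region by allocating a new
// aligned region, copying the surviving prefix, and freeing the old one.
// Returns false (leaving *ptr untouched) if the new allocation fails.
bool ReallocateAligned64(int64_t old_size, int64_t new_size, uint8_t** ptr) {
  assert(old_size >= 0 && new_size >= 0);
  void* fresh = nullptr;
  if (posix_memalign(&fresh, 64, static_cast<size_t>(new_size)) != 0) {
    return false;
  }
  const int64_t to_copy = std::min(old_size, new_size);
  if (*ptr != nullptr && to_copy > 0) {
    std::memcpy(fresh, *ptr, static_cast<size_t>(to_copy));
  }
  free(*ptr);  // free(nullptr) is a no-op
  *ptr = static_cast<uint8_t*>(fresh);
  return true;
}
```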

virtual void Free(uint8_t *buffer, int64_t size) = 0

Free an allocated region.

Parameters
  • buffer: Pointer to the start of the allocated memory region

  • size: Allocated size located at buffer. An allocator implementation may use this for tracking the amount of allocated bytes as well as for faster deallocation if supported by its backend.

virtual int64_t bytes_allocated() const = 0

The number of bytes that were allocated and not yet freed through this allocator.

virtual int64_t max_memory() const

Return peak memory allocation in this memory pool.

Return

Maximum bytes allocated. If not known (or not implemented), returns -1

Public Static Functions

static std::unique_ptr<MemoryPool> CreateDefault()

EXPERIMENTAL. Create a new instance of the default MemoryPool.

class LoggingMemoryPool : public arrow::MemoryPool

Public Functions

Status Allocate(int64_t size, uint8_t **out)

Allocate a new memory region of at least size bytes.

The allocated region shall be 64-byte aligned.

Status Reallocate(int64_t old_size, int64_t new_size, uint8_t **ptr)

Resize an already allocated memory section.

Because most platform default allocators do not support aligned reallocation, this function may involve a copy of the underlying data.

void Free(uint8_t *buffer, int64_t size)

Free an allocated region.

Parameters
  • buffer: Pointer to the start of the allocated memory region

  • size: Allocated size located at buffer. An allocator implementation may use this for tracking the amount of allocated bytes as well as for faster deallocation if supported by its backend.

int64_t bytes_allocated() const

The number of bytes that were allocated and not yet freed through this allocator.

int64_t max_memory() const

Return peak memory allocation in this memory pool.

Return

Maximum bytes allocated. If not known (or not implemented), returns -1

class ProxyMemoryPool : public arrow::MemoryPool

Derived class for memory allocation.

Tracks the number of bytes and the maximum memory allocated through its direct calls. Actual allocation is delegated to the wrapped MemoryPool.
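The proxy pattern described here can be sketched in a few lines. Everything below is hypothetical and independent of Arrow: `SimplePool` stands in for a pool interface, `MallocPool` for a backing pool, and `TrackingProxy` forwards calls while keeping its own counters. The sketch is not thread-safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical minimal pool interface.
struct SimplePool {
  virtual ~SimplePool() = default;
  virtual uint8_t* Allocate(int64_t size) = 0;
  virtual void Free(uint8_t* buffer, int64_t size) = 0;
};

// Backing pool built on malloc/free (no alignment guarantee in this sketch).
struct MallocPool : SimplePool {
  uint8_t* Allocate(int64_t size) override {
    return static_cast<uint8_t*>(std::malloc(static_cast<std::size_t>(size)));
  }
  void Free(uint8_t* buffer, int64_t /*size*/) override { std::free(buffer); }
};

// Proxy: forwards every call to the wrapped pool and counts only the
// bytes that pass through this object.
class TrackingProxy : public SimplePool {
 public:
  explicit TrackingProxy(SimplePool* wrapped) : wrapped_(wrapped) {
    assert(wrapped != nullptr);
  }

  uint8_t* Allocate(int64_t size) override {
    uint8_t* out = wrapped_->Allocate(size);
    if (out != nullptr) {
      bytes_allocated_ += size;
      max_memory_ = std::max(max_memory_, bytes_allocated_);
    }
    return out;
  }

  void Free(uint8_t* buffer, int64_t size) override {
    wrapped_->Free(buffer, size);
    bytes_allocated_ -= size;
  }

  int64_t bytes_allocated() const { return bytes_allocated_; }
  int64_t max_memory() const { return max_memory_; }

 private:
  SimplePool* wrapped_;
  int64_t bytes_allocated_ = 0;
  int64_t max_memory_ = 0;
};
```

Because the counters live in the proxy, several proxies over one backing pool can each report the usage of a single component.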

Public Functions

Status Allocate(int64_t size, uint8_t **out)

Allocate a new memory region of at least size bytes.

The allocated region shall be 64-byte aligned.

Status Reallocate(int64_t old_size, int64_t new_size, uint8_t **ptr)

Resize an already allocated memory section.

Because most platform default allocators do not support aligned reallocation, this function may involve a copy of the underlying data.

void Free(uint8_t *buffer, int64_t size)

Free an allocated region.

Parameters
  • buffer: Pointer to the start of the allocated memory region

  • size: Allocated size located at buffer. An allocator implementation may use this for tracking the amount of allocated bytes as well as for faster deallocation if supported by its backend.

int64_t bytes_allocated() const

The number of bytes that were allocated and not yet freed through this allocator.

int64_t max_memory() const

Return peak memory allocation in this memory pool.

Return

Maximum bytes allocated. If not known (or not implemented), returns -1

Allocation Functions

These functions allocate a buffer from a particular memory pool.

Status arrow::AllocateBuffer(MemoryPool *pool, const int64_t size, std::shared_ptr<Buffer> *out)

Allocate a fixed-size mutable buffer from a memory pool and zero its padding.

Return

Status message

Parameters
  • [in] pool: a memory pool

  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer (contains padding)

Status arrow::AllocateBuffer(MemoryPool *pool, const int64_t size, std::unique_ptr<Buffer> *out)

Allocate a fixed-size mutable buffer from a memory pool and zero its padding.

Return

Status message

Parameters
  • [in] pool: a memory pool

  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer (contains padding)

Status arrow::AllocateBuffer(const int64_t size, std::shared_ptr<Buffer> *out)

Allocate a fixed-size mutable buffer from the default memory pool.

Return

Status message

Parameters
  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer (contains padding)

Status arrow::AllocateBuffer(const int64_t size, std::unique_ptr<Buffer> *out)

Allocate a fixed-size mutable buffer from the default memory pool.

Return

Status message

Parameters
  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer (contains padding)

Status arrow::AllocateResizableBuffer(MemoryPool *pool, const int64_t size, std::shared_ptr<ResizableBuffer> *out)

Allocate a resizable buffer from a memory pool and zero its padding.

Return

Status message

Parameters
  • [in] pool: a memory pool

  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer

Status arrow::AllocateResizableBuffer(MemoryPool *pool, const int64_t size, std::unique_ptr<ResizableBuffer> *out)

Allocate a resizable buffer from a memory pool and zero its padding.

Return

Status message

Parameters
  • [in] pool: a memory pool

  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer

Status arrow::AllocateResizableBuffer(const int64_t size, std::shared_ptr<ResizableBuffer> *out)

Allocate a resizable buffer from the default memory pool.

Return

Status message

Parameters
  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer

Status arrow::AllocateResizableBuffer(const int64_t size, std::unique_ptr<ResizableBuffer> *out)

Allocate a resizable buffer from the default memory pool.

Return

Status message

Parameters
  • [in] size: size of buffer to allocate

  • [out] out: the allocated buffer

Status arrow::AllocateBitmap(MemoryPool *pool, int64_t length, std::shared_ptr<Buffer> *out)

Allocate a bitmap buffer from a memory pool. No guarantee is made about the initial values.

Return

Status message

Parameters
  • [in] pool: memory pool to allocate memory from

  • [in] length: size in bits of bitmap to allocate

  • [out] out: the resulting buffer
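Since `length` is given in bits, the number of bytes a bitmap needs is the bit count rounded up to a whole byte. The helper below is a hypothetical standalone sketch of that arithmetic and ignores any additional padding an allocator may add.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes needed to store `bits` bits,
// rounding up to a whole byte.
int64_t BytesForBits(int64_t bits) {
  assert(bits >= 0);
  return (bits + 7) / 8;
}
```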

Status arrow::AllocateEmptyBitmap(MemoryPool *pool, int64_t length, std::shared_ptr<Buffer> *out)

Allocate a zero-initialized bitmap buffer from a memory pool.

Return

Status message

Parameters
  • [in] pool: memory pool to allocate memory from

  • [in] length: size in bits of bitmap to allocate

  • [out] out: the resulting buffer (zero-initialized).

Status arrow::AllocateEmptyBitmap(int64_t length, std::shared_ptr<Buffer> *out)

Allocate a zero-initialized bitmap buffer from the default memory pool.

Return

Status message

Parameters
  • [in] length: size in bits of bitmap to allocate

  • [out] out: the resulting buffer

Status arrow::ConcatenateBuffers(const BufferVector &buffers, MemoryPool *pool, std::shared_ptr<Buffer> *out)

Concatenate multiple buffers into a single buffer.

Return

Status

Parameters
  • [in] buffers: the buffers to be concatenated

  • [in] pool: memory pool to allocate the new buffer from

  • [out] out: the concatenated buffer
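Concatenation amounts to summing the input sizes, allocating once, and copying each input in order. The sketch below shows that idea with `std::vector<uint8_t>` standing in for buffers; `Bytes` and `Concatenate` are hypothetical names and no memory pool is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical helper: join several byte sequences into one.
Bytes Concatenate(const std::vector<Bytes>& buffers) {
  std::size_t total = 0;
  for (const Bytes& b : buffers) {
    total += b.size();
  }
  Bytes out;
  out.reserve(total);  // a single allocation for the whole result
  for (const Bytes& b : buffers) {
    out.insert(out.end(), b.begin(), b.end());
  }
  assert(out.size() == total);
  return out;
}
```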

Slicing

static std::shared_ptr<Buffer> arrow::SliceBuffer(const std::shared_ptr<Buffer> &buffer, const int64_t offset, const int64_t length)

Construct a view on a buffer at the given offset and length.

This function cannot fail and does not check for errors (except in debug builds)

static std::shared_ptr<Buffer> arrow::SliceBuffer(const std::shared_ptr<Buffer> &buffer, const int64_t offset)

Construct a view on a buffer at the given offset, up to the buffer’s end.

This function cannot fail and does not check for errors (except in debug builds)

std::shared_ptr<Buffer> arrow::SliceMutableBuffer(const std::shared_ptr<Buffer> &buffer, const int64_t offset, const int64_t length)

Like SliceBuffer, but construct a mutable buffer slice.

If the parent buffer is not mutable, behavior is undefined (it may abort in debug builds).

static std::shared_ptr<Buffer> arrow::SliceMutableBuffer(const std::shared_ptr<Buffer> &buffer, const int64_t offset)

Like SliceBuffer, but construct a mutable buffer slice.

If the parent buffer is not mutable, behavior is undefined (it may abort in debug builds).

Buffer Builders

class BufferBuilder

A class for incrementally building a contiguous chunk of in-memory data.

Public Functions

Status Resize(const int64_t new_capacity, bool shrink_to_fit = true)

Resize the buffer to the nearest multiple of 64 bytes.

Return

Status

Parameters
  • new_capacity: the new capacity of the builder. Will be rounded up to a multiple of 64 bytes for padding

  • shrink_to_fit: if new capacity is smaller than the existing size, reallocate internal buffer. Set to false to avoid reallocations when shrinking the builder.

Status Reserve(const int64_t additional_bytes)

Ensure that builder can accommodate the additional number of bytes without the need to perform allocations.

Return

Status

Parameters
  • [in] additional_bytes: number of additional bytes to make space for

Status Append(const void *data, const int64_t length)

Append the given data to the buffer.

The buffer is automatically expanded if necessary.

Status Append(const int64_t num_copies, uint8_t value)

Append copies of a value to the buffer.

The buffer is automatically expanded if necessary.

Status Finish(std::shared_ptr<Buffer> *out, bool shrink_to_fit = true)

Return result of builder as a Buffer object.

The builder is reset and can be reused afterwards.

Return

Status

Parameters
  • [out] out: the finalized Buffer object

  • shrink_to_fit: if the buffer size is smaller than its capacity, reallocate to fit more tightly in memory. Set to false to avoid a reallocation, at the expense of potentially more memory consumption.

void Rewind(int64_t position)

Set size to a smaller value without modifying builder contents.

For reusable BufferBuilder classes.

Parameters
  • [in] position: must be non-negative and less than or equal to the current length()
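The append, rewind, and finish cycle described above can be modeled with a small vector-backed class. `ByteBuilder` is a hypothetical sketch, not the Arrow class: it returns a `std::vector<uint8_t>` instead of a Buffer, reports no Status, and its growth policy (at least doubling, rounded up to 64 bytes) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical builder: storage_ is the allocated capacity, size_ is the
// number of bytes appended so far.
class ByteBuilder {
 public:
  void Append(const void* data, int64_t length) {
    assert(length >= 0);
    if (length == 0) return;
    const int64_t needed = size_ + length;
    if (needed > capacity()) {
      // Assumed policy: at least double, then round up to 64 bytes.
      const int64_t grown = std::max(needed, capacity() * 2);
      storage_.resize(static_cast<std::size_t>((grown + 63) / 64 * 64));
    }
    std::memcpy(storage_.data() + size_, data, static_cast<std::size_t>(length));
    size_ += length;
  }

  // Shrink the logical size; the stored bytes are left in place.
  void Rewind(int64_t position) {
    assert(position >= 0 && position <= size_);
    size_ = position;
  }

  // Hand back the first size_ bytes and reset the builder for reuse.
  std::vector<uint8_t> Finish() {
    storage_.resize(static_cast<std::size_t>(size_));
    std::vector<uint8_t> out = std::move(storage_);
    storage_.clear();
    size_ = 0;
    return out;
  }

  int64_t length() const { return size_; }
  int64_t capacity() const { return static_cast<int64_t>(storage_.size()); }

 private:
  std::vector<uint8_t> storage_;
  int64_t size_ = 0;
};
```

Appending "hello", rewinding to position 2, and appending "y!" leaves the four bytes "hey!" in the finished result, and the builder is empty again afterwards.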

Public Static Functions

static int64_t GrowByFactor(int64_t current_capacity, int64_t new_capacity)

Return a capacity expanded by an unspecified growth factor.

template<typename T, typename Enable = void>
class TypedBufferBuilder

STL Integration

template<class T>
class stl_allocator

An STL allocator delegating allocations to an Arrow MemoryPool.

Public Functions

stl_allocator()

Construct an allocator from the default MemoryPool.

stl_allocator(MemoryPool *pool)

Construct an allocator from the given MemoryPool.
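The delegation idea can be illustrated with a minimal allocator that satisfies the standard Allocator requirements. The sketch is independent of Arrow: `CountingPool` is a hypothetical stand-in for a memory pool and `PoolAllocator` is a hypothetical allocator that forwards to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical pool: counts outstanding bytes and uses malloc/free.
struct CountingPool {
  int64_t bytes_allocated = 0;
  void* Allocate(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) bytes_allocated += static_cast<int64_t>(n);
    return p;
  }
  void Free(void* p, std::size_t n) {
    std::free(p);
    bytes_allocated -= static_cast<int64_t>(n);
  }
};

// Hypothetical STL-compatible allocator that forwards to a CountingPool.
template <class T>
class PoolAllocator {
 public:
  using value_type = T;

  explicit PoolAllocator(CountingPool* pool) : pool_(pool) {
    assert(pool != nullptr);
  }

  // Converting constructor so containers can rebind to other value types.
  template <class U>
  PoolAllocator(const PoolAllocator<U>& other) : pool_(other.pool()) {}

  T* allocate(std::size_t n) {
    void* p = pool_->Allocate(n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t n) { pool_->Free(p, n * sizeof(T)); }

  CountingPool* pool() const { return pool_; }

 private:
  CountingPool* pool_;
};

// Allocators compare equal when they share a pool.
template <class T, class U>
bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) {
  return a.pool() == b.pool();
}
template <class T, class U>
bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) {
  return !(a == b);
}
```

A `std::vector<int32_t, PoolAllocator<int32_t>>` constructed with such an allocator routes its storage through the pool, so the pool's counter returns to zero once the vector is destroyed.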

template<typename Allocator = std::allocator<uint8_t>>
class STLMemoryPool : public arrow::MemoryPool

A MemoryPool implementation delegating allocations to an STL allocator.

Note that STL allocators don’t provide a resizing operation, and therefore any buffer resizes will do a full reallocation and copy.
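The reallocation cost noted above follows from the allocator interface, which offers only `allocate` and `deallocate`. The hypothetical helper below sketches the resulting allocate, copy, release sequence for any allocator whose `value_type` is `uint8_t`; it is an illustration, not the STLMemoryPool implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical helper: "resize" memory obtained from an STL allocator.
// Growing or shrinking means a fresh allocation, a copy of the surviving
// bytes, and release of the old block.
template <class Allocator>
uint8_t* ReallocateWith(Allocator& alloc, uint8_t* old_ptr,
                        std::size_t old_size, std::size_t new_size) {
  assert(old_ptr != nullptr || old_size == 0);
  uint8_t* fresh = alloc.allocate(new_size);
  if (old_ptr != nullptr) {
    std::memcpy(fresh, old_ptr, std::min(old_size, new_size));
    alloc.deallocate(old_ptr, old_size);
  }
  return fresh;
}
```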

Public Functions

STLMemoryPool(const Allocator &alloc)

Construct a memory pool from the given allocator.

Status Allocate(int64_t size, uint8_t **out)

Allocate a new memory region of at least size bytes.

The allocated region shall be 64-byte aligned.

Status Reallocate(int64_t old_size, int64_t new_size, uint8_t **ptr)

Resize an already allocated memory section.

Because most platform default allocators do not support aligned reallocation, this function may involve a copy of the underlying data.

void Free(uint8_t *buffer, int64_t size)

Free an allocated region.

Parameters
  • buffer: Pointer to the start of the allocated memory region

  • size: Allocated size located at buffer. An allocator implementation may use this for tracking the amount of allocated bytes as well as for faster deallocation if supported by its backend.

int64_t bytes_allocated() const

The number of bytes that were allocated and not yet freed through this allocator.

int64_t max_memory() const

Return peak memory allocation in this memory pool.

Return

Maximum bytes allocated. If not known (or not implemented), returns -1