jemalloc MemoryPool

Arrow’s default MemoryPool uses the system allocator through the POSIX APIs. Although this already provides aligned allocation, the POSIX interface does not support aligned reallocation. The default reallocation strategy is therefore to allocate a new region, copy over the old data, and free the previous region. With jemalloc, the existing allocation can simply be extended to the requested size. While this may still be linear in the size of the allocated memory, it is orders of magnitude faster because only the page mapping in the kernel is touched, not the actual data.
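
To make the cost of the fallback strategy concrete, here is a minimal pure-Python sketch of "allocate, copy, free". It is only an illustration, not Arrow's actual implementation; the helper names `grow_by_copy` and `total_bytes_copied` are hypothetical.

```python
def grow_by_copy(buf: bytearray, new_size: int):
    """Resize by allocating a new region and copying the old data into it."""
    new_buf = bytearray(new_size)      # allocate a new, zero-filled region
    n = min(len(buf), new_size)
    new_buf[:n] = buf[:n]              # copy over the old data
    # The previous region is released once nothing references it anymore.
    return new_buf, n


def total_bytes_copied(final_size: int, step: int) -> int:
    """Grow a buffer in fixed steps and count how many bytes get copied."""
    buf = bytearray(0)
    copied = 0
    for size in range(step, final_size + 1, step):
        buf, n = grow_by_copy(buf, size)
        copied += n
    return copied


# Growing to 4096 bytes in 1024-byte steps copies 0 + 1024 + 2048 + 3072 bytes.
print(total_bytes_copied(4096, 1024))
```

Every growth step copies the whole existing buffer, so repeated small extensions add up quickly. An allocator that can extend the region in place avoids touching the data at all.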

The jemalloc-based allocator is not enabled by default, so that the system allocator and/or other allocators such as tcmalloc can still be used. You can either explicitly make it the default allocator or pass it only to individual operations.

import pyarrow as pa

jemalloc_pool = pa.jemalloc_memory_pool()

# Explicitly use jemalloc for allocating memory for an Arrow Array object
array = pa.Array.from_pylist([1, 2, 3], memory_pool=jemalloc_pool)

# Set the global pool
pa.set_memory_pool(jemalloc_pool)
# This operation has no explicit MemoryPool specified and will thus
# also use jemalloc for its allocations.
array = pa.Array.from_pylist([1, 2, 3])