pyarrow.Codec

class pyarrow.Codec(unicode compression, compression_level=None)

Bases: pyarrow.lib._Weakrefable

Compression codec.

Parameters
  • compression (str) – Type of compression codec to initialize, valid values are: ‘gzip’, ‘bz2’, ‘brotli’, ‘lz4’ (or ‘lz4_frame’), ‘lz4_raw’, ‘zstd’ and ‘snappy’.

  • compression_level (int, None) –

    Optional parameter specifying how aggressively to compress. The possible ranges and effect of this parameter depend on the specific codec chosen. Higher values compress more but typically use more resources (CPU/RAM). Some codecs support negative values.

    gzip

    The compression level maps to the memLevel parameter of deflateInit2. Higher levels use more RAM but are faster and should have higher compression ratios.

    bz2

    The compression level maps to the blockSize100k parameter of the BZ2_bzCompressInit function. Higher levels use more RAM but are faster and should have higher compression ratios.

    brotli

    The compression level maps to the BROTLI_PARAM_QUALITY parameter. Higher values are slower and should have higher compression ratios.

    lz4/lz4_frame/lz4_raw

    The compression level parameter is not supported and must be None.

    zstd

    The compression level maps to the compressionLevel parameter of ZSTD_initCStream. Negative values are supported. Higher values are slower and should have higher compression ratios.

    snappy

    The compression level parameter is not supported and must be None.

Raises

ValueError – If an invalid compression value is passed.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

  • __init__(*args, **kwargs) – Initialize self.

  • compress(self, buf[, asbytes, memory_pool]) – Compress data from buffer-like object.

  • decompress(self, buf[, decompressed_size, …]) – Decompress data from buffer-like object.

  • default_compression_level(unicode compression) – Returns the compression level that Arrow will use for the codec if None is specified.

  • detect(path) – Detect and instantiate compression codec based on file extension.

  • is_available(unicode compression) – Returns whether the compression support has been built and enabled.

  • maximum_compression_level(unicode compression) – Returns the largest valid value for the compression level.

  • minimum_compression_level(unicode compression) – Returns the smallest valid value for the compression level.

  • supports_compression_level(unicode compression) – Returns True if the compression level parameter is supported for the given codec.

Attributes

  • compression_level – Returns the compression level parameter of the codec.

  • name – Returns the name of the codec.

compress(self, buf, asbytes=False, memory_pool=None)

Compress data from buffer-like object.

Parameters
  • buf (pyarrow.Buffer, bytes, or other object supporting buffer protocol) – Data to compress.

  • asbytes (bool, default False) – If True, return the result as a Python bytes object; otherwise return a Buffer.

  • memory_pool (MemoryPool, default None) – Memory pool to use for buffer allocations, if any.

Returns

compressed (pyarrow.Buffer or bytes (if asbytes=True))

compression_level

Returns the compression level parameter of the codec.

decompress(self, buf, decompressed_size=None, asbytes=False, memory_pool=None)

Decompress data from buffer-like object.

Parameters
  • buf (pyarrow.Buffer, bytes, or memoryview-compatible object) – Data to decompress.

  • decompressed_size (int64_t, default None) – Size of the decompressed data. If not specified, it is computed, provided the codec is able to determine the uncompressed buffer size.

  • asbytes (bool, default False) – If True, return the result as a Python bytes object; otherwise return a Buffer.

  • memory_pool (MemoryPool, default None) – Memory pool to use for buffer allocations, if any.

Returns

uncompressed (pyarrow.Buffer or bytes (if asbytes=True))

static default_compression_level(unicode compression)

Returns the compression level that Arrow will use for the codec if None is specified.

static detect(path)

Detect and instantiate compression codec based on file extension.

Parameters

path (str, path-like) – File path to detect compression from.

Raises
  • TypeError – If the passed value is not path-like.

  • ValueError – If the compression can’t be detected from the path.

Returns

Codec

static is_available(unicode compression)

Returns whether the compression support has been built and enabled.

Parameters

compression (str) – Type of compression codec, valid values are: gzip, bz2, brotli, lz4, zstd and snappy.

Returns

bool

static maximum_compression_level(unicode compression)

Returns the largest valid value for the compression level.

static minimum_compression_level(unicode compression)

Returns the smallest valid value for the compression level.

name

Returns the name of the codec.

static supports_compression_level(unicode compression)

Returns True if the compression level parameter is supported for the given codec.