pyarrow.Codec

class pyarrow.Codec(unicode compression, compression_level=None)

Bases: _Weakrefable

Compression codec.

Parameters:
compression : str

Type of compression codec to initialize, valid values are: ‘gzip’, ‘bz2’, ‘brotli’, ‘lz4’ (or ‘lz4_frame’), ‘lz4_raw’, ‘zstd’ and ‘snappy’.

compression_level : int, None

Optional parameter specifying how aggressively to compress. The possible ranges and effect of this parameter depend on the specific codec chosen. Higher values compress more but typically use more resources (CPU/RAM). Some codecs support negative values.

gzip

The compression level maps to the memLevel parameter of deflateInit2. Higher levels use more RAM but are faster and should have higher compression ratios.

bz2

The compression level maps to the blockSize100k parameter of the BZ2_bzCompressInit function. Higher levels use more RAM but are faster and should have higher compression ratios.

brotli

The compression level maps to the BROTLI_PARAM_QUALITY parameter. Higher values are slower and should have higher compression ratios.

lz4/lz4_frame/lz4_raw

The compression level parameter is not supported and must be None.

zstd

The compression level maps to the compressionLevel parameter of ZSTD_initCStream. Negative values are supported. Higher values are slower and should have higher compression ratios.

snappy

The compression level parameter is not supported and must be None.

Raises:
ValueError

If an invalid compression value is passed.

Examples

>>> import pyarrow as pa
>>> pa.Codec.is_available('gzip')
True
>>> codec = pa.Codec('gzip')
>>> codec.name
'gzip'
>>> codec.compression_level
9

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

compress(self, buf[, asbytes, memory_pool])

Compress data from buffer-like object.

decompress(self, buf[, decompressed_size, ...])

Decompress data from buffer-like object.

default_compression_level(unicode compression)

Returns the compression level that Arrow will use for the codec if None is specified.

detect(path)

Detect and instantiate compression codec based on file extension.

is_available(unicode compression)

Returns whether the compression support has been built and enabled.

maximum_compression_level(unicode compression)

Returns the largest valid value for the compression level.

minimum_compression_level(unicode compression)

Returns the smallest valid value for the compression level.

supports_compression_level(unicode compression)

Returns True if the compression level parameter is supported for the given codec.

Attributes

compression_level

Returns the compression level parameter of the codec.

name

Returns the name of the codec.

compress(self, buf, asbytes=False, memory_pool=None)

Compress data from buffer-like object.

Parameters:
buf : pyarrow.Buffer, bytes, or other object supporting buffer protocol
asbytes : bool, default False

Return the result as a Python bytes object; otherwise a Buffer is returned.

memory_pool : MemoryPool, default None

Memory pool to use for buffer allocations, if any.

Returns:
compressed : pyarrow.Buffer or bytes (if asbytes=True)

compression_level

Returns the compression level parameter of the codec.

decompress(self, buf, decompressed_size=None, asbytes=False, memory_pool=None)

Decompress data from buffer-like object.

Parameters:
buf : pyarrow.Buffer, bytes, or memoryview-compatible object
decompressed_size : int, default None

Size of the decompressed result.

asbytes : bool, default False

Return the result as a Python bytes object; otherwise a Buffer is returned.

memory_pool : MemoryPool, default None

Memory pool to use for buffer allocations, if any.

Returns:
uncompressed : pyarrow.Buffer or bytes (if asbytes=True)

static default_compression_level(unicode compression)

Returns the compression level that Arrow will use for the codec if None is specified.

Parameters:
compression : str

Type of compression codec, refer to Codec docstring for a list of supported ones.

static detect(path)

Detect and instantiate compression codec based on file extension.

Parameters:
path : str, path-like

File path to detect compression from.

Returns:
Codec
Raises:
TypeError

If the passed value is not path-like.

ValueError

If the compression can’t be detected from the path.

static is_available(unicode compression)

Returns whether the compression support has been built and enabled.

Parameters:
compression : str

Type of compression codec, refer to Codec docstring for a list of supported ones.

Returns:
bool

static maximum_compression_level(unicode compression)

Returns the largest valid value for the compression level.

Parameters:
compression : str

Type of compression codec, refer to Codec docstring for a list of supported ones.

static minimum_compression_level(unicode compression)

Returns the smallest valid value for the compression level.

Parameters:
compression : str

Type of compression codec, refer to Codec docstring for a list of supported ones.

name

Returns the name of the codec.

static supports_compression_level(unicode compression)

Returns True if the compression level parameter is supported for the given codec.

Parameters:
compression : str

Type of compression codec, refer to Codec docstring for a list of supported ones.