pyarrow.dataset.ParquetFragmentScanOptions
- class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=True, thrift_string_size_limit=None, thrift_container_size_limit=None, decryption_config=None)
Bases: FragmentScanOptions
Scan-specific options for Parquet fragments.
- Parameters:
- use_buffered_stream : bool, default False
Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.
- buffer_size : int, default 8192
Size of the buffered stream, if enabled. The default is 8 KB.
- pre_buffer : bool, default True
If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems (e.g. S3, GCS) by coalescing and issuing file reads in parallel using a background I/O thread pool. Set to False if you want to prioritize minimal memory usage over maximum speed.
- thrift_string_size_limit : int, default None
If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- thrift_container_size_limit : int, default None
If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- decryption_config : pyarrow.dataset.ParquetDecryptionConfig, default None
If not None, use the provided ParquetDecryptionConfig to decrypt the Parquet file.
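A minimal usage sketch (hedged: the "data/" path is hypothetical, and this assumes a recent pyarrow where Dataset.to_table() accepts fragment_scan_options). It constructs options tuned for low memory use and applies them to a scan, either per call or as the format-level default:

import pyarrow.dataset as ds

# Favor low memory usage: stream each row group through a buffered reader
# instead of loading it whole, and disable read coalescing.
low_memory = ds.ParquetFragmentScanOptions(
    use_buffered_stream=True,
    buffer_size=64 * 1024,  # 64 KB buffer instead of the 8 KB default
    pre_buffer=False,
)

dataset = ds.dataset("data/", format="parquet")  # hypothetical path
table = dataset.to_table(fragment_scan_options=low_memory)

# Alternatively, install the options as the default for every scan
# of this dataset via the file format.
fmt = ds.ParquetFileFormat(default_fragment_scan_options=low_memory)
dataset = ds.dataset("data/", format=fmt)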
- __init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)
equals(self, ParquetFragmentScanOptions other)
Attributes
- buffer_size
- equals(self, ParquetFragmentScanOptions other)
- Parameters:
- other : ParquetFragmentScanOptions
- Returns:
- bool
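A brief sketch of equals() (assuming value-wise comparison of the option fields):

import pyarrow.dataset as ds

a = ds.ParquetFragmentScanOptions(pre_buffer=True)
b = ds.ParquetFragmentScanOptions(pre_buffer=True)
c = ds.ParquetFragmentScanOptions(pre_buffer=False)

assert a.equals(b)      # identical settings compare equal
assert not a.equals(c)  # a differing field breaks equality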
- parquet_decryption_config
- pre_buffer
- thrift_container_size_limit
- thrift_string_size_limit
- type_name
- use_buffered_stream
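For decryption_config, a heavily hedged wiring sketch follows; MyKmsClient is a hypothetical stand-in for a real KMS client, and this assumes a pyarrow build with Parquet encryption enabled:

import pyarrow.dataset as ds
import pyarrow.parquet.encryption as pe

# Hypothetical KMS client; a real one would wrap/unwrap data encryption
# keys against an actual key management service.
class MyKmsClient(pe.KmsClient):
    def wrap_key(self, key_bytes, master_key_identifier):
        raise NotImplementedError("call a real KMS here")

    def unwrap_key(self, wrapped_key, master_key_identifier):
        raise NotImplementedError("call a real KMS here")

kms_config = pe.KmsConnectionConfig()  # real endpoint/credentials go here
crypto_factory = pe.CryptoFactory(lambda conf: MyKmsClient())

decryption_config = ds.ParquetDecryptionConfig(
    crypto_factory, kms_config, pe.DecryptionConfiguration()
)
options = ds.ParquetFragmentScanOptions(decryption_config=decryption_config)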