pyarrow.dataset.ParquetFragmentScanOptions

class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=False, thrift_string_size_limit=None, thrift_container_size_limit=None)

Bases: FragmentScanOptions

Scan-specific options for Parquet fragments.

Parameters:
use_buffered_stream : bool, default False

Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.

buffer_size : int, default 8192

Size of the buffered stream in bytes, if enabled. Default is 8192 (8 KiB).

pre_buffer : bool, default False

If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems.

thrift_string_size_limit : int, default None

If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.

thrift_container_size_limit : int, default None

If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, ParquetFragmentScanOptions other)

Attributes

buffer_size

pre_buffer

thrift_container_size_limit

thrift_string_size_limit

type_name

use_buffered_stream

buffer_size
equals(self, ParquetFragmentScanOptions other)
Parameters:
other : pyarrow.dataset.ParquetFragmentScanOptions
Returns:
bool
pre_buffer
thrift_container_size_limit
thrift_string_size_limit
type_name
use_buffered_stream