pyarrow.dataset.ParquetFragmentScanOptions
- class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=True, thrift_string_size_limit=None, thrift_container_size_limit=None, decryption_config=None)
Bases: FragmentScanOptions
Scan-specific options for Parquet fragments.
- Parameters:
- use_buffered_stream : bool, default False
Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.
- buffer_size : int, default 8192
Size of the buffered stream, if enabled. The default is 8 KB.
- pre_buffer : bool, default True
If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems (e.g. S3, GCS) by coalescing and issuing file reads in parallel using a background I/O thread pool. Set to False if you want to prioritize minimal memory usage over maximum speed.
- thrift_string_size_limit : int, default None
If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- thrift_container_size_limit : int, default None
If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- decryption_config : pyarrow.dataset.ParquetDecryptionConfig, default None
If not None, use the provided ParquetDecryptionConfig to decrypt the Parquet file.
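A minimal usage sketch (hedged: the "data/" path is hypothetical, and this assumes a recent pyarrow where Dataset.to_table() accepts fragment_scan_options). It constructs options tuned for low memory use and applies them to a scan, either per call or as the format-level default:

import pyarrow.dataset as ds

# Favor low memory usage: stream each row group through a buffered reader
# instead of loading it whole, and disable read coalescing.
low_memory = ds.ParquetFragmentScanOptions(
    use_buffered_stream=True,
    buffer_size=64 * 1024,  # 64 KB buffer instead of the 8 KB default
    pre_buffer=False,
)

dataset = ds.dataset("data/", format="parquet")  # hypothetical path
table = dataset.to_table(fragment_scan_options=low_memory)

# Alternatively, install the options as the default for every scan
# of this dataset via the file format.
fmt = ds.ParquetFileFormat(default_fragment_scan_options=low_memory)
dataset = ds.dataset("data/", format=fmt)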
- __init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)
equals(self, ParquetFragmentScanOptions other)
Attributes
- buffer_size
- equals(self, ParquetFragmentScanOptions other)
- Parameters:
- other : ParquetFragmentScanOptions
- Returns:
- bool
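A brief sketch of equals() (assuming value-wise comparison of the option fields):

import pyarrow.dataset as ds

a = ds.ParquetFragmentScanOptions(pre_buffer=True)
b = ds.ParquetFragmentScanOptions(pre_buffer=True)
c = ds.ParquetFragmentScanOptions(pre_buffer=False)

assert a.equals(b)      # identical settings compare equal
assert not a.equals(c)  # a differing field breaks equality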
- parquet_decryption_config
- pre_buffer
- thrift_container_size_limit
- thrift_string_size_limit
- type_name
- use_buffered_stream
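For decryption_config, a heavily hedged wiring sketch follows; MyKmsClient is a hypothetical stand-in for a real KMS client, and this assumes a pyarrow build with Parquet encryption enabled:

import pyarrow.dataset as ds
import pyarrow.parquet.encryption as pe

# Hypothetical KMS client; a real one would wrap/unwrap data encryption
# keys against an actual key management service.
class MyKmsClient(pe.KmsClient):
    def wrap_key(self, key_bytes, master_key_identifier):
        raise NotImplementedError("call a real KMS here")

    def unwrap_key(self, wrapped_key, master_key_identifier):
        raise NotImplementedError("call a real KMS here")

kms_config = pe.KmsConnectionConfig()  # real endpoint/credentials go here
crypto_factory = pe.CryptoFactory(lambda conf: MyKmsClient())

decryption_config = ds.ParquetDecryptionConfig(
    crypto_factory, kms_config, pe.DecryptionConfiguration()
)
options = ds.ParquetFragmentScanOptions(decryption_config=decryption_config)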