pyarrow.dataset.ParquetFragmentScanOptions
- class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=True, cache_options=None, thrift_string_size_limit=None, thrift_container_size_limit=None, decryption_config=None, bool page_checksum_verification=False)
Bases: FragmentScanOptions
Scan-specific options for Parquet fragments.
- Parameters:
  - use_buffered_stream: bool, default False
    Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.
  - buffer_size: int, default 8192
    Size of buffered stream, if enabled. Default is 8KB.
  - pre_buffer: bool, default True
    If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems (e.g. S3, GCS) by coalescing and issuing file reads in parallel using a background I/O thread pool. Set to False if you want to prioritize minimal memory usage over maximum speed.
  - cache_options: pyarrow.CacheOptions, default None
    Cache options used when pre_buffer is enabled. The default values should be good for most use cases. You may want to adjust these, for example, if you have exceptionally high latency to the file system.
  - thrift_string_size_limit: int, default None
    If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
  - thrift_container_size_limit: int, default None
    If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
  - decryption_config: pyarrow.dataset.ParquetDecryptionConfig, default None
    If not None, use the provided ParquetDecryptionConfig to decrypt the Parquet file.
  - page_checksum_verification: bool, default False
    If True, verify the page checksum for each page read from the file.
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- equals(self, ParquetFragmentScanOptions other)
Attributes
- buffer_size
- cache_options
- equals(self, ParquetFragmentScanOptions other)
  - Parameters:
    - other: ParquetFragmentScanOptions
  - Returns:
    - bool
- page_checksum_verification
- parquet_decryption_config
- pre_buffer
- thrift_container_size_limit
- thrift_string_size_limit
- type_name
- use_buffered_stream