pyarrow.dataset.ParquetFragmentScanOptions
- class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=True, cache_options=None, thrift_string_size_limit=None, thrift_container_size_limit=None, decryption_config=None, bool page_checksum_verification=False)
Bases: FragmentScanOptions
Scan-specific options for Parquet fragments.
- Parameters:
- use_buffered_stream : bool, default False
  Read files through buffered input streams rather than loading entire row groups at once. Enabling this may reduce memory overhead.
- buffer_size : int, default 8192
  Size of the buffered stream in bytes, used only when use_buffered_stream is enabled. The default is 8192 bytes (8 KiB).
- pre_buffer : bool, default True
  If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems (e.g. S3, GCS) by coalescing file reads and issuing them in parallel on a background I/O thread pool. Set to False to prioritize minimal memory usage over maximum speed.
- cache_options : pyarrow.CacheOptions, default None
  Cache options used when pre_buffer is enabled. The defaults should suit most use cases; you may want to adjust them, for example, if latency to the file system is exceptionally high.
- thrift_string_size_limit : int, default None
  If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- thrift_container_size_limit : int, default None
  If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files.
- decryption_config : pyarrow.dataset.ParquetDecryptionConfig, default None
  If not None, use the provided ParquetDecryptionConfig to decrypt the Parquet file.
- page_checksum_verification : bool, default False
  If True, verify the page checksum for each page read from the file.
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- equals(self, ParquetFragmentScanOptions other)
- equals(self, ParquetFragmentScanOptions other)
  - Parameters:
    - other : pyarrow.dataset.ParquetFragmentScanOptions
  - Returns:
    - bool
Attributes
- buffer_size
- cache_options
- page_checksum_verification
- parquet_decryption_config
- pre_buffer
- thrift_container_size_limit
- thrift_string_size_limit
- type_name
- use_buffered_stream