pyarrow.dataset.ParquetFragmentScanOptions#
class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, *, buffer_size=8192, bool pre_buffer=True, thrift_string_size_limit=None, thrift_container_size_limit=None, decryption_config=None)#
Bases: FragmentScanOptions

Scan-specific options for Parquet fragments.

Parameters:
- use_buffered_stream : bool, default False
- Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default. 
- buffer_size : int, default 8192
- Size of buffered stream, if enabled. Default is 8KB. 
- pre_buffer : bool, default True
- If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems (e.g. S3, GCS) by coalescing and issuing file reads in parallel using a background I/O thread pool. Set to False if you want to prioritize minimal memory usage over maximum speed.
- thrift_string_size_limit : int, default None
- If not None, override the maximum total string size allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files. 
- thrift_container_size_limit : int, default None
- If not None, override the maximum total size of containers allocated when decoding Thrift structures. The default limit should be sufficient for most Parquet files. 
- decryption_config : pyarrow.dataset.ParquetDecryptionConfig, default None
- If not None, use the provided ParquetDecryptionConfig to decrypt the Parquet file. 
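As a quick illustration, the sketch below constructs these options and applies them to a dataset scan; the `example_data/` path and the option values are placeholders, not recommended settings.

```python
import pyarrow.dataset as ds

# Illustrative settings only: buffered 64 KiB reads, with pre-buffering
# disabled to favor lower memory use over scan speed.
scan_options = ds.ParquetFragmentScanOptions(
    use_buffered_stream=True,
    buffer_size=64 * 1024,
    pre_buffer=False,
)

# "example_data/" is a placeholder directory of Parquet files.
dataset = ds.dataset("example_data/", format="parquet")

# Apply the options to a single scan ...
table = dataset.to_table(fragment_scan_options=scan_options)

# ... or make them the default for every scan of this format.
fmt = ds.ParquetFileFormat(default_fragment_scan_options=scan_options)
dataset = ds.dataset("example_data/", format=fmt)
```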
 
Methods:

- __init__(*args, **kwargs)
- equals(self, ParquetFragmentScanOptions other)

Attributes:

- buffer_size
- parquet_decryption_config
- pre_buffer
- thrift_container_size_limit
- thrift_string_size_limit
- type_name
- use_buffered_stream

__init__(*args, **kwargs)#
equals(self, ParquetFragmentScanOptions other)#

Parameters:
- other : ParquetFragmentScanOptions

Returns:
- bool
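A small sketch of how equality behaves; two instances compare equal when all their option values match:

```python
import pyarrow.dataset as ds

a = ds.ParquetFragmentScanOptions(pre_buffer=True)
b = ds.ParquetFragmentScanOptions(pre_buffer=True)
c = ds.ParquetFragmentScanOptions(pre_buffer=False)

print(a.equals(b))  # True: identical option values
print(a.equals(c))  # False: pre_buffer differs
```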
 
buffer_size#

parquet_decryption_config#

pre_buffer#

thrift_container_size_limit#

thrift_string_size_limit#

type_name#

use_buffered_stream#
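These attributes simply read back the values the options were constructed with; a minimal sketch:

```python
import pyarrow.dataset as ds

opts = ds.ParquetFragmentScanOptions(buffer_size=32 * 1024)
print(opts.buffer_size)  # 32768
print(opts.pre_buffer)   # True (the default)
print(opts.type_name)    # the format identifier, expected to be 'parquet'
```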