pyarrow.dataset.ParquetFragmentScanOptions
- class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, buffer_size=8192, bool pre_buffer=False, bool enable_parallel_column_conversion=False)
- Bases: pyarrow._dataset.FragmentScanOptions

Scan-specific options for Parquet fragments.

Parameters
- use_buffered_stream : bool, default False
- Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default. 
- buffer_size : int, default 8192
- Size of the buffered stream, if enabled. Default is 8 KiB (8192 bytes).
- pre_buffer : bool, default False
- If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems. 
- enable_parallel_column_conversion : bool, default False
- EXPERIMENTAL: Parallelize conversion across columns. This option is ignored if a scan is already parallelized across input files to avoid thread contention. This option will be removed after support is added for simultaneous parallelization across files and columns. 
 
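As a rough illustration of how these options can be applied, the following minimal sketch attaches them to a ParquetFileFormat when building a dataset; the directory "my_data/" is a placeholder, and the chosen values are only examples.

```python
import pyarrow.dataset as ds

# Minimal sketch: configure scan options for Parquet fragments.
scan_opts = ds.ParquetFragmentScanOptions(
    use_buffered_stream=True,  # stream data instead of loading whole row groups
    buffer_size=64 * 1024,     # larger buffer than the 8192-byte default
    pre_buffer=True,           # coalesce column-chunk reads (helps on high-latency filesystems)
)

# Use the options as the default for every Parquet fragment in the dataset.
fmt = ds.ParquetFileFormat(default_fragment_scan_options=scan_opts)
dataset = ds.dataset("my_data/", format=fmt)  # "my_data/" is a placeholder path
table = dataset.to_table()
```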
Methods
- __init__(*args, **kwargs)
- equals(self, ParquetFragmentScanOptions other)

Attributes
- buffer_size
- enable_parallel_column_conversion
- pre_buffer
- type_name
- use_buffered_stream
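A short sketch of the listed methods and attributes: the configured values are exposed as read-only attributes, and two option objects can be compared with equals(). The printed values assume default settings apart from pre_buffer.

```python
import pyarrow.dataset as ds

a = ds.ParquetFragmentScanOptions(pre_buffer=True)
b = ds.ParquetFragmentScanOptions(pre_buffer=True)

print(a.pre_buffer)   # True
print(a.buffer_size)  # 8192 (the default)
print(a.type_name)    # name of the file format these options apply to
print(a.equals(b))    # True: both objects carry the same settings
```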
 
