pyarrow.dataset.ParquetFragmentScanOptions

class pyarrow.dataset.ParquetFragmentScanOptions(bool use_buffered_stream=False, buffer_size=8192, bool pre_buffer=False, bool enable_parallel_column_conversion=False)

Bases: pyarrow._dataset.FragmentScanOptions

Scan-specific options for Parquet fragments.

Parameters
use_buffered_stream : bool, default False

Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.

buffer_size : int, default 8192

Size of the buffered stream, in bytes, if enabled. Default is 8192 (8 KB).

pre_buffer : bool, default False

If enabled, pre-buffer the raw Parquet data instead of issuing one read per column chunk. This can improve performance on high-latency filesystems.

enable_parallel_column_conversion : bool, default False

EXPERIMENTAL: Parallelize conversion across columns. This option is ignored if a scan is already parallelized across input files to avoid thread contention. This option will be removed after support is added for simultaneous parallelization across files and columns.
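A minimal usage sketch. It assumes a pyarrow version in which ParquetFileFormat accepts default_fragment_scan_options, and uses "data/" as a placeholder path to a directory of Parquet files; the options can also be supplied per scan via the fragment_scan_options argument of Dataset.to_table or Scanner.from_dataset where available.

>>> import pyarrow.dataset as ds
>>> # Buffered streams with a 64 KB buffer plus pre-buffering, which can
>>> # help on high-latency filesystems such as object stores.
>>> scan_opts = ds.ParquetFragmentScanOptions(
...     use_buffered_stream=True,
...     buffer_size=64 * 1024,
...     pre_buffer=True,
... )
>>> parquet_format = ds.ParquetFileFormat(default_fragment_scan_options=scan_opts)
>>> dataset = ds.dataset("data/", format=parquet_format)  # placeholder path
>>> table = dataset.to_table()  # options apply to every Parquet fragment scanned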

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, ParquetFragmentScanOptions other)

Attributes

buffer_size

enable_parallel_column_conversion

pre_buffer

type_name

use_buffered_stream

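A short sketch of equals() and attribute access. The printed values assume the attributes simply reflect the constructor arguments and that type_name reports "parquet" for this options class.

>>> import pyarrow.dataset as ds
>>> a = ds.ParquetFragmentScanOptions(pre_buffer=True)
>>> b = ds.ParquetFragmentScanOptions(pre_buffer=True)
>>> c = ds.ParquetFragmentScanOptions(use_buffered_stream=True, buffer_size=32 * 1024)
>>> a.equals(b)   # identical settings compare equal
True
>>> a.equals(c)   # settings differ
False
>>> c.buffer_size
32768
>>> a.type_name   # assumed to be the format name
'parquet'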