pyarrow.csv.ReadOptions

class pyarrow.csv.ReadOptions(use_threads=None, *, block_size=None, skip_rows=None, column_names=None, autogenerate_column_names=None, encoding='utf8', skip_rows_after_names=None)

Bases: pyarrow.lib._Weakrefable

Options for reading CSV files.

Parameters
  • use_threads (bool, optional (default True)) – Whether to use multiple threads to accelerate reading.

  • block_size (int, optional) – How many bytes to process at a time from the input stream. This determines multi-threading granularity as well as the size of individual record batches or table chunks. The minimum valid value for block size is 1.

  • skip_rows (int, optional (default 0)) – The number of rows to skip before the column names (if any) and the CSV data.

  • skip_rows_after_names (int, optional (default 0)) – The number of rows to skip after the column names. This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows: (1) skip_rows is applied (if non-zero); (2) column names are read (unless column_names is set); (3) skip_rows_after_names is applied (if non-zero).

  • column_names (list, optional) – The column names of the target table. If empty, fall back on autogenerate_column_names.

  • autogenerate_column_names (bool, optional (default False)) – Whether to autogenerate column names if column_names is empty. If true, column names will be of the form “f0”, “f1”… If false, column names will be read from the first CSV row after skip_rows.

  • encoding (str, optional (default 'utf8')) – The character encoding of the CSV data. Columns that cannot be decoded using this encoding can still be read as Binary.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(*args, **kwargs)

Initialize self.

equals(self, ReadOptions other)

validate(self)

Attributes

autogenerate_column_names

Whether to autogenerate column names if column_names is empty.

block_size

How many bytes to process at a time from the input stream.

column_names

The column names of the target table.

encoding

The character encoding of the CSV data.

skip_rows

The number of rows to skip before the column names (if any) and the CSV data.

skip_rows_after_names

The number of rows to skip after the column names.

use_threads

Whether to use multiple threads to accelerate reading.

autogenerate_column_names

Whether to autogenerate column names if column_names is empty. If true, column names will be of the form “f0”, “f1”… If false, column names will be read from the first CSV row after skip_rows.

block_size

How many bytes to process at a time from the input stream. This determines multi-threading granularity as well as the size of individual record batches or table chunks.

column_names

The column names of the target table. If empty, fall back on autogenerate_column_names.

encoding

The character encoding of the CSV data. Columns that cannot be decoded using this encoding can still be read as Binary.

equals(self, ReadOptions other)

skip_rows

The number of rows to skip before the column names (if any) and the CSV data. See skip_rows_after_names for a description of how the two options interact.

skip_rows_after_names

The number of rows to skip after the column names. This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows: (1) skip_rows is applied (if non-zero); (2) column names are read (unless column_names is set); (3) skip_rows_after_names is applied (if non-zero).

use_threads

Whether to use multiple threads to accelerate reading.

validate(self)