pyarrow.csv.ConvertOptions

class pyarrow.csv.ConvertOptions(check_utf8=None, *, column_types=None, null_values=None, true_values=None, false_values=None, decimal_point=None, strings_can_be_null=None, quoted_strings_can_be_null=None, include_columns=None, include_missing_columns=None, auto_dict_encode=None, auto_dict_max_cardinality=None, timestamp_parsers=None)

Bases: pyarrow.lib._Weakrefable

Options for converting CSV data.

Parameters
  • check_utf8 (bool, optional (default True)) – Whether to check UTF8 validity of string columns.

  • column_types (pa.Schema or dict, optional) – Explicitly map column names to column types. Passing this argument disables type inference on the defined columns.

  • null_values (list, optional) – A sequence of strings that denote nulls in the data (defaults are appropriate in most cases). Note that by default, string columns are not checked for null values. To enable null checking for those, specify strings_can_be_null=True.

  • true_values (list, optional) – A sequence of strings that denote true booleans in the data (defaults are appropriate in most cases).

  • false_values (list, optional) – A sequence of strings that denote false booleans in the data (defaults are appropriate in most cases).

  • decimal_point (1-character string, optional (default '.')) – The character used as decimal point in floating-point and decimal data.

  • timestamp_parsers (list, optional) – A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values (the special value ISO8601() can also be given). By default, a fast built-in ISO-8601 parser is used.

  • strings_can_be_null (bool, optional (default False)) – Whether string / binary columns can have null values. If true, then strings in null_values are considered null for string columns. If false, then all strings are valid string values.

  • quoted_strings_can_be_null (bool, optional (default True)) – Whether quoted values can be null. If true, then strings in null_values are also considered null when they appear quoted in the CSV file. Otherwise, quoted values are never considered null.

  • auto_dict_encode (bool, optional (default False)) – Whether to try to automatically dict-encode string / binary data. If true, then when type inference detects a string or binary column, it is dict-encoded up to auto_dict_max_cardinality distinct values (per chunk), after which it switches to regular encoding. This setting is ignored for non-inferred columns (those in column_types).

  • auto_dict_max_cardinality (int, optional) – The maximum dictionary cardinality for auto_dict_encode. This value is per chunk.

  • include_columns (list, optional) – The names of columns to include in the Table. If empty, the Table will include all columns from the CSV file. If not empty, only these columns will be included, in this order.

  • include_missing_columns (bool, optional (default False)) – If false, columns in include_columns but not in the CSV file will error out. If true, columns in include_columns but not in the CSV file will produce a column of nulls (whose type is selected using column_types, or null by default). This option is ignored if include_columns is empty.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(*args, **kwargs)

Initialize self.

equals(self, ConvertOptions other)

validate(self)

Attributes

auto_dict_encode

Whether to try to automatically dict-encode string / binary data.

auto_dict_max_cardinality

The maximum dictionary cardinality for auto_dict_encode.

check_utf8

Whether to check UTF8 validity of string columns.

column_types

Explicitly map column names to column types.

decimal_point

The character used as decimal point in floating-point and decimal data.

false_values

A sequence of strings that denote false booleans in the data.

include_columns

The names of columns to include in the Table.

include_missing_columns

If false, columns in include_columns but not in the CSV file will error out.

null_values

A sequence of strings that denote nulls in the data.

quoted_strings_can_be_null

Whether quoted values can be null.

strings_can_be_null

Whether string / binary columns can have null values.

timestamp_parsers

A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values (the special value ISO8601() can also be given).

true_values

A sequence of strings that denote true booleans in the data.

auto_dict_encode

Whether to try to automatically dict-encode string / binary data.

auto_dict_max_cardinality

The maximum dictionary cardinality for auto_dict_encode.

This value is per chunk.

check_utf8

Whether to check UTF8 validity of string columns.

column_types

Explicitly map column names to column types.

decimal_point

The character used as decimal point in floating-point and decimal data.

equals(self, ConvertOptions other)

false_values

A sequence of strings that denote false booleans in the data.

include_columns

The names of columns to include in the Table.

If empty, the Table will include all columns from the CSV file. If not empty, only these columns will be included, in this order.

include_missing_columns

If false, columns in include_columns but not in the CSV file will error out. If true, columns in include_columns but not in the CSV file will produce a null column (whose type is selected using column_types, or null by default). This option is ignored if include_columns is empty.

null_values

A sequence of strings that denote nulls in the data.

quoted_strings_can_be_null

Whether quoted values can be null.

strings_can_be_null

Whether string / binary columns can have null values.

timestamp_parsers

A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values (the special value ISO8601() can also be given). By default, a fast built-in ISO-8601 parser is used.

true_values

A sequence of strings that denote true booleans in the data.

validate(self)