pyarrow.csv.ParseOptions
- class pyarrow.csv.ParseOptions(delimiter=None, *, quote_char=None, double_quote=None, escape_char=None, newlines_in_values=None, ignore_empty_lines=None, invalid_row_handler=None)
Bases: pyarrow.lib._Weakrefable
Options for parsing CSV files.
- Parameters
  - delimiter: 1-character str, optional (default ',')
    The character delimiting individual cells in the CSV data.
  - quote_char: 1-character str or False, optional (default '"')
    The character used optionally for quoting CSV values (False if quoting is not allowed).
  - double_quote: bool, optional (default True)
    Whether two quotes in a quoted CSV value denote a single quote in the data.
  - escape_char: 1-character str or False, optional (default False)
    The character used optionally for escaping special characters (False if escaping is not allowed).
  - newlines_in_values: bool, optional (default False)
    Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
  - ignore_empty_lines: bool, optional (default True)
    Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).
  - invalid_row_handler: callable, optional (default None)
    If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either “skip” or “error” depending on the desired outcome.
Examples
Define an example file-like source from a string encoded to bytes:
>>> import io
>>> s = "animals;n_legs;entry\nFlamingo;2;2022-03-01\n# Comment here:\nHorse;4;2022-03-02\nBrittle stars;5;2022-03-03\nCentipede;100;2022-03-04"
>>> print(s)
animals;n_legs;entry
Flamingo;2;2022-03-01
# Comment here:
Horse;4;2022-03-02
Brittle stars;5;2022-03-03
Centipede;100;2022-03-04
>>> source = io.BytesIO(s.encode())
Read the data from a file skipping rows with comments and defining the delimiter:
>>> from pyarrow import csv
>>> def skip_comment(row):
...     if row.text.startswith("# "):
...         return 'skip'
...     else:
...         return 'error'
...
>>> parse_options = csv.ParseOptions(delimiter=";", invalid_row_handler=skip_comment)
>>> csv.read_csv(source, parse_options=parse_options)
pyarrow.Table
animals: string
n_legs: int64
entry: date32[day]
----
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
n_legs: [[2,4,5,100]]
entry: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- equals(self, ParseOptions other)
- validate(self)
Attributes
- delimiter: The character delimiting individual cells in the CSV data.
- double_quote: Whether two quotes in a quoted CSV value denote a single quote in the data.
- escape_char: The character used optionally for escaping special characters (False if escaping is not allowed).
- ignore_empty_lines: Whether empty lines are ignored in CSV input.
- invalid_row_handler: Optional handler for invalid rows.
- newlines_in_values: Whether newline characters are allowed in CSV values.
- quote_char: The character used optionally for quoting CSV values (False if quoting is not allowed).
- delimiter
The character delimiting individual cells in the CSV data.
- double_quote
Whether two quotes in a quoted CSV value denote a single quote in the data.
- equals(self, ParseOptions other)
- escape_char
The character used optionally for escaping special characters (False if escaping is not allowed).
- ignore_empty_lines
Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).
- invalid_row_handler
Optional handler for invalid rows.
If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either “skip” or “error” depending on the desired outcome.
- newlines_in_values
Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
- quote_char
The character used optionally for quoting CSV values (False if quoting is not allowed).
- validate(self)