Tabular File Formats

CSV Files

ConvertOptions([check_utf8, column_types, ...])

Options for converting CSV data.

CSVStreamingReader()

An object that reads record batches incrementally from a CSV file.

CSVWriter(sink, Schema schema, ...)

Writer to create a CSV file.

ISO8601

A special object indicating ISO-8601 parsing.

ParseOptions([delimiter, quote_char, ...])

Options for parsing CSV files.

ReadOptions([use_threads, block_size, ...])

Options for reading CSV files.

WriteOptions([include_header, batch_size])

Options for writing CSV files.

open_csv(input_file[, read_options, ...])

Open a streaming reader of CSV data.

read_csv(input_file[, read_options, ...])

Read a Table from a stream of CSV data.

write_csv(data, output_file[, write_options])

Write a record batch or table to a CSV file.

InvalidRow(expected_columns, actual_columns, ...)

Description of an invalid row in a CSV file.

Feather Files

read_feather(source[, columns, use_threads, ...])

Read a pandas.DataFrame from Feather format.

read_table(source[, columns, memory_map, ...])

Read a pyarrow.Table from Feather format.

write_feather(df, dest[, compression, ...])

Write a pandas.DataFrame to Feather format.

JSON Files

ReadOptions([use_threads, block_size])

Options for reading JSON files.

ParseOptions([explicit_schema, ...])

Options for parsing JSON files.

read_json(input_file[, read_options, ...])

Read a Table from a stream of JSON data.

Parquet Files

ParquetDataset([path_or_paths, filesystem, ...])

Encapsulates details of reading a complete Parquet dataset, possibly consisting of multiple files and partitions in subdirectories.

ParquetFile(source[, metadata, ...])

Reader interface for a single Parquet file.

ParquetWriter(where, schema[, filesystem, ...])

Class for incrementally building a Parquet file for Arrow tables.

read_table(source[, columns, use_threads, ...])

Read a Table from Parquet format.

read_metadata(where[, memory_map])

Read FileMetadata from footer of a single Parquet file.

read_pandas(source[, columns])

Read a Table from Parquet format, also reading DataFrame index values if known in the file metadata.

read_schema(where[, memory_map])

Read effective Arrow schema from Parquet file metadata.

write_metadata(schema, where[, ...])

Write a metadata-only Parquet file from a schema.

write_table(table, where[, row_group_size, ...])

Write a Table to Parquet format.

write_to_dataset(table, root_path[, ...])

Wrapper around parquet.write_table for writing a Table to Parquet format by partitions.

ORC Files

ORCFile(source)

Reader interface for a single ORC file.

ORCWriter(where, *[, file_version, ...])

Writer interface for a single ORC file.

read_table(source[, columns, filesystem])

Read a Table from an ORC file.

write_table(table, where, *[, file_version, ...])

Write a table into an ORC file.