Tabular File Formats

CSV Files

ConvertOptions([check_utf8, column_types, ...])

Options for converting CSV data.

CSVStreamingReader()

An object that reads record batches incrementally from a CSV file.

CSVWriter(sink, Schema schema, ...)

Writer to create a CSV file.

ISO8601

A special object indicating ISO-8601 parsing.

ParseOptions([delimiter, quote_char, ...])

Options for parsing CSV files.

ReadOptions([use_threads, block_size, ...])

Options for reading CSV files.

WriteOptions([include_header, batch_size, ...])

Options for writing CSV files.

open_csv(input_file[, read_options, ...])

Open a streaming reader of CSV data.

read_csv(input_file[, read_options, ...])

Read a Table from a stream of CSV data.

write_csv(data, output_file[, write_options])

Write record batch or table to a CSV file.

InvalidRow(expected_columns, actual_columns, ...)

Description of an invalid row in a CSV file.
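
Taken together, a typical flow pairs read_csv or open_csv with the three option classes listed above. The following is a minimal sketch, not taken from this reference; the file names, column names, and option values are placeholders:

```python
import pyarrow as pa
from pyarrow import csv

# Create a small input file so the sketch is self-contained.
with open("data.csv", "w") as f:
    f.write("id,name\n1,a\n2,b\n")

# One-shot read into a Table, with each option class spelled out.
table = csv.read_csv(
    "data.csv",
    read_options=csv.ReadOptions(use_threads=True),
    parse_options=csv.ParseOptions(delimiter=","),
    convert_options=csv.ConvertOptions(column_types={"id": pa.int64()}),
)

csv.write_csv(table, "out.csv", csv.WriteOptions(include_header=True))

# Incremental reading: open_csv returns a CSVStreamingReader that
# yields RecordBatches instead of materializing the whole file.
with csv.open_csv("data.csv") as reader:
    for batch in reader:
        print(batch.num_rows)
```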

Feather Files

read_feather(source[, columns, use_threads, ...])

Read a pandas.DataFrame from Feather format.

read_table(source[, columns, memory_map, ...])

Read a pyarrow.Table from Feather format.

write_feather(df, dest[, compression, ...])

Write a pandas.DataFrame to Feather format.
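
A quick sketch of the round trip, assuming pandas is installed; the file name and frame contents are made up:

```python
import pandas as pd
import pyarrow.feather as feather

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
feather.write_feather(df, "example.feather", compression="zstd")

# Read back as a pandas.DataFrame, selecting a subset of columns.
df2 = feather.read_feather("example.feather", columns=["a"])

# Or read as a pyarrow.Table, optionally memory-mapping the file.
table = feather.read_table("example.feather", memory_map=True)
```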

JSON Files

ReadOptions([use_threads, block_size])

Options for reading JSON files.

ParseOptions([explicit_schema, ...])

Options for parsing JSON files.

read_json(input_file[, read_options, ...])

Read a Table from a stream of JSON data.
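
A minimal sketch of using these together; note that pyarrow.json expects newline-delimited JSON records, and the data and schema below are invented for illustration:

```python
import io

import pyarrow as pa
from pyarrow import json as pa_json

data = b'{"x": 1, "y": "a"}\n{"x": 2, "y": "b"}\n'

table = pa_json.read_json(
    io.BytesIO(data),
    read_options=pa_json.ReadOptions(use_threads=True),
    parse_options=pa_json.ParseOptions(
        explicit_schema=pa.schema([("x", pa.int64()), ("y", pa.string())])
    ),
)
```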

Parquet Files

ParquetDataset([path_or_paths, filesystem, ...])

Encapsulates details of reading a complete Parquet dataset, possibly consisting of multiple files and partitions in subdirectories.

ParquetFile(source[, metadata, ...])

Reader interface for a single Parquet file.

ParquetWriter(where, schema[, filesystem, ...])

Class for incrementally building a Parquet file for Arrow tables.

read_table(source[, columns, use_threads, ...])

Read a Table from Parquet format.

read_metadata(where[, memory_map, ...])

Read FileMetaData from footer of a single Parquet file.

read_pandas(source[, columns])

Read a Table from Parquet format, also reading the DataFrame index values if they are present in the file metadata.

read_schema(where[, memory_map, ...])

Read effective Arrow schema from Parquet file metadata.

write_metadata(schema, where[, ...])

Write a metadata-only Parquet file from a schema.

write_table(table, where[, row_group_size, ...])

Write a Table to Parquet format.

write_to_dataset(table, root_path[, ...])

Wrapper around parquet.write_table for writing a Table to Parquet format by partitions.
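
The sketch below strings these entry points together: one-shot write and read, incremental writing, and a partitioned dataset. It is illustrative only; the paths, column names, and partition column are placeholders:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"year": [2020, 2020, 2021], "n": [1, 2, 3]})

# One-shot write and read.
pq.write_table(table, "example.parquet")
subset = pq.read_table("example.parquet", columns=["n"], use_threads=True)

# Incremental writing: append multiple tables to a single file.
with pq.ParquetWriter("incremental.parquet", table.schema) as writer:
    writer.write_table(table)
    writer.write_table(table)

# Hive-style partitioned output: one subdirectory per distinct "year",
# which ParquetDataset then reads back as a single logical table.
pq.write_to_dataset(table, root_path="dataset_root", partition_cols=["year"])
restored = pq.ParquetDataset("dataset_root").read()
```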

Parquet Metadata

FileMetaData

Parquet metadata for a single file.

RowGroupMetaData

Metadata for a single row group.

ColumnChunkMetaData

Column metadata for a single row group.

Statistics

Statistics for a single column in a single row group.

ParquetSchema

A Parquet schema.

ColumnSchema

Schema for a single column.

ParquetLogicalType

Logical type of a Parquet type.
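
These classes are returned by the reader functions above rather than constructed directly. A short sketch of walking the hierarchy, with a made-up file and column:

```python
import pyarrow as pa
import pyarrow.parquet as pq

pq.write_table(pa.table({"x": [1, 2, 3]}), "meta_demo.parquet")

meta = pq.read_metadata("meta_demo.parquet")   # FileMetaData
print(meta.num_rows, meta.num_row_groups)

row_group = meta.row_group(0)      # RowGroupMetaData
column = row_group.column(0)       # ColumnChunkMetaData
stats = column.statistics          # Statistics, or None if absent
if stats is not None and stats.has_min_max:
    print(stats.min, stats.max)

print(pq.read_schema("meta_demo.parquet"))  # effective Arrow schema
```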

Encrypted Parquet Files

CryptoFactory(kms_client_factory)

A factory that produces the low-level FileEncryptionProperties and FileDecryptionProperties objects from the high-level parameters.

KmsClient()

The abstract base class for KmsClient implementations.

KmsConnectionConfig([kms_instance_id, ...])

Configuration of the connection to the Key Management Service (KMS).

EncryptionConfiguration(footer_key[, ...])

Configuration of the encryption, such as which columns to encrypt.

DecryptionConfiguration([cache_lifetime])

Configuration of the decryption, such as the cache timeout.
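
End to end, the flow is: implement a KmsClient, hand a factory for it to CryptoFactory, and derive per-file properties from the two configuration classes. The sketch below uses a deliberately insecure in-memory toy client (base64 is not encryption); the key IDs and column names are invented, and a pyarrow build with Parquet encryption support is assumed:

```python
import base64
import datetime

import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.parquet.encryption as pe


class InMemoryKmsClient(pe.KmsClient):
    """Toy KMS client: 'wraps' data keys by base64-encoding them
    together with the master key. A real client would call a KMS."""

    def __init__(self, kms_connection_config):
        super().__init__()
        self.master_keys = kms_connection_config.custom_kms_conf

    def wrap_key(self, key_bytes, master_key_identifier):
        master = self.master_keys[master_key_identifier].encode()
        return base64.b64encode(master + key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        master = self.master_keys[master_key_identifier].encode()
        unwrapped = base64.b64decode(wrapped_key)
        assert unwrapped[:len(master)] == master
        return unwrapped[len(master):]


# Master keys must be 16, 24, or 32 bytes; these values are placeholders.
kms_config = pe.KmsConnectionConfig(
    custom_kms_conf={"footer_key": "0123456789012345",
                     "col_key": "1234567890123450"})
crypto_factory = pe.CryptoFactory(lambda config: InMemoryKmsClient(config))

encryption_config = pe.EncryptionConfiguration(
    footer_key="footer_key",
    column_keys={"col_key": ["secret"]})  # master key ID -> column names

table = pa.table({"secret": [1, 2], "public": ["a", "b"]})
with pq.ParquetWriter(
        "encrypted.parquet", table.schema,
        encryption_properties=crypto_factory.file_encryption_properties(
            kms_config, encryption_config)) as writer:
    writer.write_table(table)

decryption_properties = crypto_factory.file_decryption_properties(
    kms_config, pe.DecryptionConfiguration(
        cache_lifetime=datetime.timedelta(minutes=5)))
restored = pq.ParquetFile(
    "encrypted.parquet",
    decryption_properties=decryption_properties).read()
```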

ORC Files

ORCFile(source)

Reader interface for a single ORC file.

ORCWriter(where, *[, file_version, ...])

Writer interface for a single ORC file.

read_table(source[, columns, filesystem])

Read a Table from an ORC file.

write_table(table, where, *[, file_version, ...])

Write a table into an ORC file.
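
A short round-trip sketch; the table contents, file names, and compression choice are illustrative, and a pyarrow build with ORC support is assumed:

```python
import pyarrow as pa
from pyarrow import orc

table = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})
orc.write_table(table, "example.orc", compression="zstd")

# Whole-file read, optionally selecting columns.
subset = orc.read_table("example.orc", columns=["a"])

# Lower-level access through ORCFile: stripes can be read one at a time.
f = orc.ORCFile("example.orc")
print(f.nstripes, f.schema)
first_stripe = f.read_stripe(0)

# Incremental writing with ORCWriter.
writer = orc.ORCWriter("incremental.orc", compression="zstd")
writer.write(table)
writer.close()
```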