pyarrow.orc.ORCFile

class pyarrow.orc.ORCFile(source)

Bases: object

Reader interface for a single ORC file

Parameters:
source : str or pyarrow.NativeFile

Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.

__init__(source)

Methods

__init__(source)

read([columns])

Read the whole file.

read_stripe(n[, columns])

Read a single stripe from the file.

Attributes

compression

Compression codec of the file

compression_size

Number of bytes to buffer for the compression codec in the file

content_length

Length of the data stripes in the file in bytes

file_footer_length

The number of compressed bytes in the file footer

file_length

The number of bytes in the file

file_postscript_length

The number of bytes in the file postscript

file_version

Format version of the ORC file; must be 0.11 or 0.12

metadata

The file metadata, as an arrow KeyValueMetadata

nrows

The number of rows in the file

nstripe_statistics

Number of stripe statistics

nstripes

The number of stripes in the file

row_index_stride

Number of rows per entry in the row index, or 0 if there is no row index

schema

The file schema, as an arrow schema

software_version

Software instance and version that wrote this file

stripe_statistics_length

The number of compressed bytes in the file stripe statistics

writer

Name of the writer that wrote this file.

writer_version

Version of the writer

property compression

Compression codec of the file

property compression_size

Number of bytes to buffer for the compression codec in the file

property content_length

Length of the data stripes in the file in bytes

property file_footer_length

The number of compressed bytes in the file footer

property file_length

The number of bytes in the file

property file_postscript_length

The number of bytes in the file postscript

property file_version

Format version of the ORC file; must be 0.11 or 0.12

property metadata

The file metadata, as an arrow KeyValueMetadata

property nrows

The number of rows in the file

property nstripe_statistics

Number of stripe statistics

property nstripes

The number of stripes in the file

read(columns=None)

Read the whole file.

Parameters:
columns : list

If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.

Returns:
pyarrow.Table

Content of the file as a Table.

read_stripe(n, columns=None)

Read a single stripe from the file.

Parameters:
n : int

The stripe index

columns : list

If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns:
pyarrow.RecordBatch

Content of the stripe as a RecordBatch.

property row_index_stride

Number of rows per entry in the row index, or 0 if there is no row index

property schema

The file schema, as an arrow schema

property software_version

Software instance and version that wrote this file

property stripe_statistics_length

The number of compressed bytes in the file stripe statistics

property writer

Name of the writer that wrote this file. If the writer is unknown then its Writer ID (a number) is returned

property writer_version

Version of the writer