pyarrow.orc.ORCFile#

class pyarrow.orc.ORCFile(source)[source]#

Bases: object

Reader interface for a single ORC file

Parameters
source : str or pyarrow.NativeFile

Readable source. For passing Python file objects or byte buffers, see pyarrow.PythonFile or pyarrow.BufferReader.

__init__(source)[source]#

Methods

__init__(source)

read([columns])

Read the whole file.

read_stripe(n[, columns])

Read a single stripe from the file.

Attributes

compression

Compression codec of the file

compression_size

Number of bytes to buffer for the compression codec in the file

content_length

Length of the data stripes in the file in bytes

file_footer_length

The number of compressed bytes in the file footer

file_length

The number of bytes in the file

file_postscript_length

The number of bytes in the file postscript

file_version

Format version of the ORC file; must be 0.11 or 0.12

metadata

The file metadata, as an Arrow KeyValueMetadata

nrows

The number of rows in the file

nstripe_statistics

Number of stripe statistics

nstripes

The number of stripes in the file

row_index_stride

Number of rows per entry in the row index, or 0 if there is no row index

schema

The file schema, as an Arrow schema

software_version

Software instance and version that wrote this file

stripe_statistics_length

The number of compressed bytes in the file stripe statistics

writer

Name of the writer that wrote this file.

writer_version

Version of the writer

property compression#

Compression codec of the file

property compression_size#

Number of bytes to buffer for the compression codec in the file

property content_length#

Length of the data stripes in the file in bytes

property file_footer_length#

The number of compressed bytes in the file footer

property file_length#

The number of bytes in the file

property file_postscript_length#

The number of bytes in the file postscript

property file_version#

Format version of the ORC file; must be 0.11 or 0.12

property metadata#

The file metadata, as an Arrow KeyValueMetadata

property nrows#

The number of rows in the file

property nstripe_statistics#

Number of stripe statistics

property nstripes#

The number of stripes in the file

read(columns=None)[source]#

Read the whole file.

Parameters
columns : list

If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns
pyarrow.Table

Content of the file as a Table.

read_stripe(n, columns=None)[source]#

Read a single stripe from the file.

Parameters
n : int

The stripe index

columns : list

If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns
pyarrow.RecordBatch

Content of the stripe as a RecordBatch.

property row_index_stride#

Number of rows per entry in the row index, or 0 if there is no row index

property schema#

The file schema, as an Arrow schema

property software_version#

Software instance and version that wrote this file

property stripe_statistics_length#

The number of compressed bytes in the file stripe statistics

property writer#

Name of the writer that wrote this file. If the writer is unknown, its Writer ID (a number) is returned.

property writer_version#

Version of the writer