pyarrow.orc.ORCFile

class pyarrow.orc.ORCFile(source)

Bases: object

Reader interface for a single ORC file.

Parameters

source (str or pyarrow.io.NativeFile) – Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.

__init__(source)

Open source for reading as a single ORC file.

Methods

__init__(source)

Initialize self.

read([columns])

Read the whole file.

read_stripe(n[, columns])

Read a single stripe from the file.

Attributes

metadata

The file metadata, as an Arrow KeyValueMetadata

nrows

The number of rows in the file

nstripes

The number of stripes in the file

schema

The file schema, as an Arrow schema

property metadata

The file metadata, as an Arrow KeyValueMetadata

property nrows

The number of rows in the file

property nstripes

The number of stripes in the file

read(columns=None)

Read the whole file.

Parameters

columns (list) – If not None, only these columns are read from the file. A column name may be a prefix of a nested field; for example, ‘a’ selects ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns

pyarrow.lib.Table – Content of the file as a Table.

read_stripe(n, columns=None)

Read a single stripe from the file.

Parameters
  • n (int) – The stripe index

  • columns (list) – If not None, only these columns are read from the stripe. A column name may be a prefix of a nested field; for example, ‘a’ selects ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns

pyarrow.lib.RecordBatch – Content of the stripe as a RecordBatch.

property schema

The file schema, as an Arrow schema