pyarrow.orc.ORCFile
- class pyarrow.orc.ORCFile(source)
Bases: object
Reader interface for a single ORC file.
- Parameters:
- source: str or pyarrow.NativeFile
Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.
Methods
- __init__(source)
- read([columns]): Read the whole file.
- read_stripe(n[, columns]): Read a single stripe from the file.
Attributes
- compression: Compression codec of the file
- compression_size: Number of bytes to buffer for the compression codec in the file
- content_length: Length of the data stripes in the file in bytes
- file_footer_length: The number of compressed bytes in the file footer
- file_length: The number of bytes in the file
- file_postscript_length: The number of bytes in the file postscript
- file_version: Format version of the ORC file, must be 0.11 or 0.12
- metadata: The file metadata, as an arrow KeyValueMetadata
- nrows: The number of rows in the file
- nstripe_statistics: Number of stripe statistics
- nstripes: The number of stripes in the file
- row_index_stride: Number of rows per entry in the row index, or 0 if there is no row index
- schema: The file schema, as an arrow schema
- software_version: Software instance and version that wrote this file
- stripe_statistics_length: The number of compressed bytes in the file stripe statistics
- writer: Name of the writer that wrote this file.
- writer_version: Version of the writer
- property compression
Compression codec of the file
- property compression_size
Number of bytes to buffer for the compression codec in the file
- property content_length
Length of the data stripes in the file in bytes
- property file_footer_length
The number of compressed bytes in the file footer
- property file_length
The number of bytes in the file
- property file_postscript_length
The number of bytes in the file postscript
- property file_version
Format version of the ORC file, must be 0.11 or 0.12
- property metadata
The file metadata, as an arrow KeyValueMetadata
- property nrows
The number of rows in the file
- property nstripe_statistics
Number of stripe statistics
- property nstripes
The number of stripes in the file
- read(columns=None)
Read the whole file.
- Parameters:
- columns: list
If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.
- Returns:
pyarrow.Table
Content of the file as a Table.
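- read_stripe(n, columns=None)
Read a single stripe from the file.
- Parameters:
- n: int
The stripe index.
- columns: list
If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.
- Returns:
pyarrow.RecordBatch
Content of the stripe as a RecordBatch.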
- property row_index_stride
Number of rows per entry in the row index, or 0 if there is no row index
- property schema
The file schema, as an arrow schema
- property software_version
Software instance and version that wrote this file
- property stripe_statistics_length
The number of compressed bytes in the file stripe statistics
- property writer
Name of the writer that wrote this file. If the writer is unknown, its Writer ID (a number) is returned instead.
- property writer_version
Version of the writer