pyarrow.orc.ORCFile#
- class pyarrow.orc.ORCFile(source)[source]#
- Bases: object
- Reader interface for a single ORC file.
- Parameters:
- source : str or pyarrow.NativeFile
- Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.
- Methods
- __init__(source)
- read([columns]): Read the whole file.
- read_stripe(n[, columns]): Read a single stripe from the file.
- Attributes
- compression: Compression codec of the file
- compression_size: Number of bytes to buffer for the compression codec in the file
- content_length: Length of the data stripes in the file in bytes
- file_footer_length: The number of compressed bytes in the file footer
- file_length: The number of bytes in the file
- file_postscript_length: The number of bytes in the file postscript
- file_version: Format version of the ORC file, must be 0.11 or 0.12
- metadata: The file metadata, as an arrow KeyValueMetadata
- nrows: The number of rows in the file
- nstripe_statistics: Number of stripe statistics
- nstripes: The number of stripes in the file
- row_index_stride: Number of rows per entry in the row index, or 0 if there is no row index
- schema: The file schema, as an arrow schema
- software_version: Software instance and version that wrote this file
- stripe_statistics_length: The number of compressed bytes in the file stripe statistics
- writer: Name of the writer that wrote this file.
- writer_version: Version of the writer
 - property compression#
- Compression codec of the file 
 - property compression_size#
- Number of bytes to buffer for the compression codec in the file 
 - property content_length#
- Length of the data stripes in the file in bytes 
 - property file_footer_length#
- The number of compressed bytes in the file footer 
 - property file_length#
- The number of bytes in the file 
 - property file_postscript_length#
- The number of bytes in the file postscript 
 - property file_version#
- Format version of the ORC file, must be 0.11 or 0.12 
 - property metadata#
- The file metadata, as an arrow KeyValueMetadata 
 - property nrows#
- The number of rows in the file 
 - property nstripe_statistics#
- Number of stripe statistics 
 - property nstripes#
- The number of stripes in the file 
 - read(columns=None)[source]#
- Read the whole file.
- Parameters:
- columns : list
- If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.
- Returns:
- pyarrow.Table
- Content of the file as a Table. 
 
 
 - read_stripe(n, columns=None)[source]#
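- Read a single stripe from the file.
- Parameters:
- n : int
- The stripe index.
- columns : list
- If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.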
- Returns:
- pyarrow.RecordBatch
- Content of the stripe as a RecordBatch. 
 
 
 - property row_index_stride#
- Number of rows per entry in the row index, or 0 if there is no row index 
 - property schema#
- The file schema, as an arrow schema 
 - property software_version#
- Software instance and version that wrote this file 
 - property stripe_statistics_length#
- The number of compressed bytes in the file stripe statistics 
 - property writer#
- Name of the writer that wrote this file. If the writer is unknown, its Writer ID (a number) is returned instead. 
 - property writer_version#
- Version of the writer 
 
 
    