pyarrow.orc.ORCWriter

class pyarrow.orc.ORCWriter(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)

Bases: object

Writer interface for a single ORC file.

Parameters
where : str or pyarrow.io.NativeFile

Writable target. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface, pyarrow.io.BufferOutputStream or pyarrow.io.FixedSizeBufferWriter.

file_version : {"0.11", "0.12"}, default "0.12"

Determine which ORC file version to use. Hive 0.11 / ORC v0 is the older version while Hive 0.12 / ORC v1 is the newer one.

batch_size : int, default 1024

Number of rows the ORC writer writes at a time.

stripe_size : int, default 64 * 1024 * 1024

Size of each ORC stripe in bytes.

compression : str, default 'uncompressed'

The compression codec. Valid values: {'UNCOMPRESSED', 'SNAPPY', 'ZLIB', 'LZ4', 'ZSTD'}. Note that LZO is currently not supported.

compression_block_size : int, default 64 * 1024

Size of each compression block in bytes.

compression_strategy : str, default 'speed'

The compression strategy, i.e. whether to favor speed or size reduction. Valid values: {'SPEED', 'COMPRESSION'}.

row_index_stride : int, default 10000

The row index stride, i.e. the number of rows per entry in the row index.

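padding_tolerance : double, default 0.0

The padding tolerance, expressed as a fraction of the stripe size that may be spent on padding to align stripes with block boundaries.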

dictionary_key_size_threshold : double, default 0.0

The dictionary key size threshold. Set to 0 to disable dictionary encoding, or to 1 to always enable it.

bloom_filter_columns : None, set-like or list-like, default None

Columns that use the bloom filter.

bloom_filter_fpp : double, default 0.05

Upper limit of the false-positive rate of the bloom filter.

__init__(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)

Methods

__init__(where, *[, file_version, ...])

close()

Close the ORC file.

write(table)

Write the table into an ORC file.

Attributes

is_open

close()

Close the ORC file.

is_open = False

write(table)

Write the table into an ORC file. The schema of the table must be equal to the schema used when opening the ORC file.

Parameters
table : pyarrow.Table

The table to be written into the ORC file.