pyarrow.orc.ORCWriter

class pyarrow.orc.ORCWriter(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)

Bases: object

Writer interface for a single ORC file.

Parameters:
where : str or pyarrow.io.NativeFile

Writable target. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface, pyarrow.io.BufferOutputStream or pyarrow.io.FixedSizeBufferWriter.

file_version : {"0.11", "0.12"}, default "0.12"

The ORC file version to write. Hive 0.11 / ORC v0 is the older version; Hive 0.12 / ORC v1 is the newer one.

batch_size : int, default 1024

Number of rows the ORC writer writes at a time.

stripe_size : int, default 64 * 1024 * 1024

Size of each ORC stripe in bytes.

compression : str, default 'uncompressed'

The compression codec. Valid values: {'UNCOMPRESSED', 'SNAPPY', 'ZLIB', 'LZ4', 'ZSTD'}. Note that LZO is currently not supported.

compression_block_size : int, default 64 * 1024

Size of each compression block in bytes.

compression_strategy : str, default 'speed'

The compression strategy, i.e. whether to favor speed or size reduction. Valid values: {'SPEED', 'COMPRESSION'}.

row_index_stride : int, default 10000

The row index stride, i.e. the number of rows per entry in the row index.

padding_tolerance : double, default 0.0

The padding tolerance, expressed as a fraction of the stripe size.

dictionary_key_size_threshold : double, default 0.0

The dictionary key size threshold. Use 0 to disable dictionary encoding and 1 to always enable it.

bloom_filter_columns : None, set-like or list-like, default None

Columns that use the bloom filter.

bloom_filter_fpp : double, default 0.05

Upper limit of the false-positive rate of the bloom filter.

__init__(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)

Methods

__init__(where, *[, file_version, ...])

close()

Close the ORC file.

write(table)

Write the table into an ORC file.

Attributes

close()

Close the ORC file.

is_open = False

write(table)

Write the table into an ORC file. The schema of the table must be equal to the schema used when opening the ORC file.

Parameters:
table : pyarrow.Table

The table to be written into the ORC file.