pyarrow.orc.write_table

pyarrow.orc.write_table(table, where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)

Write a table into an ORC file.

Parameters
table : pyarrow.lib.Table

The table to be written into the ORC file

where : str or pyarrow.io.NativeFile

Writable target. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface, pyarrow.io.BufferOutputStream or pyarrow.io.FixedSizeBufferWriter.

file_version : {“0.11”, “0.12”}, default “0.12”

The ORC file version to write. “0.11” (Hive 0.11 / ORC v0) is the older version, while “0.12” (Hive 0.12 / ORC v1) is the newer one.

batch_size : int, default 1024

Number of rows the ORC writer writes at a time.

stripe_size : int, default 64 * 1024 * 1024

Size of each ORC stripe in bytes.

compression : str, default ‘uncompressed’

The compression codec. Valid values: {‘UNCOMPRESSED’, ‘SNAPPY’, ‘ZLIB’, ‘LZ4’, ‘ZSTD’}. Note that LZO is currently not supported.

compression_block_size : int, default 64 * 1024

Size of each compression block in bytes.

compression_strategy : str, default ‘speed’

The compression strategy, i.e. whether to favor speed or size reduction. Valid values: {‘SPEED’, ‘COMPRESSION’}.

row_index_stride : int, default 10000

The row index stride, i.e. the number of rows per entry in the row index.

padding_tolerance : double, default 0.0

The padding tolerance, expressed as a fraction of the stripe size.

dictionary_key_size_threshold : double, default 0.0

The dictionary key size threshold. Set to 0 to disable dictionary encoding, or to 1 to always enable it.

bloom_filter_columns : None, set-like or list-like, default None

Columns that use the bloom filter.

bloom_filter_fpp : double, default 0.05

Upper limit of the false-positive rate of the bloom filter.