Binary to convert a CSV file to a Parquet file
§Install
parquet-fromcsv can be installed using cargo:
cargo install parquet --features=cli
After this, parquet-fromcsv should be available:
parquet-fromcsv --schema message_schema_for_parquet.txt --input-file input.csv --output-file output.parquet
The binary can also be built from the source code and run as follows:
cargo run --features=cli --bin parquet-fromcsv -- --schema message_schema_for_parquet.txt \
    --input-file input.csv --output-file output.parquet
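As a minimal sketch of an end-to-end run, the script below writes a small schema file and a matching CSV file, then invokes the converter only if it is found on the PATH. The column names (`id`, `name`), the file contents, and the temporary directory are assumptions made for illustration. The schema is written in the Parquet message-type text format, and the input and output paths are passed with the `--input-file` and `--output-file` flags listed under Options.

```shell
#!/bin/sh
set -eu

# Hypothetical working directory and file names, for illustration only.
workdir="$(mktemp -d)"
schema="$workdir/message_schema_for_parquet.txt"
input="$workdir/input.csv"
output="$workdir/output.parquet"

# Parquet message schema describing two assumed columns.
cat > "$schema" <<'EOF'
message schema {
  REQUIRED INT64 id;
  OPTIONAL BYTE_ARRAY name (UTF8);
}
EOF

# Matching CSV data with a header row.
cat > "$input" <<'EOF'
id,name
1,alice
2,bob
EOF

# Run the conversion only when the binary is installed.
if command -v parquet-fromcsv >/dev/null 2>&1; then
  parquet-fromcsv --schema "$schema" --input-file "$input" \
    --output-file "$output" --has-header
else
  echo "parquet-fromcsv not found; install it with: cargo install parquet --features=cli" >&2
fi
```

Because the sample CSV starts with a header row, `--has-header` is passed so that the first line is not read as data.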
§Options
Usage: parquet [OPTIONS] --schema <SCHEMA> --input-file <INPUT_FILE> --output-file <OUTPUT_FILE>
Options:
  -s, --schema <SCHEMA>
          message schema for output Parquet
  -i, --input-file <INPUT_FILE>
          input CSV file
  -o, --output-file <OUTPUT_FILE>
          output Parquet file
  -f, --input-format <INPUT_FORMAT>
          input file format
          [default: csv]
          [possible values: csv, tsv]
  -b, --batch-size <BATCH_SIZE>
          batch size
          [env: PARQUET_FROM_CSV_BATCHSIZE=]
          [default: 1000]
  -h, --has-header
          has header
  -d, --delimiter <DELIMITER>
          field delimiter
          default value: when input_format==CSV: ',' when input_format==TSV: 'TAB'
  -r, --record-terminator <RECORD_TERMINATOR>
          record terminator
          [possible values: lf, crlf, cr]
  -e, --escape-char <ESCAPE_CHAR>
          escape character
  -q, --quote-char <QUOTE_CHAR>
          quote character
  -D, --double-quote <DOUBLE_QUOTE>
          double quote
          [possible values: true, false]
  -C, --csv-compression <CSV_COMPRESSION>
          compression mode of csv
          [default: UNCOMPRESSED]
  -c, --parquet-compression <PARQUET_COMPRESSION>
          compression mode of parquet
          [default: SNAPPY]
  -w, --writer-version <WRITER_VERSION>
          writer version
  -m, --max-row-group-size <MAX_ROW_GROUP_SIZE>
          max row group size
      --enable-bloom-filter <ENABLE_BLOOM_FILTER>
          whether to enable bloom filter writing
          [possible values: true, false]
      --help
          display usage help
  -V, --version
          Print version
§Parquet file options
- `-b`, `--batch-size` : Batch size for Parquet
- `-c`, `--parquet-compression` : Compression option for Parquet, default is SNAPPY
- `-s`, `--schema` : Path to message schema for generated Parquet file
- `-o`, `--output-file` : Path to output Parquet file
- `-w`, `--writer-version` : Writer version
- `-m`, `--max-row-group-size` : Max row group size
- `--enable-bloom-filter` : Enable bloom filter during writing
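The Parquet-side options can be combined on one command line. The following is a sketch only: the file names and the tuning values (`5000`, `100000`) are assumptions chosen for illustration, not recommended settings. It sets the batch size through the `PARQUET_FROM_CSV_BATCHSIZE` environment variable shown in the usage text, and it prints the command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical file names, for illustration only.
schema="message_schema_for_parquet.txt"
input="input.csv"
output="output.parquet"

# The batch size can come from the environment instead of -b/--batch-size.
export PARQUET_FROM_CSV_BATCHSIZE=5000

# Assumed tuning values; adjust them to the data set.
cmd="parquet-fromcsv --schema $schema --input-file $input --output-file $output"
cmd="$cmd --max-row-group-size 100000 --enable-bloom-filter true"

# Dry run: print the command line instead of executing it.
echo "$cmd"
```

Running the printed command with real files should apply these settings to the conversion.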
§Input file options
- `-i`, `--input-file` : Path to input CSV file
- `-f`, `--input-format` : Dialect for input file, `csv` or `tsv`.
- `-C`, `--csv-compression` : Compression option for csv, default is UNCOMPRESSED
- `-d`, `--delimiter` : Field delimiter for CSV file, default depends on `--input-format`
- `-e`, `--escape-char` : Escape character for input file
- `-h`, `--has-header` : Input has header
- `-r`, `--record-terminator` : Record terminator character for input, default is CRLF
- `-q`, `--quote-char` : Input quoting character
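The input options might be used as follows for a tab-separated file. This sketch writes a small TSV file with a header row and LF line endings, then prints (without running) a command line that selects the `tsv` dialect. The file names and column values are assumptions for illustration, and the schema file referenced in the command is not created here.

```shell
#!/bin/sh
set -eu

# Hypothetical working directory and file names, for illustration only.
workdir="$(mktemp -d)"
input="$workdir/input.tsv"

# Header row plus two records, tab-separated and LF-terminated.
printf 'id\tname\n1\talice\n2\tbob\n' > "$input"

# With --input-format tsv the default delimiter is TAB, so --delimiter is omitted.
cmd="parquet-fromcsv --schema $workdir/schema.txt --input-file $input"
cmd="$cmd --output-file $workdir/output.parquet"
cmd="$cmd --input-format tsv --has-header --record-terminator lf"

# Dry run: print the command line instead of executing it.
echo "$cmd"
```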
Structs§
- Args 🔒
Enums§
Functions§
- main 🔒