Crate parquet_fromcsv


Binary to convert a CSV file to a Parquet file

§Install

parquet-fromcsv can be installed using cargo:

cargo install parquet --features=cli

After this, `parquet-fromcsv` should be available:

parquet-fromcsv --schema message_schema_for_parquet.txt input.csv output.parquet
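
The file passed via `--schema` contains the Parquet message type that the output file will use. A minimal sketch in Parquet's schema syntax (the field names here are illustrative placeholders, not taken from this crate's docs):

```
message schema {
  optional int64 id;
  optional binary name (UTF8);
  optional double score;
}
```

Each CSV column is mapped, in order, to a field of the message type.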

The binary can also be built from the source code and run as follows (note the `--` separating cargo's own arguments from those passed to the binary):

cargo run --features=cli --bin parquet-fromcsv -- --schema message_schema_for_parquet.txt \
    input.csv output.parquet

§Options

Usage: parquet-fromcsv [OPTIONS] --schema <SCHEMA> --input-file <INPUT_FILE> --output-file <OUTPUT_FILE>

Options:
  -s, --schema <SCHEMA>
          message schema for output Parquet

  -i, --input-file <INPUT_FILE>
          input CSV file

  -o, --output-file <OUTPUT_FILE>
          output Parquet file

  -f, --input-format <INPUT_FORMAT>
          input file format
          
          [default: csv]
          [possible values: csv, tsv]

  -b, --batch-size <BATCH_SIZE>
          batch size
          
          [env: PARQUET_FROM_CSV_BATCHSIZE=]
          [default: 1000]

  -h, --has-header
          has header

  -d, --delimiter <DELIMITER>
          field delimiter
          
          [default: ',' when --input-format is csv, TAB when --input-format is tsv]

  -r, --record-terminator <RECORD_TERMINATOR>
          record terminator
          
          [possible values: lf, crlf, cr]

  -e, --escape-char <ESCAPE_CHAR>
          escape character

  -q, --quote-char <QUOTE_CHAR>
          quote character

  -D, --double-quote <DOUBLE_QUOTE>
          double quote
          
          [possible values: true, false]

  -C, --csv-compression <CSV_COMPRESSION>
          compression mode of csv
          
          [default: UNCOMPRESSED]

  -c, --parquet-compression <PARQUET_COMPRESSION>
          compression mode of parquet
          
          [default: SNAPPY]

  -w, --writer-version <WRITER_VERSION>
          writer version

  -m, --max-row-group-size <MAX_ROW_GROUP_SIZE>
          max row group size

      --enable-bloom-filter <ENABLE_BLOOM_FILTER>
          whether to enable bloom filter writing
          
          [possible values: true, false]

      --help
          display usage help

  -V, --version
          Print version

§Parquet file options

- `-b`, `--batch-size` : Batch size for Parquet
- `-c`, `--parquet-compression` : Compression option for Parquet, default is SNAPPY
- `-s`, `--schema` : Path to message schema for generated Parquet file
- `-o`, `--output-file` : Path to output Parquet file
- `-w`, `--writer-version` : Writer version
- `-m`, `--max-row-group-size` : Max row group size
- `--enable-bloom-filter` : Enable bloom filter during writing

§Input file options

- `-i`, `--input-file` : Path to input CSV file
- `-f`, `--input-format` : Dialect for input file, `csv` or `tsv`.
- `-C`, `--csv-compression` : Compression option for csv, default is UNCOMPRESSED
- `-d`, `--delimiter` : Field delimiter for CSV file; default depends on `--input-format`
- `-e`, `--escape-char` : Escape character for input file
- `-h`, `--has-header` : Input has header
- `-r`, `--record-terminator` : Record terminator character for input. default is CRLF
- `-q`, `--quote-char` : Input quoting character
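
Putting the input and output options together, a hypothetical invocation that converts a TSV file with a header row into a ZSTD-compressed Parquet file might look like this (file names are placeholders):

```
parquet-fromcsv \
  --schema message_schema_for_parquet.txt \
  --input-format tsv \
  --has-header \
  --parquet-compression ZSTD \
  --input-file data.tsv \
  --output-file data.parquet
```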
