Module parquet::schema::parser

Parquet schema parser. Provides functions to parse a message type string and validate it into a Parquet Type.

Example

use parquet::schema::parser::parse_message_type;

let message_type = "
  message spark_schema {
    OPTIONAL BYTE_ARRAY a (UTF8);
    REQUIRED INT32 b;
    REQUIRED DOUBLE c;
    REQUIRED BOOLEAN d;
    OPTIONAL group e (LIST) {
      REPEATED group list {
        REQUIRED INT32 element;
      }
    }
  }
";

let schema = parse_message_type(message_type).expect("Expected valid schema");
println!("{:?}", schema);

Structs

  • Parser 🔒
    Internal schema parser. Traverses the message type using the tokenizer and parses each group or primitive type recursively.
  • Tokenizer 🔒
    Splits a message type string into tokens separated by the characters defined in the is_schema_delim method. The tokenizer also preserves delimiters as tokens. It provides an Iterator interface for processing tokens and allows stepping back to reprocess previous tokens.
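
The tokenizer behaviour described above can be illustrated with a small standalone sketch. This is not the crate's private Tokenizer: the names SketchTokenizer, is_delim and backtrack are hypothetical, and the delimiter set is an assumption made for illustration. It only shows the general idea of keeping delimiters as tokens, iterating over them, and stepping back one token.

```rust
/// Hypothetical delimiter set for this sketch (an assumption; the crate's
/// `is_schema_delim` may use a different set).
fn is_delim(c: char) -> bool {
    matches!(c, ';' | '{' | '}' | '(' | ')' | '=' | ',')
}

/// Illustrative tokenizer: splits on whitespace and delimiters, keeps
/// delimiters as tokens, and supports stepping back to earlier tokens.
struct SketchTokenizer<'a> {
    tokens: Vec<&'a str>,
    index: usize,
}

impl<'a> SketchTokenizer<'a> {
    fn new(input: &'a str) -> Self {
        let mut tokens = Vec::new();
        // Byte offset where the word currently being read started, if any.
        let mut start: Option<usize> = None;
        for (i, c) in input.char_indices() {
            if c.is_whitespace() || is_delim(c) {
                // Close the word in progress, if any.
                if let Some(s) = start.take() {
                    tokens.push(&input[s..i]);
                }
                // Delimiters are kept as tokens; whitespace is dropped.
                if is_delim(c) {
                    tokens.push(&input[i..i + c.len_utf8()]);
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        // Flush a trailing word that is not followed by a separator.
        if let Some(s) = start {
            tokens.push(&input[s..]);
        }
        SketchTokenizer { tokens, index: 0 }
    }

    /// Moves the cursor back one token so `next` returns it again.
    fn backtrack(&mut self) {
        self.index = self.index.saturating_sub(1);
    }
}

impl<'a> Iterator for SketchTokenizer<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let token = self.tokens.get(self.index).copied();
        if token.is_some() {
            self.index += 1;
        }
        token
    }
}
```

A recursive parser built on such a tokenizer could read a token, decide it belongs to the enclosing group (for example a closing brace), call backtrack, and return to the caller, which then reads the same token again.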

Functions