Module parser


Parquet schema parser. Provides functions to parse a message type string and validate it as a Parquet Type.

Example

use parquet::schema::parser::parse_message_type;

let message_type = "
  message spark_schema {
    OPTIONAL BYTE_ARRAY a (UTF8);
    REQUIRED INT32 b;
    REQUIRED DOUBLE c;
    REQUIRED BOOLEAN d;
    OPTIONAL group e (LIST) {
      REPEATED group list {
        REQUIRED INT32 element;
      }
    }
  }
";

let schema = parse_message_type(message_type).expect("Expected valid schema");
println!("{:?}", schema);

Structs

Parser 🔒
Internal schema parser. Traverses the message type using the tokenizer and parses each group or primitive type recursively.
Tokenizer 🔒
Tokenizer that splits a message type string into tokens separated by the characters defined in the is_schema_delim method. The delimiters themselves are preserved as tokens. Tokenizer implements the Iterator interface for processing tokens and also allows stepping back to reprocess previous tokens.
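
The tokenizer behaviour described above (split on delimiters, keep the delimiters as tokens, allow stepping back) can be illustrated with a minimal standalone sketch. This is not the crate's private Tokenizer: the names is_delim, split_tokens, SketchTokenizer and backtrack are hypothetical, and the delimiter set is an assumption made for illustration.

```rust
/// Assumed delimiter set for this sketch; the crate's `is_schema_delim`
/// may use a different one.
fn is_delim(c: char) -> bool {
    matches!(c, ';' | '{' | '}' | '(' | ')' | '=' | ',')
}

/// Splits `input` into tokens. Whitespace separates tokens and is dropped;
/// each delimiter character is kept as a token of its own.
fn split_tokens(input: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    // Byte offset where the current non-delimiter token started, if any.
    let mut start: Option<usize> = None;
    for (i, c) in input.char_indices() {
        if c.is_whitespace() || is_delim(c) {
            // A separator ends the token in progress.
            if let Some(s) = start.take() {
                tokens.push(&input[s..i]);
            }
            if is_delim(c) {
                tokens.push(&input[i..i + c.len_utf8()]);
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush a trailing token that was not followed by a separator.
    if let Some(s) = start {
        tokens.push(&input[s..]);
    }
    tokens
}

/// Iterator over the tokens with a cursor that can step back.
struct SketchTokenizer<'a> {
    tokens: Vec<&'a str>,
    index: usize,
}

impl<'a> SketchTokenizer<'a> {
    fn new(input: &'a str) -> Self {
        Self { tokens: split_tokens(input), index: 0 }
    }

    /// Moves the cursor back one token so that the next call to `next`
    /// returns the most recently yielded token again.
    fn backtrack(&mut self) {
        self.index = self.index.saturating_sub(1);
    }
}

impl<'a> Iterator for SketchTokenizer<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let token = self.tokens.get(self.index).copied();
        if token.is_some() {
            self.index += 1;
        }
        token
    }
}
```

In this sketch, SketchTokenizer::new("REQUIRED INT32 b;") yields REQUIRED, INT32, b and ; in that order. Calling backtrack() after reading INT32 makes the next call to next() return INT32 again, which is the kind of one-token lookahead a recursive parser can use to peek at a token before deciding how to handle it.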

Functions

assert_token 🔒
parse_bool 🔒
parse_i32 🔒
parse_message_type
Parses a message type string into a Parquet Type, which can then be used, for example, to extract individual columns. Returns a general Parquet error if parsing or validation fails.
parse_timeunit 🔒