Struct ArrowSchemaConverter

pub struct ArrowSchemaConverter<'a> {
    schema_root: &'a str,
    coerce_types: bool,
}

Converter from an Arrow schema to a Parquet schema.

Example:

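use std::sync::Arc;
use arrow_schema::{DataType, Field, Schema};
use parquet::arrow::ArrowSchemaConverter;
use parquet::schema::types::{SchemaDescriptor, Type};
use parquet::basic; // note there are two `Type`s in the following example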
// create an Arrow Schema
let arrow_schema = Schema::new(vec![
  Field::new("a", DataType::Int64, true),
  Field::new("b", DataType::Date32, true),
]);
// convert the Arrow schema to a Parquet schema
let parquet_schema = ArrowSchemaConverter::new()
  .convert(&arrow_schema)
  .unwrap();

let expected_parquet_schema = SchemaDescriptor::new(
  Arc::new(
    Type::group_type_builder("arrow_schema")
      .with_fields(vec![
        Arc::new(
         Type::primitive_type_builder("a", basic::Type::INT64)
          .build().unwrap()
        ),
        Arc::new(
         Type::primitive_type_builder("b", basic::Type::INT32)
          .with_converted_type(basic::ConvertedType::DATE)
          .with_logical_type(Some(basic::LogicalType::Date))
          .build().unwrap()
        ),
     ])
     .build().unwrap()
  )
);
assert_eq!(parquet_schema, expected_parquet_schema);

Fields§

§schema_root: &'a str

Name of the root schema in Parquet

§coerce_types: bool

Should we coerce Arrow types to compatible Parquet types?

See the docs on Self::with_coerce_types.

Implementations§

impl<'a> ArrowSchemaConverter<'a>

pub fn new() -> Self

Create a new converter

pub fn with_coerce_types(self, coerce_types: bool) -> Self

Should Arrow types be coerced into Parquet native types (default false).

Setting this option to true will result in Parquet files that can be read by more readers, but may lose precision for Arrow types such as DataType::Date64 that have no directly corresponding Parquet type.

By default, this converter does not coerce to native Parquet types. Enabling type coercion allows for meaningful representations that do not require downstream readers to consider the embedded Arrow schema, and can allow for greater compatibility with other Parquet implementations. However, type coercion also prevents data from being losslessly round-tripped.

§Discussion

Some Arrow types such as Date64, Timestamp and Interval have no corresponding Parquet logical type. Thus, they cannot be losslessly round-tripped when stored using the appropriate Parquet logical type. For example, some Date64 values may be truncated when stored with Parquet’s native 32-bit date type.
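The truncation can be illustrated with plain arithmetic. The sketch below uses only the standard library and assumes that storing a millisecond-based Date64 value in a day-based 32-bit date amounts to integer division by the number of milliseconds per day; the helpers `date64_to_days` and `days_to_date64` are hypothetical and are not part of the parquet crate.

```rust
/// Milliseconds in one day. Date64 counts milliseconds since the Unix epoch,
/// while a day-based 32-bit date counts whole days.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: reduce a Date64 value (milliseconds) to whole days.
/// Assumes a non-negative input that fits in an i32 day count.
fn date64_to_days(millis: i64) -> i32 {
    (millis / MILLIS_PER_DAY) as i32
}

/// Hypothetical helper: widen a day count back to milliseconds.
fn days_to_date64(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}

fn main() {
    // A value exactly on a day boundary survives the round trip.
    let midnight = 18_628 * MILLIS_PER_DAY;
    assert_eq!(days_to_date64(date64_to_days(midnight)), midnight);

    // A value 12 hours later loses its sub-day component.
    let noon = midnight + 43_200_000;
    assert_eq!(days_to_date64(date64_to_days(noon)), midnight);
    assert_ne!(days_to_date64(date64_to_days(noon)), noon);
}
```

Only Date64 values that are not a whole multiple of one day are affected, which is why the text says "some" values may be truncated.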

For List and Map types, some Parquet readers expect certain schema elements to have specific names (earlier versions of the spec were somewhat ambiguous on this point). Type coercion will use the names prescribed by the Parquet specification, potentially losing naming metadata from the Arrow schema.

pub fn schema_root(self, schema_root: &'a str) -> Self

Set the root schema element name (defaults to "arrow_schema").

pub fn convert(&self, schema: &Schema) -> Result<SchemaDescriptor>

Convert the specified Arrow Schema to the desired Parquet SchemaDescriptor.

See example in ArrowSchemaConverter

Trait Implementations§

impl<'a> Debug for ArrowSchemaConverter<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ArrowSchemaConverter<'_>

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T