pub struct ArrowSchemaConverter<'a> {
schema_root: &'a str,
coerce_types: bool,
}
Converter from an Arrow schema to a Parquet schema.
Example:
use std::sync::Arc;
use arrow_schema::{DataType, Field, Schema};
use parquet::arrow::ArrowSchemaConverter;
use parquet::basic; // note there are two `Type`s in the following example
use parquet::schema::types::{SchemaDescriptor, Type};

// create an Arrow Schema
let arrow_schema = Schema::new(vec![
    Field::new("a", DataType::Int64, true),
    Field::new("b", DataType::Date32, true),
]);

// convert the Arrow schema to a Parquet schema
let parquet_schema = ArrowSchemaConverter::new()
    .convert(&arrow_schema)
    .unwrap();

// the Parquet schema we expect: Int64 maps to INT64, Date32 to INT32 annotated as DATE
let expected_parquet_schema = SchemaDescriptor::new(
    Arc::new(
        Type::group_type_builder("arrow_schema")
            .with_fields(vec![
                Arc::new(
                    Type::primitive_type_builder("a", basic::Type::INT64)
                        .build().unwrap()
                ),
                Arc::new(
                    Type::primitive_type_builder("b", basic::Type::INT32)
                        .with_converted_type(basic::ConvertedType::DATE)
                        .with_logical_type(Some(basic::LogicalType::Date))
                        .build().unwrap()
                ),
            ])
            .build().unwrap()
    )
);
assert_eq!(parquet_schema, expected_parquet_schema);
Fields§
schema_root: &'a str
Name of the root schema in Parquet
coerce_types: bool
Should we coerce Arrow types to compatible Parquet types?
See the docs on Self::with_coerce_types.
Implementations§
impl<'a> ArrowSchemaConverter<'a>
pub fn with_coerce_types(self, coerce_types: bool) -> Self
Should Arrow types be coerced into Parquet native types (default false).
Setting this option to true will result in Parquet files that can be read by more readers, but may lose precision for Arrow types such as DataType::Date64 which have no direct corresponding Parquet type.
By default, this converter does not coerce to native Parquet types. Enabling type coercion allows for meaningful representations that do not require downstream readers to consider the embedded Arrow schema, and can allow for greater compatibility with other Parquet implementations. However, type coercion also prevents data from being losslessly round-tripped.
§Discussion
Some Arrow types such as Date64, Timestamp and Interval have no corresponding Parquet logical type. Thus, they cannot be losslessly round-tripped when stored using the appropriate Parquet logical type. For example, some Date64 values may be truncated when stored with Parquet's native 32-bit date type.
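The truncation mentioned above can be illustrated with plain arithmetic. The sketch below assumes that a Date64 value counts milliseconds since the UNIX epoch and that a 32-bit date counts whole days since the epoch. The helper names (`millis_to_days`, `days_to_millis`, `round_trip`) are hypothetical and are not part of the parquet crate; the crate's actual coercion logic may differ in detail, for instance in how it rounds negative values.

```rust
/// Number of milliseconds in one day.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: reduce a millisecond timestamp to whole days since the epoch.
/// Any time-of-day component is discarded (floor division is an assumption here).
fn millis_to_days(ms: i64) -> i32 {
    ms.div_euclid(MILLIS_PER_DAY) as i32
}

/// Hypothetical helper: expand a day count back to milliseconds since the epoch.
fn days_to_millis(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}

/// Store a millisecond value as days and read it back again.
fn round_trip(ms: i64) -> i64 {
    days_to_millis(millis_to_days(ms))
}
```

A value that falls exactly on a day boundary survives `round_trip` unchanged, while a value carrying a time-of-day component comes back as midnight of the same day. This is the kind of precision loss that type coercion can introduce.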
For List and Map types, some Parquet readers expect certain schema elements to have specific names (earlier versions of the spec were somewhat ambiguous on this point). Type coercion will use the names prescribed by the Parquet specification, potentially losing naming metadata from the Arrow schema.
pub fn schema_root(self, schema_root: &'a str) -> Self
Set the root schema element name (defaults to "arrow_schema").
pub fn convert(&self, schema: &Schema) -> Result<SchemaDescriptor>
Convert the specified Arrow Schema to the desired Parquet SchemaDescriptor.
See the example in ArrowSchemaConverter.
Trait Implementations§
impl<'a> Debug for ArrowSchemaConverter<'a>
Auto Trait Implementations§
impl<'a> Freeze for ArrowSchemaConverter<'a>
impl<'a> RefUnwindSafe for ArrowSchemaConverter<'a>
impl<'a> Send for ArrowSchemaConverter<'a>
impl<'a> Sync for ArrowSchemaConverter<'a>
impl<'a> Unpin for ArrowSchemaConverter<'a>
impl<'a> UnwindSafe for ArrowSchemaConverter<'a>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.