```rust
pub struct ArrowReaderOptions {
    skip_arrow_metadata: bool,
    supplied_schema: Option<SchemaRef>,
    pub(crate) page_index: bool,
}
```
Options that control how metadata is read for a parquet file.
See ArrowReaderBuilder to configure how the column data is then read from the file, including projection and filter pushdown.
Fields
skip_arrow_metadata: bool
Whether the reader should strip any user-defined metadata from the Arrow schema.
supplied_schema: Option<SchemaRef>
If provided, used as the schema for the file; otherwise the schema is read from the file.
page_index: bool
If true, attempt to read OffsetIndex and ColumnIndex.
Implementations
impl ArrowReaderOptions
pub fn new() -> Self
Create a new ArrowReaderOptions with the default settings.
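The real type keeps its fields private, so the following is only a std-only sketch of the documented defaults (no skipping of Arrow metadata, no supplied schema, page index off) and of the consuming builder style shared by the with_* methods. ReaderOptionsModel is a hypothetical stand-in, and the supplied schema is reduced to a list of column names purely for illustration.

```rust
// Hypothetical stand-in for ArrowReaderOptions: it mirrors only the
// documented defaults and the consuming `with_*` builder style.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ReaderOptionsModel {
    skip_arrow_metadata: bool,
    // Simplified stand-in for Option<SchemaRef>: just the column names.
    supplied_schema: Option<Vec<String>>,
    page_index: bool,
}

impl ReaderOptionsModel {
    /// Same as Default::default(): every flag off, no supplied schema.
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `self` by value and returns it, so calls can be chained.
    pub fn with_skip_arrow_metadata(mut self, skip_arrow_metadata: bool) -> Self {
        self.skip_arrow_metadata = skip_arrow_metadata;
        self
    }

    /// Records a supplied schema (here only its column names).
    pub fn with_schema(mut self, column_names: Vec<String>) -> Self {
        self.supplied_schema = Some(column_names);
        self
    }

    /// Turns page index loading on or off.
    pub fn with_page_index(mut self, page_index: bool) -> Self {
        self.page_index = page_index;
        self
    }
}
```

With the actual crate the same chaining style reads ArrowReaderOptions::new().with_skip_arrow_metadata(true).with_page_index(true); because each method consumes and returns the options, the finished value can be moved straight into the reader builder.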
pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self
Skip decoding the embedded arrow metadata (defaults to false).
Parquet files generated by some writers may contain an embedded Arrow schema and metadata, which may not be correct or compatible with your system (for example, ARROW-16184).
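As a rough illustration of what skipping means for the resulting schema, the sketch below models how a file's key-value metadata might feed the Arrow schema. It is a hypothetical simplification, not the crate's implementation: schema_metadata is an invented helper, and it assumes the embedded Arrow schema is stored under the ARROW:schema key, as arrow-rs writers do.

```rust
use std::collections::HashMap;

// Key under which arrow-rs writers embed the serialized Arrow schema.
const ARROW_SCHEMA_KEY: &str = "ARROW:schema";

/// Hypothetical simplification: returns the metadata that would end up on
/// the Arrow schema, plus whether an embedded Arrow schema was found.
pub fn schema_metadata(
    file_kv: &[(String, String)],
    skip_arrow_metadata: bool,
) -> (HashMap<String, String>, bool) {
    if skip_arrow_metadata {
        // Ignore the file's key-value metadata entirely: no embedded
        // schema is consulted and no user-defined entries are kept.
        return (HashMap::new(), false);
    }
    let mut metadata: HashMap<String, String> = file_kv.iter().cloned().collect();
    // The embedded schema is used to build the Arrow schema, not copied
    // into it as an ordinary metadata entry.
    let has_embedded_schema = metadata.remove(ARROW_SCHEMA_KEY).is_some();
    (metadata, has_embedded_schema)
}
```

In this model, enabling the option trades fidelity (custom Arrow types and user metadata written by the producer) for robustness against embedded schemas that the reader cannot use.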
pub fn with_schema(self, schema: SchemaRef) -> Self
Provide a schema to use when reading the parquet file. If provided, it takes precedence over the schema inferred from the file or the schema defined in the file’s metadata. If the schema is not compatible with the file’s schema, an error is returned when constructing the builder.
This option is only required if you want to cast columns to a different type, for example from an Int64 in the Parquet file to a Timestamp in the Arrow schema.
The supplied schema must have the same number of columns as the parquet schema, and the column names must match.
Example
```rust
use std::sync::Arc;
use tempfile::tempfile;
use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema, TimeUnit};
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};
use parquet::arrow::ArrowWriter;

// Write data - schema is inferred from the data to be Int32
let file = tempfile().unwrap();
let batch = RecordBatch::try_from_iter(vec![
    ("col_1", Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef),
]).unwrap();
let mut writer = ArrowWriter::try_new(file.try_clone().unwrap(), batch.schema(), None).unwrap();
writer.write(&batch).unwrap();
writer.close().unwrap();

// Read the file back.
// Supply a schema that interprets the Int32 column as a Timestamp.
let supplied_schema = Arc::new(Schema::new(vec![
    Field::new("col_1", DataType::Timestamp(TimeUnit::Nanosecond, None), false)
]));
let options = ArrowReaderOptions::new().with_schema(supplied_schema.clone());
// Constructing the builder fails if the supplied schema is incompatible.
let builder = ParquetRecordBatchReaderBuilder::try_new_with_options(
    file.try_clone().unwrap(),
    options
).expect("supplied schema should be compatible with the parquet file schema");

// Create the reader and read the data using the supplied schema.
let mut reader = builder.build().unwrap();
let _batch = reader.next().unwrap().unwrap();
```
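The column-count and column-name requirement on the supplied schema can be sketched as a small check. check_supplied_columns is a hypothetical helper that works on plain column names only; any type-level compatibility checks the crate performs (such as whether an Int32 column may be read as a Timestamp) are not modeled here.

```rust
/// Hypothetical sketch of the structural rule for a supplied schema:
/// same number of columns as the file, with matching names in order.
pub fn check_supplied_columns(
    file_columns: &[&str],
    supplied_columns: &[&str],
) -> Result<(), String> {
    if file_columns.len() != supplied_columns.len() {
        return Err(format!(
            "incompatible schema: file has {} columns, supplied schema has {}",
            file_columns.len(),
            supplied_columns.len()
        ));
    }
    // Compare names position by position.
    for (i, (file_name, supplied_name)) in file_columns.iter().zip(supplied_columns).enumerate() {
        if file_name != supplied_name {
            return Err(format!(
                "incompatible schema: column {i} is {file_name:?} in the file but {supplied_name:?} in the supplied schema"
            ));
        }
    }
    Ok(())
}
```

For the example above, check_supplied_columns(&["col_1"], &["col_1"]) passes, while renaming or adding a column in the supplied schema would be rejected before any data is read.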
pub fn with_page_index(self, page_index: bool) -> Self
Enable reading PageIndex, if present (defaults to false).
Some query engines use the PageIndex to push down predicates into the parquet scan, potentially eliminating unnecessary IO.
If this is enabled, ParquetMetaData::column_index and ParquetMetaData::offset_index will be populated if the corresponding information is present in the file.
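To make the pushdown idea concrete, here is a std-only sketch of page pruning for an equality predicate. PageInfo and pages_to_fetch are hypothetical: each PageInfo pairs per-page min/max values (the kind of statistics a ColumnIndex carries) with a byte location (the kind of information an OffsetIndex carries). Real query engines handle many more predicate shapes and null counts; this only shows why both indexes together let a scan skip IO.

```rust
use std::ops::Range;

/// Hypothetical per-page summary combining ColumnIndex-style statistics
/// with OffsetIndex-style byte locations.
#[derive(Debug, Clone, Copy)]
pub struct PageInfo {
    pub min: i64,
    pub max: i64,
    pub offset: u64,
    pub compressed_size: u64,
}

/// Returns the byte ranges of pages that may contain rows where
/// `column == value`; every other page can be skipped without reading it.
pub fn pages_to_fetch(pages: &[PageInfo], value: i64) -> Vec<Range<u64>> {
    pages
        .iter()
        // Statistics rule out pages whose [min, max] cannot contain the value.
        .filter(|p| p.min <= value && value <= p.max)
        // Byte locations say exactly which ranges to read for the survivors.
        .map(|p| p.offset..p.offset + p.compressed_size)
        .collect()
}
```

The statistics alone say which pages are irrelevant, but without the byte locations a reader would still have to scan page headers to find the pages it wants; that is why the two indexes are loaded together.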
Trait Implementations
impl Clone for ArrowReaderOptions
fn clone(&self) -> ArrowReaderOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ArrowReaderOptions
impl Default for ArrowReaderOptions
fn default() -> ArrowReaderOptions
Auto Trait Implementations
impl Freeze for ArrowReaderOptions
impl RefUnwindSafe for ArrowReaderOptions
impl Send for ArrowReaderOptions
impl Sync for ArrowReaderOptions
impl Unpin for ArrowReaderOptions
impl UnwindSafe for ArrowReaderOptions
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.