pub struct ParquetObjectReader {
    store: Arc<dyn ObjectStore>,
    path: Path,
    file_size: Option<usize>,
    metadata_size_hint: Option<usize>,
    preload_column_index: bool,
    preload_offset_index: bool,
    runtime: Option<Handle>,
}
Reads Parquet files in object storage using [ObjectStore].
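use std::{io::stdout, sync::Arc};
use object_store::{azure::MicrosoftAzureBuilder, path::Path, ObjectStore};
use parquet::arrow::{async_reader::ParquetObjectReader, ParquetRecordBatchStreamBuilder};
use parquet::schema::printer::print_parquet_metadata;

// The statements below use `.await`, so they must run inside an async context.
// Populate configuration from environment
let storage_container = Arc::new(MicrosoftAzureBuilder::from_env().build().unwrap());
let location = Path::from("path/to/blob.parquet");
// Fetch the object's metadata (size and location) without downloading it
let meta = storage_container.head(&location).await.unwrap();
println!("Found Blob with {}B at {}", meta.size, meta.location);
// Show Parquet metadata
let reader = ParquetObjectReader::new(storage_container, meta.location)
    .with_file_size(meta.size.try_into().unwrap());
let builder = ParquetRecordBatchStreamBuilder::new(reader).await.unwrap();
print_parquet_metadata(&mut stdout(), builder.metadata());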
Fields§
§store: Arc<dyn ObjectStore>
§path: Path
§file_size: Option<usize>
§metadata_size_hint: Option<usize>
§preload_column_index: bool
§preload_offset_index: bool
§runtime: Option<Handle>
Implementations§
impl ParquetObjectReader
pub fn new(store: Arc<dyn ObjectStore>, path: Path) -> Self
Creates a new ParquetObjectReader for the provided [ObjectStore] and [Path].
Provide a hint as to the size of the parquet file’s footer; see fetch_parquet_metadata.
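To illustrate why a footer size hint helps, here is a minimal, standard-library-only sketch. It relies on the Parquet file layout, in which a file ends with a 4-byte little-endian metadata length followed by the 4-byte magic `PAR1`. The helper names `decode_metadata_len` and `hint_covers_footer` are hypothetical and not part of the parquet crate. The sketch also assumes the hint is used as the size of an initial fetch from the end of the file.

```rust
/// Size of the fixed trailer at the end of a Parquet file:
/// a 4-byte little-endian metadata length followed by the magic `PAR1`.
const FOOTER_SIZE: usize = 8;

/// Decode the metadata length from the last 8 bytes of a file.
/// Returns `None` if the magic bytes do not match.
/// (Hypothetical helper, not the parquet crate's API.)
fn decode_metadata_len(tail: &[u8; FOOTER_SIZE]) -> Option<usize> {
    if &tail[4..] != b"PAR1" {
        return None;
    }
    let len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    Some(len as usize)
}

/// Returns true if an initial fetch of `hint` bytes from the end of the
/// file already contains the metadata plus the trailer, so that no second
/// request is needed. (Hypothetical helper.)
fn hint_covers_footer(hint: usize, metadata_len: usize) -> bool {
    hint >= metadata_len + FOOTER_SIZE
}

fn main() {
    // Build a fake trailer announcing 1000 bytes of metadata.
    let mut tail = [0u8; FOOTER_SIZE];
    tail[..4].copy_from_slice(&1000u32.to_le_bytes());
    tail[4..].copy_from_slice(b"PAR1");

    let len = decode_metadata_len(&tail).expect("valid magic");
    println!("metadata length: {len}");
    println!("64 KiB hint covers footer: {}", hint_covers_footer(64 * 1024, len));
    println!("512 B hint covers footer: {}", hint_covers_footer(512, len));
}
```

A hint that is too small does not break anything under this model; it only means a second request is needed once the real metadata length is known.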
pub fn with_file_size(self, file_size: usize) -> Self
Provide the byte size of this file.
If provided, the file size will ensure that only bounded range requests are used. If file size is not provided, the reader will use suffix range requests to fetch the metadata.
Providing this size up front is an important optimization to avoid extra calls when the underlying store does not support suffix range requests.
The file size can be obtained using [ObjectStore::list] or [ObjectStore::head].
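The difference between bounded and suffix range requests can be sketched as follows. This is an illustrative model using only the standard library. The `TailRequest` enum and the `tail_request` function are hypothetical names, not types from the parquet or object_store crates.

```rust
use std::ops::Range;

/// Hypothetical description of the request a reader would issue
/// to fetch the last `nbytes` of a file.
#[derive(Debug, PartialEq)]
enum TailRequest {
    /// Explicit `start..end` range; only possible when the file size is known.
    Bounded(Range<usize>),
    /// "Last N bytes" request; requires suffix-range support in the store.
    Suffix(usize),
}

/// Choose the request shape depending on whether the file size is known.
fn tail_request(file_size: Option<usize>, nbytes: usize) -> TailRequest {
    match file_size {
        // Clamp at zero so files shorter than `nbytes` yield `0..size`.
        Some(size) => TailRequest::Bounded(size.saturating_sub(nbytes)..size),
        None => TailRequest::Suffix(nbytes),
    }
}

fn main() {
    // Known size: an ordinary bounded range over the last 8 bytes.
    println!("{:?}", tail_request(Some(10_000), 8));
    // Unknown size: fall back to a suffix request.
    println!("{:?}", tail_request(None, 8));
}
```

In this model, a store without suffix-range support would first have to look up the object's size, which is the extra call that supplying the size up front avoids.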
pub fn with_preload_column_index(self, preload_column_index: bool) -> Self
Load the Column Index as part of Self::get_metadata
pub fn with_preload_offset_index(self, preload_offset_index: bool) -> Self
Load the Offset Index as part of Self::get_metadata
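The `with_*` methods on this type all take `self` by value and return `Self`, so they can be chained. The sketch below shows that consuming-builder pattern on a stand-in type. `ReaderOptions` is a hypothetical struct that only mirrors a few of the fields listed above; it is not the crate's implementation.

```rust
/// Stand-in for a reader's configuration (hypothetical type for illustration).
#[derive(Debug, Default, Clone, PartialEq)]
struct ReaderOptions {
    file_size: Option<usize>,
    preload_column_index: bool,
    preload_offset_index: bool,
}

impl ReaderOptions {
    /// Each setter consumes `self` and returns an updated copy,
    /// carrying the remaining fields over with struct update syntax.
    fn with_file_size(self, file_size: usize) -> Self {
        Self { file_size: Some(file_size), ..self }
    }

    fn with_preload_column_index(self, preload_column_index: bool) -> Self {
        Self { preload_column_index, ..self }
    }

    fn with_preload_offset_index(self, preload_offset_index: bool) -> Self {
        Self { preload_offset_index, ..self }
    }
}

fn main() {
    // Chain the setters; options that are not set keep their defaults.
    let opts = ReaderOptions::default()
        .with_file_size(4096)
        .with_preload_column_index(true)
        .with_preload_offset_index(true);
    println!("{opts:?}");
}
```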
pub fn with_runtime(self, handle: Handle) -> Self
Perform IO on the provided tokio runtime
Tokio is a cooperative scheduler, and relies on tasks yielding in a timely manner to service IO. Therefore, running IO and CPU-bound tasks, such as parquet decoding, on the same tokio runtime can lead to degraded throughput, dropped connections and other issues. For more information see here.
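The idea of keeping IO and CPU-bound work apart can be illustrated without tokio. The sketch below is only a standard-library analogy: a dedicated worker thread stands in for the IO runtime and a channel stands in for the runtime handle. `IoRequest`, `spawn_io_worker` and `fetch` are hypothetical names, and none of this is the crate's implementation.

```rust
use std::ops::Range;
use std::sync::mpsc;
use std::thread;

/// A request for a byte range, plus a channel to send the result back on.
struct IoRequest {
    range: Range<usize>,
    reply: mpsc::Sender<Vec<u8>>,
}

/// Spawn a dedicated "IO" worker that serves range requests from `data`.
/// Panics inside the worker if a requested range is out of bounds.
fn spawn_io_worker(data: Vec<u8>) -> (mpsc::Sender<IoRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<IoRequest>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            let bytes = data[req.range].to_vec();
            // Ignore the error if the requester has gone away.
            let _ = req.reply.send(bytes);
        }
    });
    (tx, handle)
}

/// Ask the IO worker for a range and wait for the reply.
fn fetch(io: &mpsc::Sender<IoRequest>, range: Range<usize>) -> Vec<u8> {
    let (reply, result) = mpsc::channel();
    io.send(IoRequest { range, reply }).expect("IO worker is running");
    result.recv().expect("IO worker replied")
}

fn main() {
    let (io, handle) = spawn_io_worker((0u8..=255).collect());
    let bytes = fetch(&io, 10..14);
    // CPU-bound "decoding" happens on the calling thread, not on the IO worker.
    let sum: u32 = bytes.iter().map(|&b| u32::from(b)).sum();
    println!("{bytes:?} -> {sum}");
    drop(io); // close the channel so the worker exits
    handle.join().unwrap();
}
```

Because the worker does nothing but serve requests, heavy computation on the calling thread cannot delay it. That is the property a separate IO runtime is meant to provide.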
fn spawn<F, O, E>(&self, f: F) -> BoxFuture<'_, Result<O>>
Trait Implementations§
impl AsyncFileReader for ParquetObjectReader
fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>>
Retrieve the bytes in range.
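How a single-range fetch relates to a multi-range fetch can be sketched with a simplified, synchronous trait. The real trait returns futures. `RangeReader` and `InMemory` below are hypothetical names used only for illustration. The default multi-range method simply calls `get_bytes` once per range, in order; an implementation backed by an object store is free to override it, for example to batch requests.

```rust
use std::ops::Range;

/// Simplified, synchronous stand-in for an async range reader (hypothetical).
trait RangeReader {
    /// Fetch the bytes in `range`.
    fn get_bytes(&mut self, range: Range<usize>) -> Result<Vec<u8>, String>;

    /// Fetch several ranges. By default this calls `get_bytes` sequentially
    /// and stops at the first error.
    fn get_byte_ranges(&mut self, ranges: Vec<Range<usize>>) -> Result<Vec<Vec<u8>>, String> {
        ranges.into_iter().map(|r| self.get_bytes(r)).collect()
    }
}

/// In-memory "file" that counts how many requests it has served.
struct InMemory {
    data: Vec<u8>,
    requests: usize,
}

impl RangeReader for InMemory {
    fn get_bytes(&mut self, range: Range<usize>) -> Result<Vec<u8>, String> {
        self.requests += 1;
        self.data
            .get(range.clone())
            .map(|slice| slice.to_vec())
            .ok_or_else(|| format!("range {range:?} out of bounds"))
    }
}

fn main() {
    let mut reader = InMemory { data: (0u8..100).collect(), requests: 0 };
    let chunks = reader.get_byte_ranges(vec![0..2, 10..13]).unwrap();
    println!("{chunks:?} fetched in {} requests", reader.requests);
}
```

With the default method, the request count grows with the number of ranges, which is why overriding it can pay off for remote stores.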
fn get_byte_ranges(
    &mut self,
    ranges: Vec<Range<usize>>,
) -> BoxFuture<'_, Result<Vec<Bytes>>>
where
    Self: Send,
Retrieve multiple byte ranges. The default implementation will call get_bytes sequentially.
fn get_metadata<'a>( &'a mut self, options: Option<&'a ArrowReaderOptions>, ) -> BoxFuture<'a, Result<Arc<ParquetMetaData>>>
Returns a future that resolves to the ParquetMetaData of a parquet file, allowing fine-grained control over how metadata is sourced, in particular allowing for caching, pre-fetching, catalog metadata, etc.
ArrowReaderOptions may be provided to supply decryption parameters.
impl Clone for ParquetObjectReader
fn clone(&self) -> ParquetObjectReader
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ParquetObjectReader
impl MetadataSuffixFetch for &mut ParquetObjectReader
Auto Trait Implementations§
impl Freeze for ParquetObjectReader
impl !RefUnwindSafe for ParquetObjectReader
impl Send for ParquetObjectReader
impl Sync for ParquetObjectReader
impl Unpin for ParquetObjectReader
impl !UnwindSafe for ParquetObjectReader
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.