This class enables you to interact with Parquet files.
Factory
The ParquetFileReader$create() factory method instantiates the object and
takes the following arguments:
- file: A character file name, raw vector, or Arrow file connection object (e.g. RandomAccessFile).
- props: Optional ParquetArrowReaderProperties
- mmap: Logical: whether to memory-map the file (default TRUE)
- reader_props: Optional ParquetReaderProperties
- ...: Additional arguments, currently ignored
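As a rough sketch of how these arguments fit together, the call below opens the bundled example file without memory mapping and passes explicit Arrow reader properties. It assumes that ParquetArrowReaderProperties$create() accepts a use_threads argument, which is how recent arrow releases behave; check the version you have installed.

```r
library(arrow)

# Example file shipped with the arrow package
f <- system.file("v0.7.1.parquet", package = "arrow")

# Assumption: ParquetArrowReaderProperties$create() takes `use_threads`
props <- ParquetArrowReaderProperties$create(use_threads = FALSE)

# Open the file with explicit properties and without memory mapping
pq <- ParquetFileReader$create(f, props = props, mmap = FALSE)
pq$num_columns
```

Turning off mmap may be useful when the file lives on a file system where memory mapping is unreliable; for local files the default is usually fine.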
Methods
- $ReadTable(column_indices): get an arrow::Table from the file. The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
- $ReadRowGroup(i, column_indices): get an arrow::Table by reading the i-th row group (0-based). The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
- $ReadRowGroups(row_groups, column_indices): get an arrow::Table by reading several row groups (0-based integers). The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
- $GetSchema(): get the arrow::Schema of the data in the file
- $ReadColumn(i): read the i-th column (0-based) as a ChunkedArray.
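Because every index here is 0-based, unlike ordinary R indexing, a short illustration may help. The sketch below reads two columns of the bundled example file by position, then the first row group, then a single column. The variable names are illustrative only, and the reads are guarded because the file's data columns are snappy-compressed.

```r
library(arrow)

f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)

if (codec_is_available("snappy")) {
  # Indices are 0-based: 0 is carat and 6 is price in this file's schema
  tab <- pq$ReadTable(column_indices = c(0L, 6L))
  names(tab)

  # First row group only, with the same column subset
  rg <- pq$ReadRowGroup(0L, column_indices = c(0L, 6L))

  # A single column comes back as a ChunkedArray rather than a Table
  carat <- pq$ReadColumn(0L)
}
```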
Active bindings
- $num_rows: number of rows.
- $num_columns: number of columns.
- $num_row_groups: number of row groups.
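One possible use of these bindings is to inspect a file's dimensions before reading any data, and to build the 0-based row group indices that $ReadRowGroup() expects. The following is a minimal sketch using the bundled example file; dims, row_group_ids, and rows_per_group are illustrative names, not part of the API.

```r
library(arrow)

f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)

# Collect the three bindings into a named vector
dims <- c(
  rows = pq$num_rows,
  columns = pq$num_columns,
  row_groups = pq$num_row_groups
)

# Row groups are addressed with 0-based indices
row_group_ids <- seq_len(pq$num_row_groups) - 1L

if (codec_is_available("snappy")) {
  # Read each row group in turn and record how many rows it holds
  rows_per_group <- vapply(
    row_group_ids,
    function(i) pq$ReadRowGroup(i)$num_rows,
    numeric(1)
  )
}
```

The per-group row counts should add up to $num_rows, which gives a quick consistency check when processing a large file one row group at a time.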
Examples
f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)
pq$GetSchema()
#> Schema
#> carat: double
#> cut: string
#> color: string
#> clarity: string
#> depth: double
#> table: double
#> price: int64
#> x: double
#> y: double
#> z: double
#> __index_level_0__: int64
#> 
#> See $metadata for additional Schema metadata
if (codec_is_available("snappy")) {
  # This file has compressed data columns
  tab <- pq$ReadTable()
  tab$schema
}
#> Schema
#> carat: double
#> cut: string
#> color: string
#> clarity: string
#> depth: double
#> table: double
#> price: int64
#> x: double
#> y: double
#> z: double
#> __index_level_0__: int64
#> 
#> See $metadata for additional Schema metadata