This class enables you to interact with Parquet files.
Factory
The ParquetFileReader$create() factory method instantiates the object and takes the following arguments:
file: A character file name, raw vector, or Arrow file connection object (e.g. RandomAccessFile).
props: Optional ParquetArrowReaderProperties.
mmap: Logical; whether to memory-map the file (default TRUE).
...: Additional arguments, currently ignored.
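As a minimal sketch of the factory arguments, the snippet below opens the sample file bundled with the arrow package, passing explicit reader properties and disabling memory-mapping. The use_threads = FALSE setting is only an illustrative choice, not a recommendation from this page.

```r
library(arrow)

# Sample Parquet file shipped with the arrow package
f <- system.file("v0.7.1.parquet", package = "arrow")

# Reader properties; use_threads = FALSE is an illustrative assumption
props <- ParquetArrowReaderProperties$create(use_threads = FALSE)

# Open the file with ordinary reads instead of memory-mapping
pq <- ParquetFileReader$create(f, props = props, mmap = FALSE)
pq$num_columns
```

Opening the reader only parses the file metadata, so this step should not depend on any compression codec being available.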
Methods
$ReadTable(column_indices): get an arrow::Table from the file. The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
$ReadRowGroup(i, column_indices): get an arrow::Table by reading the i-th row group (0-based). The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
$ReadRowGroups(row_groups, column_indices): get an arrow::Table by reading several row groups (0-based integers). The optional column_indices= argument is a 0-based integer vector indicating which columns to retain.
$GetSchema(): get the arrow::Schema of the data in the file.
$ReadColumn(i): read the i-th column (0-based) as a ChunkedArray.
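The 0-based indexing used by these methods can be sketched as follows, using the sample file bundled with arrow. The choice of columns 0 and 6 (carat and price in that file's schema) is purely illustrative, and the data reads are guarded because the file's columns are snappy-compressed.

```r
library(arrow)

f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)

# The sample file has compressed data columns, so guard the reads
if (codec_is_available("snappy")) {
  # Keep only the first and seventh columns (0-based indices 0 and 6)
  subset_tab <- pq$ReadTable(column_indices = c(0L, 6L))

  # Read only the first row group, with the same column selection
  rg <- pq$ReadRowGroup(0L, column_indices = c(0L, 6L))

  # Read the first column on its own as a ChunkedArray
  carat <- pq$ReadColumn(0L)
}
```

Note that R vectors are 1-based, so an index taken from R (for example via which()) would need 1 subtracted before being passed to these methods.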
Active bindings
$num_rows: number of rows.
$num_columns: number of columns.
$num_row_groups: number of row groups.
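A short sketch of how the active bindings might be used: they report the file's dimensions from its metadata, and $num_row_groups can be turned into the 0-based indices that $ReadRowGroup() expects. The variable names dims and row_group_ids are hypothetical.

```r
library(arrow)

f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)

# Collect the file's dimensions from the active bindings
dims <- c(
  rows = pq$num_rows,
  columns = pq$num_columns,
  row_groups = pq$num_row_groups
)
print(dims)

# 0-based row group indices, suitable for $ReadRowGroup(i)
row_group_ids <- seq_len(pq$num_row_groups) - 1L
```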
Examples
f <- system.file("v0.7.1.parquet", package = "arrow")
pq <- ParquetFileReader$create(f)
pq$GetSchema()
#> Schema
#> carat: double
#> cut: string
#> color: string
#> clarity: string
#> depth: double
#> table: double
#> price: int64
#> x: double
#> y: double
#> z: double
#> __index_level_0__: int64
#>
#> See $metadata for additional Schema metadata
if (codec_is_available("snappy")) {
  # This file has compressed data columns
  tab <- pq$ReadTable()
  tab$schema
}
#> Schema
#> carat: double
#> cut: string
#> color: string
#> clarity: string
#> depth: double
#> table: double
#> price: int64
#> x: double
#> y: double
#> z: double
#> __index_level_0__: int64
#>
#> See $metadata for additional Schema metadata