Arrow Datasets allow you to query against data that has been split across multiple files. This sharding of data may indicate partitioning, which can accelerate queries that only touch some partitions (files).
A Dataset contains one or more Fragments, such as files, of potentially differing type and partitioning.
For Dataset$create(), see open_dataset(), which is an alias for it.
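As a minimal sketch of the idea above, the following writes a small partitioned dataset to a temporary directory and opens it again with open_dataset(). The use of mtcars, the temporary path, and the choice of cyl as the partitioning column are assumptions made only for illustration, and the sketch assumes an arrow build with Parquet and dataset support.

```r
library(arrow)

# Write a built-in data frame as a dataset partitioned by `cyl`;
# each distinct value of `cyl` gets its own directory of files.
dir <- tempfile("mtcars_ds_")
write_dataset(mtcars, dir, partitioning = "cyl")

# open_dataset() discovers the files under `dir` and returns a Dataset.
ds <- open_dataset(dir)

# Paths of the files that back the dataset.
files <- ds$files

# Materialize the whole dataset as an in-memory Table.
tbl <- Scanner$create(ds)$ToTable()
```

Because the data is split by cyl, a query that filters on cyl only needs to read the files in the matching directories.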
DatasetFactory is used to provide finer control over the creation of Datasets.
DatasetFactory is used to create a
Dataset, inspect the Schema of the
fragments contained in it, and declare a partitioning.
FileSystemDatasetFactory is a subclass of DatasetFactory for discovering files in the local file system, the only currently supported file system.
For the DatasetFactory$create() factory method, see dataset_factory(), an alias for it. A DatasetFactory has the following methods:
$Inspect(unify_schemas): If unify_schemas is TRUE, all fragments will be scanned and a unified Schema will be created from them; if FALSE (default), only the first fragment will be inspected for its schema. Use this fast path when you know and trust that all fragments have an identical schema.
$Finish(schema, unify_schemas): Returns a Dataset. If schema is provided, it will be used for the Dataset; if omitted, a Schema will be created from inspecting the fragments (files) in the dataset, following unify_schemas as described above.
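The factory workflow described above can be sketched as follows. The two Parquet files, their column names, and the temporary directory are hypothetical sample data chosen so that the fragments have different schemas; the sketch assumes an arrow build with Parquet support.

```r
library(arrow)

# Hypothetical sample data: two files whose schemas differ (b has an extra column).
dir <- tempfile("factory_ds_")
dir.create(dir)
write_parquet(data.frame(x = 1:3), file.path(dir, "a.parquet"))
write_parquet(
  data.frame(x = 4:6, y = c("a", "b", "c"), stringsAsFactors = FALSE),
  file.path(dir, "b.parquet")
)

# dataset_factory() is the alias for DatasetFactory$create().
factory <- dataset_factory(dir, format = "parquet")

# Fast path: only the first fragment is inspected.
first_schema <- factory$Inspect()

# Slow path: every fragment is scanned and the schemas are unified.
unified_schema <- factory$Inspect(unify_schemas = TRUE)

# Build the Dataset using the unified schema.
ds <- factory$Finish(unify_schemas = TRUE)
tbl <- Scanner$create(ds)$ToTable()
```

With unification, rows that come from the file without y should carry missing values in that column.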
FileSystemDatasetFactory$create() is a lower-level factory method and
takes the following arguments:
filesystem: A FileSystem
selector: Either a FileSelector or NULL
paths: Either a character vector of file paths or NULL
format: A FileFormat
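A hedged sketch of calling the lower-level factory method with the arguments listed above. The sample files and directory are hypothetical, and supplying a selector while leaving paths as NULL is one possible combination rather than the only one.

```r
library(arrow)

# Hypothetical sample data: two Parquet files in a temporary directory.
dir <- tempfile("lowlevel_ds_")
dir.create(dir)
write_parquet(data.frame(x = 1:3), file.path(dir, "a.parquet"))
write_parquet(data.frame(x = 4:6), file.path(dir, "b.parquet"))

# Use forward slashes so the path is valid for the Arrow file system layer.
base_dir <- normalizePath(dir, winslash = "/")

factory <- FileSystemDatasetFactory$create(
  filesystem = LocalFileSystem$create(),
  # Discover files under base_dir, descending into subdirectories.
  selector = FileSelector$create(base_dir, recursive = TRUE),
  paths = NULL,
  format = FileFormat$create("parquet")
)

ds <- factory$Finish()
n_files <- length(ds$files)
```

In most cases open_dataset() or dataset_factory() is simpler; this form is mainly useful when the file system, selector, and format objects are constructed separately.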
Dataset has the following methods:
$NewScan(): Returns a ScannerBuilder for building a query
$schema: Active binding that returns the Schema of the Dataset; you
may also replace the dataset's schema by using
ds$schema <- new_schema.
This method currently supports only adding, removing, or reordering
fields in the schema: you cannot alter or cast the field types.
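The Dataset methods above might be used as in the sketch below. The sample file and column names are hypothetical, and the replacement schema only reorders the existing fields, in line with the restriction just described.

```r
library(arrow)

# Hypothetical sample data: one Parquet file with an integer and a string column.
dir <- tempfile("methods_ds_")
dir.create(dir)
write_parquet(
  data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE),
  file.path(dir, "part.parquet")
)

ds <- open_dataset(dir)
original_names <- ds$schema$names

# Build a query with a ScannerBuilder: select only column x, then read it.
builder <- ds$NewScan()
builder$Project("x")
tbl <- builder$Finish()$ToTable()

# Replace the schema with the same fields in a different order.
# The field types are unchanged: only the order differs.
ds$schema <- schema(y = utf8(), x = int32())
reordered_names <- ds$schema$names
```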
FileSystemDataset has the following methods:
$files: Active binding, returns the files of the FileSystemDataset
$format: Active binding, returns the FileFormat of the FileSystemDataset
UnionDataset has the following methods:
$children: Active binding, returns all child Datasets
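To illustrate the FileSystemDataset and UnionDataset bindings, here is a small sketch that builds two single-file datasets and combines them. The directories and data are hypothetical, and passing a list of Datasets to open_dataset() is used here as one way to obtain a UnionDataset.

```r
library(arrow)

# Hypothetical sample data: two directories, one Parquet file each.
dir_a <- tempfile("ds_a_")
dir_b <- tempfile("ds_b_")
dir.create(dir_a)
dir.create(dir_b)
write_parquet(data.frame(x = 1:3), file.path(dir_a, "a.parquet"))
write_parquet(data.frame(x = 4:6), file.path(dir_b, "b.parquet"))

ds_a <- open_dataset(dir_a)
ds_b <- open_dataset(dir_b)

# FileSystemDataset bindings: the backing files and their format.
files_a <- ds_a$files
format_a <- ds_a$format

# Combine both datasets; the children are the original Datasets.
combined <- open_dataset(list(ds_a, ds_b))
children <- combined$children
```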
See open_dataset() for a simple interface to creating a Dataset.