pyarrow.dataset.Fragment
- class pyarrow.dataset.Fragment
Bases: pyarrow.lib._Weakrefable
Fragment of data from a Dataset.
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- count_rows(self, **kwargs): Count rows matching the scanner filter.
- head(self, int num_rows, **kwargs): Load the first N rows of the fragment.
- scanner(self, Schema schema=None, **kwargs): Build a scan operation against the fragment.
- take(self, indices, **kwargs): Select rows of data by index.
- to_batches(self, Schema schema=None, **kwargs): Read the fragment as materialized record batches.
- to_table(self, Schema schema=None, **kwargs): Convert this Fragment into a Table.
Attributes
- partition_expression: An Expression which evaluates to true for all data viewed by this Fragment.
- physical_schema: Return the physical schema of this Fragment.
- count_rows(self, **kwargs)
Count rows matching the scanner filter.
- head(self, int num_rows, **kwargs)
Load the first N rows of the fragment.
- partition_expression
An Expression which evaluates to true for all data viewed by this Fragment.
- physical_schema
Return the physical schema of this Fragment. This schema can differ from the dataset's read schema.
- scanner(self, Schema schema=None, **kwargs)
Build a scan operation against the fragment.
Data is not loaded immediately. Instead, this produces a Scanner, which exposes further operations (e.g. loading all data as a table, counting rows).
- take(self, indices, **kwargs)
Select rows of data by index.
- Parameters
- indices : Array or array-like
The indices of rows to select in the dataset.
- **kwargs : dict, optional
Arguments for Scanner.from_fragment.
- Returns
- Table
- to_batches(self, Schema schema=None, **kwargs)
Read the fragment as materialized record batches.
- Parameters
- Returns
- record_batches : iterator of RecordBatch
- to_table(self, Schema schema=None, **kwargs)
Convert this Fragment into a Table.
Use this convenience method with care: it serially materializes the scan result in memory before creating the Table.