pyarrow.dataset.Fragment

class pyarrow.dataset.Fragment

Bases: pyarrow.lib._Weakrefable

Fragment of data from a Dataset.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

count_rows(self, **kwargs)

Count rows matching the scanner filter.

head(self, int num_rows, **kwargs)

Load the first N rows of the fragment.

scanner(self, Schema schema=None, **kwargs)

Build a scan operation against the fragment.

take(self, indices, **kwargs)

Select rows of data by index.

to_batches(self, Schema schema=None, **kwargs)

Read the fragment as materialized record batches.

to_table(self, Schema schema=None, **kwargs)

Convert this Fragment into a Table.

Attributes

partition_expression

An Expression which evaluates to true for all data viewed by this Fragment.

physical_schema

Return the physical schema of this Fragment.

count_rows(self, **kwargs)

Count rows matching the scanner filter.

Parameters
**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
count : int
head(self, int num_rows, **kwargs)

Load the first N rows of the fragment.

Parameters
num_rows : int

The number of rows to load.

**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
Table
partition_expression

An Expression which evaluates to true for all data viewed by this Fragment.

physical_schema

Return the physical schema of this Fragment. This schema can be different from the dataset read schema.

scanner(self, Schema schema=None, **kwargs)

Build a scan operation against the fragment.

Data is not loaded immediately. Instead, this produces a Scanner, which exposes further operations (e.g. loading all data as a table, counting rows).

Parameters
schema : Schema, optional

Schema to use for scanning. This is used to unify a Fragment to its Dataset's schema. If not specified, the Fragment's physical schema is used, which might differ for each Fragment.

**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
scanner : Scanner
take(self, indices, **kwargs)

Select rows of data by index.

Parameters
indices : Array or array-like

The indices of the rows to select in the fragment.

**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
Table
to_batches(self, Schema schema=None, **kwargs)

Read the fragment as materialized record batches.

Parameters
schema : Schema, optional

Concrete schema to use for scanning.

**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
record_batches : iterator of RecordBatch
to_table(self, Schema schema=None, **kwargs)

Convert this Fragment into a Table.

Use this convenience method with care: it serially materializes the entire scan result in memory before creating the Table.

Parameters
schema : Schema, optional

Concrete schema to use for scanning.

**kwargs : dict, optional

Arguments for Scanner.from_fragment.

Returns
table : Table