datafusion.DataFrame

class datafusion.DataFrame

Bases: object

A DataFrame represents a logical plan and provides an API for composing it. Use it to build a plan, then call .collect() to execute the plan and collect the result. Execution runs natively in Rust and Arrow, in a multi-threaded environment.
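The build-then-collect pattern described above can be illustrated with a small pure-Python model. This is only a sketch of the idea of a lazily composed plan: `ToyFrame` and everything in it are hypothetical names invented for illustration, and it is not the datafusion implementation or its API (the real class takes expressions, not Python callables, and executes in Rust).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Step = Callable[[List[Row]], List[Row]]


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a lazy DataFrame; NOT the datafusion class."""

    source: Tuple[Row, ...]
    steps: Tuple[Step, ...] = ()

    def _with(self, step: Step) -> "ToyFrame":
        # Composing returns a new frame with one more plan step; nothing runs yet.
        return ToyFrame(self.source, self.steps + (step,))

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyFrame":
        return self._with(lambda rows: [r for r in rows if predicate(r)])

    def select(self, *columns: str) -> "ToyFrame":
        return self._with(lambda rows: [{c: r[c] for c in columns} for r in rows])

    def sort(self, key: str) -> "ToyFrame":
        return self._with(lambda rows: sorted(rows, key=lambda r: r[key]))

    def limit(self, count: int) -> "ToyFrame":
        return self._with(lambda rows: rows[:count])

    def collect(self) -> List[Row]:
        # Only here is the recorded plan actually executed, step by step.
        rows = list(self.source)
        for step in self.steps:
            rows = step(rows)
        return rows


base = ToyFrame(({"a": 3, "b": 30}, {"a": 1, "b": 10}, {"a": 2, "b": 20}))
plan = base.filter(lambda r: r["a"] > 1).select("a").sort("a").limit(1)
result = plan.collect()
print(result)  # [{'a': 2}]
```

Each method returns a new frame rather than mutating the old one, so `base` can be reused to build other plans; work happens only when `collect()` is called.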

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__

Initialize self.

aggregate

Aggregates using expressions

collect

Executes the plan, returning a list of `RecordBatch`es. Unless some order is specified in the plan, there is no guarantee of the order of the result.

filter

Filter according to the predicate expression

join

Returns the join of two DataFrames on the columns given in `on`.

limit

Limits the plan to return at most `count` rows.

select

Select expressions from the existing DataFrame.

show

Print the result, 20 lines by default

sort

Sort by specified sorting expressions

aggregate()

Aggregates using expressions

collect()

Executes the plan, returning a list of `RecordBatch`es. Unless some order is specified in the plan, there is no guarantee of the order of the result.
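Because row order is not guaranteed without an explicit sort, code that checks collected results should either sort in the plan or compare rows without regard to order. The following is a minimal sketch of the second approach. It assumes each batch is modelled as a plain dict mapping column names to lists of values, a stand-in for a real `RecordBatch`; `rows_of` and `same_rows` are hypothetical helpers, not part of datafusion.

```python
from collections import Counter


def rows_of(batches):
    """Flatten batches (dicts of column name -> list of values) into row tuples."""
    rows = []
    for batch in batches:
        names = sorted(batch)  # fixed column order so rows are comparable
        rows.extend(zip(*(batch[name] for name in names)))
    return rows


def same_rows(left, right):
    """Compare two results as multisets of rows, ignoring row and batch order."""
    return Counter(rows_of(left)) == Counter(rows_of(right))


# Two hypothetical results holding the same rows, split and ordered differently.
run_1 = [{"a": [1, 2], "b": ["x", "y"]}, {"a": [3], "b": ["z"]}]
run_2 = [{"a": [3, 1], "b": ["z", "x"]}, {"a": [2], "b": ["y"]}]

print(same_rows(run_1, run_2))            # True
print(rows_of(run_1) == rows_of(run_2))   # False
```

Using `Counter` rather than `set` keeps duplicate rows significant, which matters for results that are not deduplicated.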

filter()

Filter according to the predicate expression

join()

Returns the join of two DataFrames on the columns given in `on`.

limit()

Limits the plan to return at most `count` rows.

select()

Select expressions from the existing DataFrame.

show()

Print the result, 20 lines by default

sort()

Sort by specified sorting expressions