datafusion.DataFrame

class datafusion.DataFrame

Bases: object

A PyDataFrame is a representation of a logical plan and an API to compose statements. Use it to build a plan and .collect() to execute the plan and collect the result. The actual execution of a plan runs natively on Rust and Arrow on a multi-threaded environment.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__

Initialize self.

aggregate

collect

Executes the plan, returning a list of `RecordBatch`es.Unless some order is specified in the plan, there is no guarantee of the order of the result..

filter

join

limit

schema

Returns the schema from the logical plan

select

show

Print the result, 20 lines by default

sort

aggregate()
collect()

Executes the plan, returning a list of `RecordBatch`es. Unless some order is specified in the plan, there is no guarantee of the order of the result.

filter()
join()
limit()
schema()

Returns the schema from the logical plan

select()
show()

Print the result, 20 lines by default

sort()