pyarrow.substrait.run_query¶
- pyarrow.substrait.run_query(plan, *, table_provider=None, use_threads=True)¶
Execute a Substrait plan and read the results as a RecordBatchReader.
- Parameters:
- plan
Union
[Buffer
,bytes
] The serialized Substrait plan to execute.
- table_providerobject (optional)
A function to resolve any NamedTable relation to a table. The function will receive two arguments which will be a list of strings representing the table name and a pyarrow.Schema representing the expected schema and should return a pyarrow.Table.
- use_threadsbool, default
True
If True then multiple threads will be used to run the query. If False then all CPU intensive work will be done on the calling thread.
- plan
- Returns:
- RecordBatchReader
A reader containing the result of the executed query
Examples
>>> import pyarrow as pa >>> from pyarrow.lib import tobytes >>> import pyarrow.substrait as substrait >>> test_table_1 = pa.Table.from_pydict({"x": [1, 2, 3]}) >>> test_table_2 = pa.Table.from_pydict({"x": [4, 5, 6]}) >>> def table_provider(names, schema): ... if not names: ... raise Exception("No names provided") ... elif names[0] == "t1": ... return test_table_1 ... elif names[1] == "t2": ... return test_table_2 ... else: ... raise Exception("Unrecognized table name") ... >>> substrait_query = ''' ... { ... "relations": [ ... {"rel": { ... "read": { ... "base_schema": { ... "struct": { ... "types": [ ... {"i64": {}} ... ] ... }, ... "names": [ ... "x" ... ] ... }, ... "namedTable": { ... "names": ["t1"] ... } ... } ... }} ... ] ... } ... ''' >>> buf = pa._substrait._parse_json_plan(tobytes(substrait_query)) >>> reader = pa.substrait.run_query(buf, table_provider=table_provider) >>> reader.read_all() pyarrow.Table x: int64 ---- x: [[1,2,3]]