pyarrow.substrait.run_query#

pyarrow.substrait.run_query(plan, *, table_provider=None, use_threads=True)#

Execute a Substrait plan and read the results as a RecordBatchReader.

Parameters:
planUnion[Buffer, bytes]

The serialized Substrait plan to execute.

table_providerobject (optional)

A function to resolve any NamedTable relation to a table. The function will receive two arguments which will be a list of strings representing the table name and a pyarrow.Schema representing the expected schema and should return a pyarrow.Table.

use_threadsbool, default True

If True then multiple threads will be used to run the query. If False then all CPU intensive work will be done on the calling thread.

Returns:
RecordBatchReader

A reader containing the result of the executed query

Examples

>>> import pyarrow as pa
>>> from pyarrow.lib import tobytes
>>> import pyarrow.substrait as substrait
>>> test_table_1 = pa.Table.from_pydict({"x": [1, 2, 3]})
>>> test_table_2 = pa.Table.from_pydict({"x": [4, 5, 6]})
>>> def table_provider(names, schema):
...     if not names:
...        raise Exception("No names provided")
...     elif names[0] == "t1":
...        return test_table_1
...     elif names[1] == "t2":
...        return test_table_2
...     else:
...        raise Exception("Unrecognized table name")
...
>>> substrait_query = '''
...         {
...             "relations": [
...             {"rel": {
...                 "read": {
...                 "base_schema": {
...                     "struct": {
...                     "types": [
...                                 {"i64": {}}
...                             ]
...                     },
...                     "names": [
...                             "x"
...                             ]
...                 },
...                 "namedTable": {
...                         "names": ["t1"]
...                 }
...                 }
...             }}
...             ]
...         }
... '''
>>> buf = pa._substrait._parse_json_plan(tobytes(substrait_query))
>>> reader = pa.substrait.run_query(buf, table_provider=table_provider)
>>> reader.read_all()
pyarrow.Table
x: int64
----
x: [[1,2,3]]