arrow

Module pyarrow

Source
Expand description

Pass Arrow objects from and to PyArrow, using Arrow’s C Data Interface and pyo3.

For underlying implementation, see the ffi module.

One can use these to write Python functions that take and return PyArrow objects, with automatic conversion to corresponding arrow-rs types.

#[pyfunction]
fn double_array(array: PyArrowType<ArrayData>) -> PyResult<PyArrowType<ArrayData>> {
    let array = array.0; // Extract from PyArrowType wrapper
    let array: Arc<dyn Array> = make_array(array); // Convert ArrayData to ArrayRef
    let array: &Int32Array = array.as_any().downcast_ref()
        .ok_or_else(|| PyValueError::new_err("expected int32 array"))?;
    let array: Int32Array = array.iter().map(|x| x.map(|x| x * 2)).collect();
    Ok(PyArrowType(array.into_data()))
}
pyarrow typearrow-rs type
pyarrow.DataTypeDataType
pyarrow.FieldField
pyarrow.SchemaSchema
pyarrow.ArrayArrayData
pyarrow.RecordBatchRecordBatch
pyarrow.RecordBatchReaderArrowArrayStreamReader / Box<dyn RecordBatchReader + Send> (1)

(1) pyarrow.RecordBatchReader can be imported as ArrowArrayStreamReader. Either ArrowArrayStreamReader or Box<dyn RecordBatchReader + Send> can be exported as pyarrow.RecordBatchReader. (Box<dyn RecordBatchReader + Send> is typically easier to create.)

PyArrow has the notion of chunked arrays and tables, but arrow-rs doesn’t have these same concepts. A chunked table is instead represented with Vec<RecordBatch>. A pyarrow.Table can be imported to Rust by calling pyarrow.Table.to_reader() and then importing the reader as a ArrowArrayStreamReader.

Structs§

Traits§

  • Trait for converting Python objects to arrow-rs types.
  • Convert an arrow-rs type into a PyArrow object.
  • Create a new PyArrow object from a arrow-rs type.

Functions§

Type Aliases§