pyarrow.dataset.Expression

class pyarrow.dataset.Expression

Bases: _Weakrefable

A logical expression to be evaluated against some input.

To create an expression:

  • Use the factory function pyarrow.compute.scalar() to create a scalar (not necessary when the value is combined with another expression, see the examples below).

  • Use the factory function pyarrow.compute.field() to reference a field (column in table).

  • Compare fields and scalars with <, <=, ==, !=, >=, >.

  • Combine expressions using the Python operators & (logical and), | (logical or) and ~ (logical not). Note: the Python keywords and, or and not cannot be used to combine expressions.

  • Create expression predicates using Expression methods such as pyarrow.compute.Expression.isin().

Examples

>>> import pyarrow.compute as pc
>>> (pc.field("a") < pc.scalar(3)) | (pc.field("b") > 7)
<pyarrow.compute.Expression ((a < 3) or (b > 7))>
>>> pc.field('a') != 3
<pyarrow.compute.Expression (a != 3)>
>>> pc.field('a').isin([1, 2, 3])
<pyarrow.compute.Expression is_in(a, {value_set=int64:[
  1,
  2,
  3
], null_matching_behavior=MATCH})>

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

cast(self[, type, safe, options])

Explicitly set or change the expression's data type.

equals(self, Expression other)

Check whether this expression is equal to other.

from_substrait(message)

Deserialize an expression from Substrait.

is_nan(self)

Check whether the expression is NaN.

is_null(self, bool nan_is_null=False)

Check whether the expression is null.

is_valid(self)

Check whether the expression is not-null (valid).

isin(self, values)

Check whether the expression is contained in values.

to_substrait(self, Schema schema, ...)

Serialize the expression using Substrait.

cast(self, type=None, safe=None, options=None)

Explicitly set or change the expression’s data type.

This creates a new expression equivalent to calling the cast compute function on this expression.

Parameters:
type : DataType, default None

Type to cast the expression to.

safe : bool, default True

Whether to check for conversion errors such as overflow.

options : CastOptions, default None

Additional checks passed via CastOptions.

Returns:
cast : Expression

equals(self, Expression other)

Parameters:
other : pyarrow.dataset.Expression

Returns:
bool

static from_substrait(message)

Deserialize an expression from Substrait.

The serialized message must be an ExtendedExpression message that has only a single expression. The name of the expression and the schema the expression was bound to will be ignored. Use pyarrow.substrait.deserialize_expressions if this information is needed or if the message might contain multiple expressions.

Parameters:
message : bytes or Buffer or a protobuf Message

The Substrait message to deserialize.

Returns:
Expression

The deserialized expression.

is_nan(self)

Check whether the expression is NaN.

This creates a new expression equivalent to calling the is_nan compute function on this expression.

Returns:
is_nan : Expression

is_null(self, bool nan_is_null=False)

Check whether the expression is null.

This creates a new expression equivalent to calling the is_null compute function on this expression.

Parameters:
nan_is_null : bool, default False

Whether floating-point NaNs are considered null.

Returns:
is_null : Expression

is_valid(self)

Check whether the expression is not-null (valid).

This creates a new expression equivalent to calling the is_valid compute function on this expression.

Returns:
is_valid : Expression

isin(self, values)

Check whether the expression is contained in values.

This creates a new expression equivalent to calling the is_in compute function on this expression.

Parameters:
values : Array or iterable

The values to check for.

Returns:
isin : Expression

A new expression that, when evaluated, checks whether this expression’s value is contained in values.

to_substrait(self, Schema schema, bool allow_arrow_extensions=False)

Serialize the expression using Substrait.

The expression will be serialized as an ExtendedExpression message that has a single expression named “expression”.

Parameters:
schema : Schema

The input schema the expression will be bound to.

allow_arrow_extensions : bool, default False

If False, only functions that are part of the core Substrait function definitions are allowed. Set this to True to allow pyarrow-specific functions, although the result may not be accepted by other compute libraries.

Returns:
Buffer

A buffer containing the serialized Protobuf plan.