pyarrow.dataset.Expression
- class pyarrow.dataset.Expression
Bases:
_Weakrefable
A logical expression to be evaluated against some input.
To create an expression:
- Use the factory function pyarrow.compute.scalar() to create a scalar (not necessary when combined, see example below).
- Use the factory function pyarrow.compute.field() to reference a field (column in table).
- Compare fields and scalars with <, <=, ==, >=, >.
- Combine expressions using Python operators & (logical and), | (logical or) and ~ (logical not). Note: the Python keywords and, or and not cannot be used to combine expressions.
- Create expression predicates using Expression methods such as pyarrow.compute.Expression.isin().
Examples
>>> import pyarrow.compute as pc
>>> (pc.field("a") < pc.scalar(3)) | (pc.field("b") > 7)
<pyarrow.compute.Expression ((a < 3) or (b > 7))>
>>> pc.field('a') != 3
<pyarrow.compute.Expression (a != 3)>
>>> pc.field('a').isin([1, 2, 3])
<pyarrow.compute.Expression is_in(a, {value_set=int64:[ 1, 2, 3 ], null_matching_behavior=MATCH})>
- __init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)
cast(self[, type, safe, options]): Explicitly set or change the expression's data type.
equals(self, Expression other)
from_substrait(buffer): Deserialize an expression from Substrait.
is_nan(self): Check whether the expression is NaN.
is_null(self, bool nan_is_null=False): Check whether the expression is null.
is_valid(self): Check whether the expression is not-null (valid).
isin(self, values): Check whether the expression is contained in values.
to_substrait(self, Schema schema, ...): Serialize the expression using Substrait.
- cast(self, type=None, safe=None, options=None)
Explicitly set or change the expression’s data type.
This creates a new expression equivalent to calling the cast compute function on this expression.
- Parameters:
- Returns:
- cast
Expression
- equals(self, Expression other)
- Parameters:
- Returns:
- static from_substrait(buffer)
Deserialize an expression from Substrait.
The serialized message must be an ExtendedExpression message that has only a single expression. The name of the expression and the schema the expression was bound to will be ignored. Use pyarrow.substrait.deserialize_expressions if this information is needed or if the message might contain multiple expressions.
- Parameters:
- buffer
bytes or Buffer
The Substrait message to deserialize
- Returns:
Expression
The deserialized expression
- is_nan(self)
Check whether the expression is NaN.
This creates a new expression equivalent to calling the is_nan compute function on this expression.
- Returns:
- is_nan
Expression
- is_null(self, bool nan_is_null=False)
Check whether the expression is null.
This creates a new expression equivalent to calling the is_null compute function on this expression.
- Parameters:
- Returns:
- is_null
Expression
- is_valid(self)
Check whether the expression is not-null (valid).
This creates a new expression equivalent to calling the is_valid compute function on this expression.
- Returns:
- is_valid
Expression
- isin(self, values)
Check whether the expression is contained in values.
This creates a new expression equivalent to calling the is_in compute function on this expression.
- Parameters:
- Returns:
- isin
Expression
A new expression that, when evaluated, checks whether this expression’s value is contained in values.
- to_substrait(self, Schema schema, bool allow_arrow_extensions=False)
Serialize the expression using Substrait.
The expression will be serialized as an ExtendedExpression message that has a single expression named “expression”.
- Parameters:
- schema
Schema
The input schema the expression will be bound to
- allow_arrow_extensions
bool, default False
If False, only functions that are part of the core Substrait function definitions are allowed. Set this to True to allow pyarrow-specific functions, but note that the result may not be accepted by other compute libraries.
- Returns:
Buffer
A buffer containing the serialized Protobuf plan.