Configuration Settings
The following configuration options can be passed to SessionConfig to control various aspects of query execution.
For applications which do not expose SessionConfig, like datafusion-cli, these options may also be set via environment variables.
To construct a session with options from the environment, use SessionConfig::from_env.
The name of the environment variable is the option’s key, converted to uppercase with periods replaced by underscores.
For example, to configure datafusion.execution.batch_size you would set the DATAFUSION_EXECUTION_BATCH_SIZE environment variable.
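
The naming rule above can be sketched in a few lines of Rust. This is only an illustration of the described transformation; `env_var_name` is a hypothetical helper introduced here and is not part of the DataFusion API.

```rust
/// Derive the environment variable name for a configuration key by
/// uppercasing it and replacing periods with underscores.
/// Hypothetical helper for illustration; not a DataFusion function.
fn env_var_name(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}
```

Calling `env_var_name("datafusion.execution.batch_size")` yields `DATAFUSION_EXECUTION_BATCH_SIZE`, matching the example above.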
Values are parsed according to the same rules used in casts from Utf8.
If the value in the environment variable cannot be cast to the type of the configuration option, the default value is used instead and a warning is emitted.
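
The fallback behaviour described above can be approximated as follows for a UInt64 option. This is a minimal sketch and not DataFusion's implementation: `parse_u64_or_default` is a hypothetical helper, and it uses Rust's `str::parse` as a stand-in for the Utf8 cast rules, which may differ in edge cases.

```rust
/// Parse the raw text of an environment variable as a UInt64 option.
/// Returns `default` when the variable is unset or cannot be parsed,
/// printing a warning to stderr in the latter case.
/// Hypothetical helper; `str::parse` only approximates the Utf8 cast rules.
fn parse_u64_or_default(name: &str, raw: Option<&str>, default: u64) -> u64 {
    match raw {
        // Variable not set: silently use the default.
        None => default,
        Some(text) => match text.parse::<u64>() {
            Ok(value) => value,
            Err(err) => {
                eprintln!(
                    "warning: could not parse {}={:?} as UInt64 ({}); using default {}",
                    name, text, err, default
                );
                default
            }
        },
    }
}
```

To read a real variable, the raw value could be obtained with `std::env::var(name).ok().as_deref()` and passed as the second argument.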
Environment variables are read during SessionConfig initialisation, so they must be set beforehand and will not affect running sessions.
key | type | default | description |
---|---|---|---|
datafusion.execution.batch_size | UInt64 | 8192 | Default batch size used when creating new batches. This is especially useful for batches buffered in memory, since creating tiny batches would result in too much metadata memory consumption. |
datafusion.execution.coalesce_batches | Boolean | true | When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting ‘datafusion.execution.coalesce_target_batch_size’. |
datafusion.execution.coalesce_target_batch_size | UInt64 | 4096 | Target batch size when coalescing batches. Used in conjunction with the configuration setting ‘datafusion.execution.coalesce_batches’. |
datafusion.optimizer.filter_null_join_keys | Boolean | false | When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. This filter can add overhead when the file format does not fully support predicate pushdown. |
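
To illustrate how datafusion.execution.coalesce_batches and datafusion.execution.coalesce_target_batch_size interact, the sketch below models coalescing using row counts only. It is a simplified model under stated assumptions, not DataFusion's operator: `coalesce_row_counts` is a hypothetical helper, and the real implementation may differ in details such as how it treats batches that already meet the target.

```rust
/// Simplified model of batch coalescing that tracks only row counts.
/// Incoming batches are buffered until the buffered row count reaches
/// `target`, at which point one combined batch is emitted.
/// Hypothetical helper for illustration; not a DataFusion API.
fn coalesce_row_counts(batches: &[usize], target: usize) -> Vec<usize> {
    let mut output = Vec::new();
    let mut buffered = 0;
    for &rows in batches {
        buffered += rows;
        if buffered >= target {
            // Enough rows accumulated: emit them as a single batch.
            output.push(buffered);
            buffered = 0;
        }
    }
    // Flush whatever remains once the input is exhausted.
    if buffered > 0 {
        output.push(buffered);
    }
    output
}
```

With a target of 4096, input batches of 100, 500, 4000, 8192, and 10 rows come out as batches of 4600, 8192, and 10 rows: the three small leading batches are merged, and the final small batch is flushed at end of input.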