The following environment variables can be used to affect the behavior of Arrow C++ at runtime. Many of these variables are inspected only once per process (for example, when the Arrow C++ DLL is loaded), so you cannot assume that changing their value later will have an effect.
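Because a variable may be read only once, the reliable approach is to put it in the environment before the process that loads Arrow C++ starts. The following is a minimal sketch of that idea using only the Python standard library. The helper name `run_with_env` is hypothetical, and the child process here merely echoes the variable back instead of loading Arrow.

```python
import os
import subprocess
import sys


def run_with_env(overrides, code):
    """Run a Python snippet in a child process whose environment
    contains the given overrides from the very start."""
    env = dict(os.environ)   # inherit the parent's environment
    env.update(overrides)    # apply overrides before the child exists
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child sees the value at startup, which is when a library that
# inspects the variable only once would read it.
child_code = "import os; print(os.environ.get('OMP_THREAD_LIMIT'))"
seen = run_with_env({"OMP_THREAD_LIMIT": "2"}, child_code)
```

In a real program, the child would be the application that links against Arrow C++. Setting the variable in the launching shell has the same effect.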
Arrow C++’s Acero module performs computation on streams of data. This computation may involve a form of “type punning” that is technically undefined behavior if the underlying array is not properly aligned. On most modern CPUs this is not an issue, but some older CPUs may crash or suffer poor performance. For this reason it is recommended that all incoming array buffers are properly aligned, but some data sources such as Flight may produce unaligned buffers.
The value of this environment variable controls what will happen when Acero detects an unaligned buffer:
warn: a warning is emitted
ignore: nothing happens; alignment checking is disabled
reallocate: the buffer is reallocated to a properly aligned address
error: the operation fails with an error
The default behavior is warn. On modern hardware it is usually safe to change this to ignore. Changing to reallocate is the safest option, but it has a significant performance impact because the buffer must be copied.
Enable rudimentary memory checks to guard against buffer overflows. The value of this environment variable selects the behavior when a buffer overflow is detected:
abort exits the process with a non-zero return value;
trap issues a platform-specific debugger breakpoint / trap instruction;
warn prints a warning on stderr and continues execution.
If this variable is not set, or has an empty value, memory checks are disabled.
Override the default number of threads for the global IO thread pool. The value of this environment variable should be a positive integer.
The directory containing the C HDFS library (libhdfs.so on other platforms). Alternatively, one can set
The backend to which OpenTelemetry-based execution traces are exported. Possible values are:
ostream: emit textual log messages to stdout;
otlp_http: emit OTLP JSON encoded traces to an HTTP server (by default, the endpoint URL is “http://localhost:4318/v1/traces”);
arrow_otlp_stdout: emit JSON traces to stdout;
arrow_otlp_stderr: emit JSON traces to stderr.
If this variable is not set, no traces are exported.
This environment variable has no effect if Arrow C++ was not built with tracing enabled.
The SIMD optimization level to select. By default, Arrow C++ detects the capabilities of the current CPU at runtime and chooses the best execution paths based on that information. One can override the detection by setting this environment variable to a well-defined value. Supported values are:
NONE disables any runtime-selected SIMD optimization;
SSE4_2 enables any SSE2-based optimizations up to and including SSE4.2;
AVX enables any AVX-based optimizations and earlier;
AVX2 enables any AVX2-based optimizations and earlier;
AVX512 enables any AVX512-based optimizations and earlier.
This environment variable only has an effect on x86 platforms. Other platforms currently do not implement any form of runtime dispatch.
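The "and earlier" wording in the list above implies an ordering of the supported values. The sketch below models only that ordering; it is not Arrow's dispatch code. The names `SIMD_LEVELS` and `allowed_levels` are hypothetical, and the sketch ignores what the CPU actually supports.

```python
# Supported values, ordered from least to most capable.
SIMD_LEVELS = ["NONE", "SSE4_2", "AVX", "AVX2", "AVX512"]


def allowed_levels(user_level):
    """Return the SIMD levels that remain selectable for a given
    override value: the chosen level and every earlier one."""
    if user_level not in SIMD_LEVELS:
        raise ValueError(f"unsupported SIMD level: {user_level!r}")
    index = SIMD_LEVELS.index(user_level)
    # "NONE" is not an optimization level itself, so it is skipped.
    return SIMD_LEVELS[1:index + 1]


avx2_levels = allowed_levels("AVX2")
none_levels = allowed_levels("NONE")
```

Here `avx2_levels` contains SSE4_2, AVX and AVX2 but not AVX512, and `none_levels` is empty.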
In addition to runtime dispatch, the compile-time SIMD level can be set using the ARROW_SIMD_LEVEL CMake configuration variable. Unlike runtime dispatch, compile-time SIMD optimizations cannot be changed at runtime (for example, if you compile Arrow C++ with AVX512 enabled, the resulting binary will only run on AVX512-enabled CPUs). Setting ARROW_USER_SIMD_LEVEL=NONE prevents the execution of explicit SIMD optimization code, but it does not rule out the execution of compiler-generated SIMD instructions. For example, on x86_64 platforms, Arrow is built with ARROW_SIMD_LEVEL=SSE4_2 by default, so the compiler may generate SSE4.2 instructions from any C/C++ source code. On legacy x86_64 platforms that do not support SSE4.2, the Arrow binary may fail with SIGILL (Illegal Instruction). Users must rebuild Arrow and PyArrow from scratch by setting the CMake option
Endpoint URL used for S3-like storage, for example Minio or s3.scality.
The number of entries to keep in the Gandiva JIT compilation cache. The cache is in-memory and does not persist across processes.
The path to the Hadoop installation.
Set the path to the Java Runtime Environment installation. This may be required for HDFS support if Java is installed in a non-standard location.
The number of worker threads in the global (process-wide) CPU thread pool. If this environment variable is not defined, the available hardware concurrency is determined using a platform-specific routine.
An upper bound for the number of worker threads in the global (process-wide) CPU thread pool.
For example, if the current machine has 4 hardware threads and OMP_THREAD_LIMIT is 8, the global CPU thread pool will have 4 worker threads. But if OMP_THREAD_LIMIT is 2, the global CPU thread pool will have 2 worker threads.
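The example above amounts to taking the smaller of the hardware thread count and the limit. A minimal sketch of that arithmetic follows. The function name `effective_cpu_threads` is hypothetical, and the sketch assumes the pool would otherwise be sized to the hardware thread count.

```python
def effective_cpu_threads(hardware_threads, thread_limit=None):
    """Apply an upper bound to the default pool size.

    hardware_threads: number of hardware threads on the machine.
    thread_limit: value of OMP_THREAD_LIMIT, or None if it is unset.
    """
    if thread_limit is None:
        return hardware_threads
    return min(hardware_threads, thread_limit)


with_high_limit = effective_cpu_threads(4, 8)  # limit above hardware count
with_low_limit = effective_cpu_threads(4, 2)   # limit below hardware count
```

With 4 hardware threads, a limit of 8 leaves the pool at 4 workers, while a limit of 2 reduces it to 2, matching the two cases described above.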