Using DataFusion as a library

Default Configuration

DataFusion is published on crates.io, and is well documented on docs.rs.

To get started, add the following to your Cargo.toml file:

[dependencies]
datafusion = "5.1.0"

Optimized Configuration

An optimized build requires several steps. First, add the following to your Cargo.toml. Note that the settings in the [profile.release] section significantly increase build time.

[dependencies]
datafusion = { version = "5.0" , features = ["simd"]}
tokio = { version = "^1.0", features = ["rt-multi-thread"] }
snmalloc-rs = "0.2"

[profile.release]
lto = true
codegen-units = 1

Then, in main.rs, set the global memory allocator by adding the following after your imports:

#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
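
To make clearer what the #[global_allocator] attribute does, here is a minimal sketch that uses only the standard library. It is not part of DataFusion or snmalloc: CountingAlloc, ALLOC_CALLS, and allocation_count are hypothetical names chosen for illustration. The wrapper forwards every request to the system allocator and counts the calls, which shows that all heap allocations in the program go through whichever static carries the attribute.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator for illustration: forwards to the system
/// allocator and counts how many times `alloc` is called.
struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only one static in the whole program may carry this attribute.
// With snmalloc, this static is replaced by the SnMalloc one.
#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc;

/// Number of allocation requests seen so far.
fn allocation_count() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimizer from removing the allocation.
    let v = std::hint::black_box(vec![1u8; 4096]);
    println!("allocations observed: {}", allocation_count() - before);
    drop(v);
}
```

Swapping in snmalloc works the same way: the static is the single point through which every Box, Vec, and String allocation is routed.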

Finally, building with the simd feature requires the nightly Rust toolchain. You will also want to set target-cpu to match the instruction set of the machine you are building for, ideally native, or at least avx2.

RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release
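
To confirm that the target-cpu setting took effect, one option is a small check like the sketch below. It relies only on the standard library; cpu_supports_avx2 and built_with_avx2 are hypothetical helper names, not part of DataFusion. The first asks the running CPU whether it supports AVX2, and the second reports whether the compiler was allowed to emit AVX2 instructions for this build.

```rust
/// Returns true when the running CPU reports AVX2 support.
/// On non-x86 targets this sketch simply returns false.
fn cpu_supports_avx2() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        std::is_x86_feature_detected!("avx2")
    }
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    {
        false
    }
}

/// Returns true when AVX2 was enabled at compile time, for example
/// through `-C target-cpu=native` on a machine that has AVX2.
fn built_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

fn main() {
    println!("CPU supports AVX2 at runtime: {}", cpu_supports_avx2());
    println!("binary compiled with AVX2:    {}", built_with_avx2());
}
```

If the first line prints true but the second prints false, the RUSTFLAGS setting was probably not picked up by the build.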