Announcing Apache Arrow DataFusion is now Apache DataFusion


Published 07 May 2024
By The Apache Arrow PMC (pmc)

Introduction

TL;DR: Apache Arrow DataFusion -> Apache DataFusion

The Arrow PMC and newly created DataFusion PMC are happy to announce that, as of April 16, 2024, the Apache Arrow DataFusion subproject is now a top-level Apache Software Foundation project.

Background

Apache DataFusion is a fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.

When DataFusion was donated to the Apache Software Foundation in 2019, the DataFusion community was not large enough to stand on its own and the Arrow project agreed to help support it. The community has grown significantly since 2019, benefiting immensely from being part of Arrow and following The Apache Way.

Why now?

The community publicly discussed graduating to a top-level project for almost a year, as the project seemed ready to stand on its own and would benefit from more focused governance. For example, earlier in DataFusion’s life many people contributed to both arrow-rs and DataFusion, but as DataFusion has matured, many contributors, committers and PMC members have focused more and more exclusively on DataFusion.

Looking forward

The future looks bright. There are now tens of known projects built with DataFusion, and that number continues to grow. We recently held our first in-person meetup, passed 5000 stars on GitHub, wrote a paper that was accepted at SIGMOD 2024, and began work on Comet, an Apache Spark accelerator initially donated by Apple.

Thank you to everyone in the Arrow community who helped DataFusion grow and mature over the years, and we look forward to continuing our collaboration as projects. All future blogs and announcements will be posted on the Apache DataFusion website.

Get Involved

If you are interested in joining the community, we would love to have you join us. Get in touch using the Communication Doc and learn how to get involved in the Contributor Guide. We welcome everyone to try DataFusion on their own data and projects and let us know how it goes, or to contribute suggestions, documentation, bug reports, or a PR with documentation, tests or code.