Andy Grove
@andygrove.io
Apache Arrow & DataFusion PMC Member. Original creator of Apache DataFusion.
andygrove.io
It’s steak night tonight and our dog is patiently waiting for her share.
andygrove.io
I like the name “RAD stack” for this.
sap1ens.com
Today, in the Data Streaming Journey, I'm sharing my experience of building data streaming products with the RAD stack: Rust, Arrow, DataFusion.

www.streamingdata.tech/p/streaming-...
Streaming and the RAD Stack
RAD: Rust, Arrow, DataFusion
www.streamingdata.tech
Reposted by Andy Grove
sap1ens.com
Introducing Iron Vector: a native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API, built on top of Rust, Arrow and DataFusion.

Reduce your Flink compute cost by up to 2x or handle 2x more data with the same infrastructure.
Reposted by Andy Grove
rust-lang.org
We received reports of a phishing campaign targeting crates.io users. Do not click on links asking to authenticate to protect your account. More information: blog.rust-lang.org/2025/09/12/c...
crates.io phishing campaign | Rust Blog
Empowering everyone to build reliable and efficient software.
blog.rust-lang.org
Reposted by Andy Grove
andrewlamb1111.bsky.social
Thanks to @clflushopt.bsky.social, make massive TPCH datasets with tpchgen-cli 2.0:

SF1000 (1TB raw, 220GB in @ApacheParquet) in less than 10 mins (6m45s) on an aging laptop

Try it now:

pip install tpchgen-cli
tpchgen-cli --scale-factor 1000 --parts 100 --format=parquet

github.com/clflushopt/t...
Reposted by Andy Grove
eatonphil.bsky.social
I've been helping our analytics team integrate our DataFusion-based query engine for Postgres into EDB Postgres Distributed and finally here's an end-to-end demo.

You get HA Postgres plus seamless replication and DataFusion-based queries. This query turned out 6x faster than PG.
andygrove.io
How my day is going
andygrove.io
We now have a roadmap section in the Comet contributor guide, in case anyone was wondering what we are focusing on lately and what features will be arriving in future releases.

datafusion.apache.org/comet/contri...
Comet Roadmap — Apache DataFusion Comet documentation
datafusion.apache.org
Reposted by Andy Grove
ifesdjeen.bsky.social
Cassandra Team at Apple is searching for a fresh grad / person early in their career to join our ranks in SF/Bay Area!

Come work on super interesting problems with a world-class team. Help us build a better Cassandra!

Ping me if you’re interested!

jobs.apple.com/en-us/detail...
Software Engineer, ASE Cassandra Storage - Jobs - Careers at Apple
Apply for a Software Engineer, ASE Cassandra Storage job at Apple. Read about the role and find out if it’s right for you.
jobs.apple.com
Reposted by Andy Grove
timsaucer.bsky.social
We're pleased to announce that Apache DataFusion in Python 46.0.0 is released! Since the last announcement post we've had a lot of great features and new contributors. Please check out the blog post with details.

datafusion.apache.org/blog/2025/03...

#DataFusion #Python #DataFrame #PyData #Apache
Apache DataFusion Python 46.0.0 Released - Apache DataFusion Blog
datafusion.apache.org
andygrove.io
We have single-node TPC-H benchmarks at a small scale factor in the contributor guide. We only benchmark against Spark, though, and not against Spark RAPIDS.

datafusion.apache.org/comet/contri...
Apache DataFusion Comet: Benchmarks Derived From TPC-H — Apache DataFusion Comet documentation
datafusion.apache.org
andygrove.io
Here's the blog post announcing Comet 0.7.0

datafusion.apache.org/blog/2025/03...
andygrove.io
I hate to say it, but "it depends". I'd recommend running your own benchmarks for your specific workloads. Performance will also vary greatly by environment (number of CPUs vs GPUs, different GPU types, and so on).
andygrove.io
DataFusion Comet 0.7.0 is now available in Maven. We'll be publishing a blog post next week with all the details.

The repo has been updated with the latest benchmark results. For single-executor TPC-H @ 100 GB, we now see a 2.2x speedup over Spark (up from 2x in 0.6.0).

github.com/apache/dataf...
GitHub - apache/datafusion-comet: Apache DataFusion Comet Spark Accelerator
Apache DataFusion Comet Spark Accelerator. Contribute to apache/datafusion-comet development by creating an account on GitHub.
github.com
andygrove.io
One month on, and I have zero regrets about quitting Facebook & Instagram.

I have replaced the scrolling time with listening to podcasts.

I now stay in touch with family overseas via email and photo sharing, and I use Snapchat for sharing photos with immediate family, privately. Works great.
andygrove.io
I've finally decided to quit using Facebook. My feed is overwhelmed with nonsense content that I am not interested in and cannot seem to block.

It is a real shame, though, because it was a good way to stay connected with family.

Is there a viable alternative? What are others using instead?
andygrove.io
Comet 0.6.0 has been released. This is a smaller release than usual now that we have moved to an approximately monthly release cadence to match core DataFusion.

datafusion.apache.org/blog/2025/02...
Apache DataFusion Comet 0.6.0 Release - Apache DataFusion Blog
datafusion.apache.org
andygrove.io
Check out this excellent presentation from @robtandy.bsky.social on his work with the DataFusion Ray project from last week's DataFusion community meetup.

It is a great overview of how to build a distributed system on top of DataFusion.

www.youtube.com/watch?v=ceTo...
Apache DataFusion Community Meeting 2025/01/22 08:57 MST - Recording
YouTube video by Datadog
www.youtube.com
andygrove.io
Is this using Arrow and/or DataFusion? If so, our Discord is probably a good place to ask.

datafusion.apache.org/contributor-...
Communication — Apache DataFusion documentation
datafusion.apache.org