Andrew Lamb
andrewlamb1111.bsky.social
Apache {DataFusion PMC}, Database Internals
Reposted by Andrew Lamb
Why rebuild the wheel? @olimpiupop.bsky.social talks with @andrewlamb1111.bsky.social about how Apache Arrow, Parquet, and the FDAP stack are letting database teams focus on innovation instead of reinventing the basics. youtu.be/Gd-mhbiy8Vo?...
Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025
YouTube video by GOTO Conferences
November 24, 2025 at 2:05 PM
Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025

www.youtube.com/watch?v=Gd-m...
November 24, 2025 at 2:21 PM
Does anyone know a good academic / industrial overview of how to implement (not use) LATERAL joins in SQL? It keeps coming up in DataFusion and I need to get reasonable background on it. github.com/apache/dataf...
November 23, 2025 at 1:05 PM
Save the date -- Wednesday July 22, 2026 for the first Apache DataFusion meetup in Denver: luma.com/jsu6faie
Denver Apache DataFusion Meetup · Luma
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. We will…
November 23, 2025 at 11:14 AM
One fun nugget from the Boston @apachedatafusion.bsky.social meetup on Wednesday: Datadog reports they run 68+ million queries per hour with DataFusion
November 14, 2025 at 6:18 PM
Here is a nice examination of the benefits of building new systems using the extensibility of @apachedatafusion.bsky.social vs other systems. www.bauplanlabs.com/post/duck-hu...
Duck Hunt: moving Bauplan from DuckDB to DataFusion
Bauplan's journey from DuckDB to Apache DataFusion: how switching SQL engines doubled query performance on Iceberg lakehouses while enabling greater hackability
November 11, 2025 at 3:41 PM
Reposted by Andrew Lamb
Excited to be one of the attendees and present our work on the DataFusion-powered SedonaDB alongside a great lineup of talks! If you're in the Boston area come and say hi!
November 4, 2025 at 6:33 PM
"If you want to go fast, go alone; if you want to go far, go together."
New Apache Parquet Community page is up: parquet.apache.org/community/
November 7, 2025 at 8:06 PM
We are holding the next Apache DataFusion meetup on Wednesday, Nov 12, in Boston: lu.ma/w9pw5rce
Boston Apache DataFusion Meetup · Luma
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. This…
November 4, 2025 at 6:05 PM
If anyone wants to know why Xiangpeng Hao is a great mentor, they can read this response: github.com/XiangpengHao...
November 3, 2025 at 8:16 PM
A new version of Rust Apache Arrow and Apache Parquet is out: it includes a new metadata parser, a new Avro reader, and geometry and variant support 🤯 arrow.apache.org/blog/2025/10...
Apache Arrow Rust 57.0.0 Release
The Apache Arrow team is pleased to announce that the v57.0.0 release of Apache Arrow Rust is now available on crates.io (arrow and parquet) and as source download. See the 57.0.0 changelog for a full...
October 31, 2025 at 10:26 AM
I have heard from three people/projects in the last three days that they are considering forks of iceberg-rust. I filed a ticket to see if we can figure out how to consolidate efforts: github.com/apache/icebe...
October 28, 2025 at 5:50 PM
Apache DataFusion's policy for AI-assisted contributions:

AI is great, but AI dumps are not: maintainers could finish the task faster by using AI directly, and submitters gain little knowledge when acting as a pass-through AI proxy.

datafusion.apache.org/contributor-...
Introduction — Apache DataFusion documentation
October 27, 2025 at 12:51 PM
Some Apache Parquet nerd humor for Friday afternoon

lists.apache.org/thread/36rdg...
October 24, 2025 at 8:24 PM
We made Apache Parquet metadata parsing 3x-9x faster in the latest release of the Rust implementation
arrow.apache.org/blog/2025/10...
October 24, 2025 at 9:55 AM
Reposted by Andrew Lamb
Today's Future Data Systems Seminar Speaker: Ian Cook (@ian.columnar.tech) will present @columnar.tech's work on Apache Arrow's database connectivity API (ADBC). ADBC is available in modern DBMSs. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Where We're Going, We Don't Need Rows: Columnar Data Connectivity with ADBC - Carnegie Mellon Database Group
ADBC (Arrow Database Connectivity) is Apache Arrow’s answer to ODBC and JDBC:...
October 20, 2025 at 11:38 AM
More products built with Apache DataFusion: Palantir Foundry's Pipeline Builder

www.palantir.com/docs/foundry...
October 21, 2025 at 7:52 PM
Prateek Gaur and co at Snowflake reproduced the (great) results for the ALP encoding algorithm from CWI / Azim Afroozeh / Peter Boncz

ALP achieves ZSTD levels of compression and much faster decode. We are discussing adding it to @ApacheParquet: lists.apache.org/thread/tjtln...
October 17, 2025 at 1:05 PM
The talk on Vortex @db.cs.cmu.edu youtube.com/watch?v=zyn_... is a great one.

I think it would also be interesting to hear a counterpoint about Apache Parquet that explains the actual technical details of that format, its Cathedral vs. Bazaar style of management, the options for metadata, etc.
Vortex: LLVM for File Formats (Will Manning)
YouTube video by CMU Database Group
October 15, 2025 at 12:57 PM
Our new Thrift parser in the Rust Apache Parquet implementation is a 🎁 that keeps on giving, performance-wise 🚀 github.com/apache/arrow...

We are also working on a blog post that has a deeper explanation
October 10, 2025 at 6:52 PM
Yesterday I learned about the SpatialBench from Sedona github.com/apache/sedon...

Which they based on the tpchgen-rs project from @clflushopt.bsky.social github.com/clflushopt/t...

(BTW, I am still looking for some more GitHub watchers on tpchgen-rs so I can get it on Homebrew)
October 9, 2025 at 5:38 PM
BTW, if anyone wants a good intro to database storage / log-structured storage (aka LSM trees), the @db.cs.cmu.edu lecture from this fall is a good one: www.youtube.com/watch?v=2_sT...
#05 - Log-Structured Database Storage ✸ SingleStore Database Talk (CMU Intro to Database Systems)
YouTube video by CMU Database Group
October 7, 2025 at 1:32 PM
It starts: github.com/clflushopt/t...

@clflushopt.bsky.social is going to make the world's fastest TPC-DS generator
GitHub - clflushopt/tpcdsgen: WIP (out of tree) Rust implementation of TPC-DS generators.
WIP (out of tree) Rust implementation of TPC-DS generators. - clflushopt/tpcdsgen
October 2, 2025 at 11:48 AM
Apache DataFusion 50 is released. Read all about it here: datafusion.apache.org/blog/2025/09...
September 29, 2025 at 1:47 PM