Jake Thomas
@jakthom.bsky.social
Messing around with boats and databases.

Sometimes data systems, sometimes security, sometimes ai/ml, sometimes a blend of it all.
Engineering is fun. That is all.
November 13, 2025 at 5:10 PM
(for those of us who like 🎧🎶 and 🤖...)

open.spotify.com/episode/1n9V...
F3: The Open-Source Data File Format for the Future
open.spotify.com
October 13, 2025 at 9:04 PM
bla...bla bla...bla bla...

build.
August 4, 2025 at 3:56 PM
Sufficiently-advanced software deployments are indistinguishable from magic.
July 25, 2025 at 4:20 PM
"lakebase"....

.....
...
..

.

🤣🤣🤣🤣🤣🤣🤣🤣
June 12, 2025 at 5:33 PM
So.... what exactly happens when databases are commoditized?..
June 11, 2025 at 12:59 PM
June 7, 2025 at 3:44 PM
❤️
chris.blue Chris @chris.blue · May 30
New post! I wrote about malaise in the streaming space. I'm not sure what's going on, but I'm thinking part of it has to do with Kafka.
Kafka: The End of the Beginning
A decade of focus on adoption has paid off. Now it's time to innovate.
materializedview.io
May 31, 2025 at 4:40 PM
🎯🎯🎯🎯
People are realizing that the best ROI comes from landing data in a lakehouse as quickly as possible. Kafka can be part of that story, but Stream Processing is not.
May 31, 2025 at 4:39 PM
Reposted by Jake Thomas
Welcome to the age of $10/month Lakehouses!

How to build and run a Lakehouse on top of @cloudflare.social R2, Cloudflare Containers and Neon Postgres, all backed by the new DuckLake "SQL as Lakehouse" format, via @duckdb.org.

tobilg.com/the-age-of-1...
Welcome to the age of $10/month Lakehouses
No, this article is not about buying properties close to lakes...
tobilg.com
May 30, 2025 at 6:28 PM
ducklake.select is neat.

But I'd put it into prod tomorrow if it was called Drake.
DuckLake is an integrated data lake and catalog format.
DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and your SQL database. It's an open, standalone format from the DuckDB team.
ducklake.select
May 28, 2025 at 7:41 PM
Reposted by Jake Thomas
0x.tools xCapture v3: Linux Performance Analysis with Modern eBPF and DuckDB 👀👀

tanelpoder.com/posts/xcaptu...
April 23, 2025 at 11:15 PM
@jackhcable.bsky.social discussing product security in the AI era 🔥
April 27, 2025 at 7:24 PM
@ethanrosenthal.com speaking 🔥
April 24, 2025 at 12:17 AM
I've been exploring how to do Iceberg with the fewest dependencies possible.

tl;dr -> @duckdb.org, @amazonwebservices.bsky.social S3 Tables, and a sprinkle of Python is awesome.

jakthom.dev/blog/zero-in...
Creating a Zero-Infrastructure Iceberg Data Lake in 5 Minutes
Zero-Infrastructure Iceberg in 5 Minutes
jakthom.dev
April 21, 2025 at 5:32 PM
idc what anyone says, I love deploying static websites by hand
April 21, 2025 at 5:01 PM
Reposted by Jake Thomas
Arroyo is joining @cloudflare.social! We're bringing Arroyo to the Developer Platform as a serverless stream processing system, and will also remain open-source and self-hostable. www.arroyo.dev/blog/arroyo-...
Arroyo is joining Cloudflare
Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-sour...
www.arroyo.dev
April 10, 2025 at 3:05 PM
Zero egress costs will be (are?) the new data gravity.

H/t @eastdakota.com

open.spotify.com/episode/6lUL...
How Cloudflare is Working to Fix the Internet with Matthew Prince
Screaming in the Cloud · Episode
open.spotify.com
April 10, 2025 at 7:36 PM
Aaaand 4/10 is quite the day in the world of data systems....

datafusion.apache.org/blog/2025/04...
tpchgen-rs World’s fastest open source TPC-H data generator, written in Rust - Apache DataFusion Blog
datafusion.apache.org
April 10, 2025 at 3:46 PM
April 10, 2025 at 3:04 PM
So....has anyone built an MCP @duckdb.org extension yet?...
April 8, 2025 at 9:35 PM
Reposted by Jake Thomas
I'm looking forward to seeing you in person at the Iceberg summit in SF tomorrow.
I'll be speaking about the evolution of data storage from Hadoop to Iceberg and how we're witnessing the Advent of The Open Data Lake.
www.icebergsummit2025.com
April 8, 2025 at 12:10 AM