@xevix.bsky.social
69 followers 33 following 120 posts
Software Developer interested in data, web, languages. Silicon Valley/Tokyo. https://medium.com/@xevix https://github.com/xevix
xevix.bsky.social
Processing 100 TB of CSV files on a single machine is insane, a little over 1 hr per query, even if on a powerful AWS instance. It makes you heavily question the need for complex systems when this is what’s possible now. Can’t wait for the full write-up. Incredible work.

duckdb.org/2025/10/09/b...
Benchmark Results for DuckDB v1.4 LTS
DuckDB v1.4 LTS is both fast and scalable. In in-memory mode, it is the fastest system on ClickBench. In disk-based mode, it can run complex analytical queries on a dataset equivalent to 100 TB CSV fi...
duckdb.org
xevix.bsky.social
The tradeoffs are interesting if the main goal is no operating cost and decent startup time. Definitely painful to develop on regularly, but for a one-and-done this makes a lot of sense. I wonder if Rust compile times will come down further one day.
xevix.bsky.social
Taking the DuckDB hoodie on a trip. Not exactly Amsterdam but I’ve heard they like columnar databases here too.
xevix.bsky.social
Congrats to DuckDB team on LTS release w/ many great improvements! Hidden among them you can now use Hive filtering with read_blob, and SHOW TABLES FROM specific db w/o USE.
duckdb.org
📈 DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.

For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...
xevix.bsky.social
I tried loading eBird data (1.5B rows, CSV in a ZIP) using DuckDB for fun, inspired by a ClickHouse blog post and a bit of curiosity. Both did well: DuckDB was slightly faster at querying and Parquet ingest, while ClickHouse has native ZIP support and is optimized for ingest and multitenancy. xevix.medium.com/ebird-in-duc...
eBird in DuckDB
I saw this post by the Clickhouse team which was doing a cool test of the eBird dataset from Cornell University, and wondered how DuckDB…
xevix.medium.com
Reposted
pvldb.bsky.social
Vol:18 No:8 → Saving Private Hash Join
👥 Authors: Laurens Kuiper, Paul Gross, Peter Boncz, Hannes Mühleisen
📄 PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf
xevix.bsky.social
Yeah, I don’t think MS is interested in 3rd party devs so much.
xevix.bsky.social
Unfortunately yes, I was already going to get one for something else and this put me over the edge. Maybe I'll also build a gaming rig one day in the distant future haha.
xevix.bsky.social
Compiling DuckDB on Windows 11 (ARM) using a UTM VM on macOS to debug Windows compile issues. It's a shame MSVC doesn't exist outside of Windows; MinGW/clang don't work the same and cross-compiling is tricky. Compiling takes 5-10 mins (instead of 1-2 mins native), but it works 🎉!
xevix.bsky.social
Just the 1 day of data above is ~125GiB compressed, ~585GiB uncompressed. One month is about 3.75TiB compressed, or 17.5TiB uncompressed. It makes sense this dataset is so popular for testing and analysis, wow.
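The monthly estimate can be reproduced with a few lines of arithmetic. This is a rough sketch, assuming a 30-day month and 1024 GiB per TiB; the function name and constants are illustrative and not from the post. With those assumptions the totals come out slightly below the rounded figures quoted above, which match dividing by 1000 instead.

```python
GIB_PER_TIB = 1024  # binary units; an assumption about how the sizes are measured


def monthly_size_tib(daily_gib: float, days: int = 30) -> float:
    """Scale a per-day size in GiB to a multi-day size in TiB."""
    return daily_gib * days / GIB_PER_TIB


# Daily sizes taken from the post: ~125 GiB compressed, ~585 GiB uncompressed.
compressed = monthly_size_tib(125)
uncompressed = monthly_size_tib(585)

print(f"compressed:   {compressed:.2f} TiB")
print(f"uncompressed: {uncompressed:.2f} TiB")
```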
xevix.bsky.social
Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 Parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.
xevix.bsky.social
Uploaded a simplified query. Had to delete and repost since no edit button on the snippets site, sorry for the spam haha.
xevix.bsky.social
Haha already posted in case someone benefits there too.
xevix.bsky.social
Neat little hack to get Hive partition list in DuckDB, useful for an overview. Might be neat to have built-in. gist.github.com/xevix/04f33d...
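For readers unfamiliar with Hive partitioning: partition values are encoded in directory names as key=value segments, so an overview of partitions can be derived from file paths alone. The sketch below is not the gist's query (its contents are not shown here); it is a plain-Python illustration of the idea, and the function name and example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partitions(paths):
    """Collect the distinct values seen for each key=value directory segment."""
    seen = defaultdict(set)
    for path in paths:
        # Skip the last component (the file name); only directories carry partitions.
        for segment in PurePosixPath(path).parts[:-1]:
            key, sep, value = segment.partition("=")
            if sep and key:
                seen[key].add(value)
    return {key: sorted(values) for key, values in seen.items()}


# Hypothetical layout, for illustration only.
example_paths = [
    "data/year=2024/month=01/part-0.parquet",
    "data/year=2024/month=02/part-0.parquet",
    "data/year=2025/month=01/part-0.parquet",
]
print(hive_partitions(example_paths))
```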
xevix.bsky.social
Automator using simple shell script to call sqlfluff. Added keyboard shortcut for the service too. Easier than making browser extensions for each browser, although unfortunately not cross-platform.
xevix.bsky.social
Added an Automator quick action to run sqlfluff for formatting SQL in browser fields, used here in the DuckDB UI. Only needs sqlfluff, optionally configure rules. Would be cool to get built-in one day, but works for now.
xevix.bsky.social
The pieces were already there, but progress is not always linear. Majority of cases now handled by [Mother]Duck[DB|Lake], Spark for extreme cases. Single-node compute 10 years from now is going to be mindblowing, but already exciting what we can do today.
xevix.bsky.social
Apache Drill allowed storing metadata in an RDBMS; Iceberg scales data, Arrow scales columnar memory, Parquet columnar storage, Spark distributed compute, DuckDB single-node compute. DuckLake scales metadata and storage w/ compute on a single node. MotherDuck distributes compute.
Reposted
andypavlo.bsky.social
Shots fired by @firebolthq.bsky.social with their new on-prem executable (www.firebolt.io/blog/introdu...). They have dethroned the Umbra system by The Germans™ at @tum.de in the ClickBench rankings: benchmark.clickhouse.com
ClickBench rankings as of June 2025.
xevix.bsky.social
Vibe coding NOAA GHCN weather visualization from scratch w/ Claude Code and DuckDB MCP. There's Evidence and other vis tools but I don't want a pre-cached set of data, I want it to query live. Cool that this can be put together w/o writing my own HTML/CSS, as a web backend dev 😅