Hannes Mühleisen
@hannes.muehleisen.org
6.4K followers 980 following 120 posts
I like databases and boats. Co-creator of @duckdb.org, Co-Founder and CEO DuckDB Labs. Professor of Data Engineering at Radboud Universiteit.
Reposted by Hannes Mühleisen
pyladiesams.bsky.social
Learn how to build powerful yet lightweight #data workflows using #Python, #DuckDB, and #Smallpond with Valery C. Briz, #Pythonista and senior #dataengineer, in our #online #workshop on the 23rd of October, 18:00-19:30 CEST
Register here: www.meetup.com/pyladiesams/...
Reposted by Hannes Mühleisen
duckdb.org
🚀 We released DuckDB v1.4.1, the first bugfix release of our LTS edition.

🔎 We expect LTS users to be particularly curious about changes in the system, so we wrote up a short blog post highlighting the most important fixes and improvements.

duckdb.org/2025/10/07/a...
Announcing DuckDB 1.4.1 LTS
Today we are releasing DuckDB 1.4.1, the first bugfix release of our LTS edition.
Reposted by Hannes Mühleisen
duckdb.org
✨ We launched a new installation page for DuckDB!

🚀 The new page lets you install the latest stable DuckDB release with just one or two clicks. If the defaults don't fit your use case, no worries: alternative download methods remain available for many clients.
Reposted by Hannes Mühleisen
ssaavedra.eu
After trying @duckdb.org with terabytes of parquet I'm hardly going back for data exploration to anything else. Hell, I'm now spawning DuckDB for analyzing even .csv and .json files due to how ergonomic its SQL is.
Reposted by Hannes Mühleisen
duckdb.org
We published a new deep dive by Laurens Kuiper, who recently redesigned DuckDB's sort.

One data point: ordering the TPC-H SF100 lineitem table with the memory limit set to 30 GB is 3× faster in DuckDB v1.4 than in v1.3.

Read more at duckdb.org/2025/09/24/s...
Redesigning DuckDB's Sort, Again
After four years, we've decided to redesign DuckDB's sort implementation, again. In this post, we present and evaluate the new design.
Reposted by Hannes Mühleisen
duckdb.org
🚀 We released version 0.3 of the DuckLake specification and the DuckDB ducklake extension today. It includes interoperability with Iceberg, support for geometry types and more.

Check the announcement blog post for more details: ducklake.select/2025/09/17/d...
Reposted by Hannes Mühleisen
kylewalker.bsky.social
This is the most exciting time ever to be working in data, and I'm not talking about AI.

3 years ago, I wrote a database-centric guide in my book for analyzing the full 92 million record 1910 Census.

Now, with #rstats and @duckdb?

Analyze those 92 million rows in seconds.
Reposted by Hannes Mühleisen
marcoshuerta.com
I'm speaking soon at #PositConf at the 2:40PM session "Get Your Ducks in a Row with Databases" in Regency VI! My talk is "Semantic Search for the Rest of Us with DuckDB (and Llama.cpp)"

#PositConf2025
Reposted by Hannes Mühleisen
duckdb.org
📈 DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.

For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...
hannes.muehleisen.org
We're testing a new distribution channel for @duckdb.org: #docker images. For now they live at `hfmuehleisen/duckdb`, feel free to test them out. And yes, hell got a little colder today.

hub.docker.com/r/hfmuehleis...
Reposted by Hannes Mühleisen
duckdb.org
We are holding the DuckDB Amsterdam Meetup next week, featuring talks by @rolandbouman.bsky.social, Tania Bogatsch and @qxip.bsky.social:

www.meetup.com/duckdb/event...

The event is already at capacity but consider joining the wait list because there are always last-minute RSVP cancellations.
hannes.muehleisen.org
Excited to be a keynote speaker at PyData Amsterdam 2025 (September 24–26). My talk is titled 'Minus Three Tier: Data Architecture Turned Upside Down'.

Use code PYDATADB10 for 10% off tickets
amsterdam.pydata.org/conference
#PDAmsterdam2025 #10YearsPDAmsterdam
Reposted by Hannes Mühleisen
duckdb.org
DuckDB @duckdb.org · Sep 8
Big Data on the Move: Can a Framework Laptop 13 ultrabook run terabyte-sized workloads with DuckDB?

@szarnyasg.org ran the experiments and shared his findings in our latest blog post: duckdb.org/2025/09/08/d...
Big Data on the Move: DuckDB on the Framework Laptop 13
We put DuckDB through its paces on a 12-core ultrabook with 128 GB RAM, running TPC-H queries up to SF10,000.
Reposted by Hannes Mühleisen
duckdb.org
DuckDB @duckdb.org · Sep 2
We just launched the “DuckDB in Science” site, a curated collection of papers, lectures and podcasts about DuckDB in research: duckdb.org/science/

🎡 If you would like to learn more about DuckDB in Science, consider joining our meetup in London this Thursday: www.meetup.com/duckdb/event...
Reposted by Hannes Mühleisen
pvldb.bsky.social
Vol:18 No:8 → Saving Private Hash Join
👥 Authors: Laurens Kuiper, Paul Gross, Peter Boncz, Hannes Mühleisen
📄 PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf
Reposted by Hannes Mühleisen
duckdb.org
DuckDB @duckdb.org · Aug 20
New blog post by Petrica Leuca:
Basic Feature Engineering with DuckDB

In this post, we show how to perform essential machine learning data preprocessing tasks—like missing value imputation, categorical encoding, and feature scaling—directly in DuckDB using SQL and benchmark it against scikit-learn.
Reposted by Hannes Mühleisen
ocks.org
A little demo of reactive SQL in Observable Notebooks 2.0, first using (native) DuckDB to bake data from a remote source, followed by DuckDB-Wasm to create and query reactive views in the client. Should be released this week!
Reposted by Hannes Mühleisen
duckdb.org
DuckDB @duckdb.org · Aug 18
🎓 On September 4, we are hosting a new kind of meetup in London which will focus on the use of DuckDB in Science and Education!

⚡️ We still have some spots for lightning talks. If you're working with DuckDB in your research and/or classroom, consider sharing your story!

🔗 duckdb.org/events/2025/...
DuckDB Meetup on Science and Education in London
DuckDB is an in-process SQL database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bin...
Reposted by Hannes Mühleisen
xevix.bsky.social
Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.
Reposted by Hannes Mühleisen
purplefrogsys.bsky.social
Not every job needs Spark or BigQuery.
Sometimes, you just need DuckDB.

Find out why it’s a game-changer for local analytics 🐤

👉 Read the Frog Blog by Joe!
www.purplefrogsystems.com/2025/08/why-...

#DuckDB #SQL #DataEngineer
Reposted by Hannes Mühleisen
juliasilge.com
I'm excited to speak this afternoon at #useR2025 on outgrowing your laptop with #Positron for #rstats users!

You can check out my slides at juliasilge.github.io/useR-2025/
Galaxy brain meme format outlining options for working with data: CSV file, parquet & duckdb, databases, and remote SSH sessions