Lightnews — Scholar-powered news

Reposted by Hannes Mühleisen

PyLadies Amsterdam @pyladiesams.bsky.social · 1d

Learn how to build powerful yet lightweight #data workflows using #Python, #DuckDB, and #Smallpond with Valery C. Briz, #Pythonista and senior #dataengineer, on the 23rd of October in our #online #workshop, 18:00-19:30 CEST.
Register here: www.meetup.com/pyladiesams/...

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 1d

🚀 We released DuckDB v1.4.1, the first bugfix release of our LTS edition.

🔎 We expect LTS users to be particularly curious about changes in the system, so we wrote up a short blog post highlighting the most important fixes and improvements.

duckdb.org/2025/10/07/a...

Announcing DuckDB 1.4.1 LTS

Today we are releasing DuckDB 1.4.1, the first bugfix release of our LTS edition.

Reposted by Hannes Mühleisen

CMU Database Group @db.cs.cmu.edu · 2d

Today's Future Data Systems Seminar Speaker: Jordan Tigani (@jrdntgn.bsky.social) will present how @motherduck.com supports modern workloads with DuckLake. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...

[Future Data] DuckLake: Learning from Cloud Data Warehouses to Build a Robust "Lakehouse" - Carnegie Mellon Database Group

When building scalable data systems, it is easy to focus on the...

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 7d

✨ We launched a new installation page for DuckDB!

🚀 The new page lets you install the latest stable DuckDB release with just one or two clicks. If the defaults don't fit your use case, no worries: alternative download methods remain available for many clients.

Reposted by Hannes Mühleisen

Santiago Saavedra @ssaavedra.eu · 12d

After trying @duckdb.org with terabytes of parquet I'm hardly going back for data exploration to anything else. Hell, I'm now spawning DuckDB for analyzing even .csv and .json files due to how ergonomic its SQL is.

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 13d

We published a new deep dive by Laurens Kuiper, who recently redesigned DuckDB's sort.

One data point: ordering the TPC-H SF100 lineitem table with the memory limit set to 30 GB is 3× faster in DuckDB v1.4 than in v1.3.

Read more at duckdb.org/2025/09/24/s...

Redesigning DuckDB's Sort, Again

After four years, we've decided to redesign DuckDB's sort implementation, again. In this post, we present and evaluate the new design.

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 21d

🚀 We released version 0.3 of the DuckLake specification and the DuckDB ducklake extension today. It includes interoperability with Iceberg, support for geometry types and more.

Check the announcement blog for more details ducklake.select/2025/09/17/d...

Reposted by Hannes Mühleisen

Kyle Walker @kylewalker.bsky.social · 21d

This is the most exciting time ever to be working in data, and I'm not talking about AI.

3 years ago, I wrote a database-centric guide in my book for analyzing the full 92 million record 1910 Census.

Now, with #rstats and @duckdb?

Analyze those 92 million rows in seconds.

Reposted by Hannes Mühleisen

Marcos Huerta @marcoshuerta.com · 21d

I'm speaking soon at #PositConf at the 2:40PM session "Get Your Ducks in a Row with Databases" in Regency VI! My talk is "Semantic Search for the Rest of Us with DuckDB (and Llama.cpp)"

#PositConf2025

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 22d

📈 DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.

For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...

Hannes Mühleisen @hannes.muehleisen.org · 23d

We're testing a new distribution channel for @duckdb.org: #docker images. For now they live at `hfmuehleisen/duckdb`, feel free to test them out. And yes, hell got a little colder today.

hub.docker.com/r/hfmuehleis...

Reposted by Hannes Mühleisen

Christian Minich @christiannolan.bsky.social · 26d

Such a fun listen on ducklake and duckdb with @hannes.muehleisen.org and @markraasveldt.bsky.social!

Learned a lot, the future of ducklake looks very bright!

overcast.fm/+AAH1YOLrL6Q

Duck Lake: Simplifying the Lakehouse Ecosystem — Data Engineering Podcast

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · 28d

We are holding the DuckDB Amsterdam Meetup next week, featuring talks by @rolandbouman.bsky.social, Tania Bogatsch and @qxip.bsky.social:

www.meetup.com/duckdb/event...

The event is already at capacity but consider joining the wait list because there are always last-minute RSVP cancellations.

Hannes Mühleisen @hannes.muehleisen.org · 28d

Excited to be a keynote speaker at PyData Amsterdam 2025 (September 24–26). My talk is titled 'Minus Three Tier: Data Architecture Turned Upside Down'.

Use code PYDATADB10 for 10% off tickets
amsterdam.pydata.org/conference
#PDAmsterdam2025 #10YearsPDAmsterdam

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Sep 8

Big Data on the Move: Can a Framework Laptop 13 ultrabook run terabyte-sized workloads with DuckDB?

@szarnyasg.org ran the experiments and shared his findings in our latest blog post: duckdb.org/2025/09/08/d...

Big Data on the Move: DuckDB on the Framework Laptop 13

We put DuckDB through its paces on a 12-core ultrabook with 128 GB RAM, running TPC-H queries up to SF10,000.

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Sep 2

We just launched the “DuckDB in Science” site, a curated collection of papers, lectures and podcasts about DuckDB in research: duckdb.org/science/

🎡 If you would like to learn more about DuckDB in Science, consider joining our meetup in London this Thursday: www.meetup.com/duckdb/event...

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Sep 2

🕐 🤔 Timestamps and time zones can be confusing! 😵

💡 To help you make sense of time zones in SQL, Richard Wesley wrote a short guide that covers some typical pitfalls: duckdb.org/docs/stable/...

Timestamp Issues

Timestamp With Time Zone Promotion Casts

Working with time zones in SQL can be quite confusing at times. For example, when filtering to a date range, one might try the following query: SET timezone = ...

Reposted by Hannes Mühleisen

PVLDB @pvldb.bsky.social · Aug 3

Vol:18 No:8 → Saving Private Hash Join
👥 Authors: Laurens Kuiper, Paul Gross, Peter Boncz, Hannes Mühleisen
📄 PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Aug 20

New blog post by Petrica Leuca:
Basic Feature Engineering with DuckDB

In this post, we show how to perform essential machine learning data preprocessing tasks—like missing value imputation, categorical encoding, and feature scaling—directly in DuckDB using SQL and benchmark it against scikit-learn.

Reposted by Hannes Mühleisen

Mike Bostock @ocks.org · Aug 19

A little demo of reactive SQL in Observable Notebooks 2.0, first using (native) DuckDB to bake data from a remote source, followed by DuckDB-Wasm to create and query reactive views in the client. Should be released this week!

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Aug 18

🎓 On September 4, we are hosting a new kind of meetup in London which will focus on the use of DuckDB in Science and Education!

⚡️ We still have some spots for lightning talks. If you're working with DuckDB in your research and/or classroom, consider sharing your story!

🔗 duckdb.org/events/2025/...

DuckDB Meetup on Science and Education in London

DuckDB is an in-process SQL database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bin...

Reposted by Hannes Mühleisen

xevix.bsky.social @xevix.bsky.social · Aug 15

Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.

Reposted by Hannes Mühleisen

DuckDB @duckdb.org · Aug 14

🔥 DuckDB is featured in @fireship.bsky.social's “100 seconds” series:

🚀 www.youtube.com/watch?v=uHm6...

DuckDB in 100 Seconds

YouTube video by Fireship

Reposted by Hannes Mühleisen

Purple Frog Systems🐸 @purplefrogsys.bsky.social · Aug 12

Not every job needs Spark or BigQuery.
Sometimes, you just need DuckDB.

Find out why it’s a game-changer for local analytics 🐤

👉 Read the Frog Blog by Joe!
www.purplefrogsystems.com/2025/08/why-...

#DuckDB #SQL #DataEngineer

Reposted by Hannes Mühleisen

Julia Silge @juliasilge.com · Aug 10

I'm excited to speak this afternoon at #useR2025 on outgrowing your laptop with #Positron for #rstats users!

You can check out my slides at juliasilge.github.io/useR-2025/

Galaxy brain meme format outlining options for working with data: CSV file, parquet & duckdb, databases, and remote SSH sessions
