Lightnews — Scholar-powered news

Micah Wylde

@micahw.com

S2 is incredibly cool, and now you can run it yourself!

shikhar @schmizz.net · 14d

s2-lite is here – an open source @s2.dev Stream Store! It's a single binary you can run anywhere. Powered by SlateDB, so you can point it at an object storage bucket for durable streams with real-time reads. github.com/s2-streamsto...

January 21, 2026 at 7:19 PM

Reposted by Micah Wylde

Craig Dennis

@craigsdennis.dev

I built a tool called LinkedOut to solve the "links in the air" problem during my talks. It’s a serverless data lake built on the @cloudflare.social Data Platform.

Ingest: Pipelines
Store: R2 + Apache Iceberg
Security: Access

Real-time analytics with zero egress fees. Full build video below.

January 21, 2026 at 12:46 AM

Reposted by Micah Wylde

Jeremy Morrell

@jeremymorrell.dev

Another brand new feature is the R2 data catalog: blog.cloudflare.com/cloudflare-d...

Build something with Pipelines and R2 SQL. I suggest receiving OpenTelemetry data and then surfacing that in a web app (logs should be fairly straightforward), but there are tons of uses for this.

Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare

The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage.

blog.cloudflare.com

September 27, 2025 at 2:13 AM

Reposted by Micah Wylde

Jeremy Morrell

@jeremymorrell.dev

It's early, but I'm excited about the direction that the Cloudflare Data Platform is taking. Trying to set up similar pipelines on other clouds would typically be $$$ and take tons of expertise. Managing Kafka and multiple services for ingestion, compaction, etc. blog.cloudflare.com/cloudflare-d...

September 25, 2025 at 3:40 PM

Micah Wylde

@micahw.com

The news is finally out! Cloudflare has a Data Platform! We're starting with serverless streaming pipelines (powered by arroyo), a managed Iceberg Catalog, and a new distributed SQL engine built on top of DataFusion

Cloudflare @cloudflare.social · Sep 25

The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage. https://cfl.re/4nHl2lk #BirthdayWeek

September 25, 2025 at 4:44 PM

Reposted by Micah Wylde

Andrew Lamb

@andrewlamb1111.bsky.social

Reminder: San Francisco @ApacheDataFusio meetup tomorrow: lu.ma/uuxd443e

SF Apache DataFusion Meetup · Luma

Join us for an evening of learning, networking, and diving into Apache DataFusion, the blazing-fast query execution framework for Rust-based data…

lu.ma

June 9, 2025 at 3:03 AM

Reposted by Micah Wylde

Cloudflare

@cloudflare.social

Cloudflare is at Snowflake Summit in San Francisco this week!

Swing by our booth 2605 to chat about the new Cloudflare R2 Data Catalog and how it can make your data management and analytics easier!

June 4, 2025 at 8:47 PM

Micah Wylde

@micahw.com

Next Monday after the Snowflake Summit keynote! Hang out on our beautiful roof with other cool data folks, and hear some great speakers from LanceDB, @mooncakelabs.bsky.social, Eventual, Marimo, Bobsled, and @cloudflare-dev.bsky.social!

lu.ma/dbq1hfij

Modern Data w/ Cloudflare + Friends · Luma

Come talk about modern data formats, streaming ingestion, query engines and how you feel about Iceberg at Cloudflare's HQ. We'll be running a series of…

lu.ma

May 27, 2025 at 8:21 PM

Reposted by Micah Wylde

Chris

@chris.blue

Ok, y'all. This took me several weeks and a ton of help from @frankmcsherry.bsky.social and @lalithsuresh.bsky.social. I dug into timely dataflow, differential dataflow, and DBSP to get you up to speed on IVM engines and materialized views. Enjoy!

Everything You Need to Know About Incremental View Maintenance

An overview of incremental view maintenance, why it’s useful, and how you can implement it.

materializedview.io

April 18, 2025 at 6:30 PM

Micah Wylde

@micahw.com

I’m only a week into life at @cloudflare-dev.bsky.social but already amazed by how much of Cloudflare is built _on_ Cloudflare. I’d never have guessed you could get so far with just workers + durable objects!

April 16, 2025 at 3:02 PM

Micah Wylde

@micahw.com

Arroyo is joining @cloudflare.social! We're bringing Arroyo to the Developer Platform as a serverless stream processing system, and will also remain open-source and self-hostable. www.arroyo.dev/blog/arroyo-...

Arroyo is joining Cloudflare

Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-sour...

www.arroyo.dev

April 10, 2025 at 3:05 PM

Reposted by Micah Wylde

rmoff 🏃‍♂️🫖🥓

@rmoff.net

Couple of big announcements from @cloudflare.social today for folk in #dataBS:

* Acquisition of Arroyo, launch of Pipelines for streaming ingestion: blog.cloudflare.com/cloudflare-a...
* Launch of R2 Data Catalog—a managed Apache Iceberg catalog for R2 blog.cloudflare.com/r2-data-cata...

Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines

We’ve just shipped our new streaming ingestion service, Pipelines — and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2.

blog.cloudflare.com

April 10, 2025 at 2:50 PM

Micah Wylde

@micahw.com

Arroyo 0.14.0 is now available, including new lookup joins, support for nested updating aggregates, struct types, new syntax, and a bunch of improvements and fixes: www.arroyo.dev/blog/arroyo-...

Announcing Arroyo 0.14.0

Arroyo 0.14 is now available! This release introduces support for lookup joins, more powerful updating SQL, new syntax, structs in DDL, and more!

www.arroyo.dev

March 26, 2025 at 4:59 PM

Micah Wylde

@micahw.com

I know by month 2 we're all inured to this stuff, but this is a beyond crazy mix of incompetence and illegality www.theatlantic.com/politics/arc...

The Trump Administration Accidentally Texted Me Its War Plans

U.S. national-security leaders included me in a group chat about upcoming military strikes in Yemen. I didn’t think it could be real. Then the bombs started falling.

www.theatlantic.com

March 24, 2025 at 5:49 PM

Micah Wylde

@micahw.com

Arroyo is sitting at 3,999 stars... who's going to put us over the top github.com/ArroyoSystem...

March 1, 2025 at 12:05 AM

Micah Wylde

@micahw.com

You'd think that the key to being a fast streaming engine is like clever join algorithms, but it's mostly just being really good at JSON. Arroyo uses Arrow and the arrow-rs JSON decoder along with some streaming extensions. I think it's pretty cool, so I wrote up a long explanation of how it works

Fast columnar JSON decoding with arrow-rs

JSON is the most common serialization format used in streaming pipelines, so it pays to be able to deserialize it fast. This post covers in detail how the arrow-json library works to perform very effi...

www.arroyo.dev

February 25, 2025 at 5:42 PM

Micah Wylde

@micahw.com

Our team at Arroyo recently needed to rebuild our (very ad-hoc) analytics infra to account for our growth. We spent some time working out the best way to set up a near-real-time data lake today, and ended up with a pretty sweet approach we're calling the LOAD stack: www.arroyo.dev/blog/buildin...

Building a near-real-time data lake with the LOAD stack

The LOAD stack (log storage/object storage/Arroyo/DuckDB) makes it easy to build an affordable real-time data lake with minimal operational overhead. This tutorial will guide you through the process o...

www.arroyo.dev

January 23, 2025 at 5:43 PM

Micah Wylde

@micahw.com

Arroyo 0.13.0 is now available! This one includes some big improvements to the core engine (including the operator chaining work I wrote about previously: bsky.app/profile/mica...) and a bunch of other features. All the details on our blog: www.arroyo.dev/blog/arroyo-...

December 19, 2024 at 5:23 PM

Micah Wylde

@micahw.com

Is #DataBS interested in the internals of streaming engines? The next release of arroyo.dev (0.13) has a new feature in the core dataflow called operator chaining which gets at some of the interesting details of how these systems work. So let’s dive into streaming dataflow 🧵

Arroyo — Cloud-native stream processing

Arroyo is the easiest way to run SQL queries against your streaming data

arroyo.dev

December 5, 2024 at 9:29 PM

Micah Wylde

@micahw.com

nothing like a potential natural disaster to bring us all together here 🤗

December 5, 2024 at 7:35 PM

Micah Wylde

@micahw.com

Sad to see Redis Labs burning whatever shreds of credibility they still had with the open source community. Making money as an open source co is hard but there have to be better ways than this: github.com/redis-rs/red...

Future Crate Maintenance and Redis Inc. Relationship · Issue #1419 · redis-rs/redis-rs

Hello users. I haven't actively maintained this library in a very long time as you probably noticed. I am still controlling the entry on crates.io for it alongside the redis release team and @badbo...

github.com

November 26, 2024 at 7:08 PM

Micah Wylde

@micahw.com

Happy new Jepsen Report Day to all who celebrate! jepsen.io/analyses/buf...

Confirms my priors that almost no one should be directly calling the Kafka client today—use a stream processing engine for data use cases or a durable execution engine for applications.

Jepsen: Bufstream 0.1.0

jepsen.io

November 12, 2024 at 6:41 PM

Micah Wylde

@micahw.com

If you missed #p99conf last week, talks are now available to stream on YouTube. I spoke about the design decisions that went into Arroyo's incredible performance: youtube.com/watch?v=7H4C...

Come for the Rust hot takes, stay for my terrible hand-drawn architecture diagrams 😅

October 30, 2024 at 5:37 PM

Micah Wylde

@micahw.com

Guess we're all here now 👋

October 28, 2024 at 10:27 PM
