Micah Wylde
micahw.com
@micahw.com
Co-founder arroyo.dev, building next-gen streaming systems. Prev Splunk, Lyft, Sift, Quantcast.
Reposted by Micah Wylde
Another brand new feature is the R2 data catalog: blog.cloudflare.com/cloudflare-d...

Build something with Pipelines and R2 SQL. I suggest receiving OpenTelemetry data and then surfacing that in a web app (logs should be fairly straightforward), but there are tons of uses for this.
Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare
The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage.
blog.cloudflare.com
September 27, 2025 at 2:13 AM
Reposted by Micah Wylde
It's early, but I'm excited about the direction that the Cloudflare Data Platform is taking. Trying to set up similar pipelines on other clouds would typically be $$$ and take tons of expertise: managing Kafka and multiple services for ingestion, compaction, etc. blog.cloudflare.com/cloudflare-d...
Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare
The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage.
blog.cloudflare.com
September 25, 2025 at 3:40 PM
The news is finally out! Cloudflare has a Data Platform! We're starting with serverless streaming pipelines (powered by Arroyo), a managed Iceberg Catalog, and a new distributed SQL engine built on top of DataFusion.
September 25, 2025 at 4:44 PM
Reposted by Micah Wylde
Reminder: San Francisco @ApacheDataFusio meetup tomorrow: lu.ma/uuxd443e
SF Apache DataFusion Meetup · Luma
Join us for an evening of learning, networking, and diving into Apache DataFusion, the blazing-fast query execution framework for Rust-based data…
lu.ma
June 9, 2025 at 3:03 AM
Reposted by Micah Wylde
Cloudflare is at Snowflake Summit in San Francisco this week!

Swing by our booth 2605 to chat about the new Cloudflare R2 Data Catalog and how it can make your data management and analytics easier!
June 4, 2025 at 8:47 PM
Next Monday after the Snowflake Summit keynote! Hang out on our beautiful roof with other cool data folks, and hear some great speakers from LanceDB, @mooncakelabs.bsky.social, Eventual, Marimo, Bobsled, and @cloudflare-dev.bsky.social!

lu.ma/dbq1hfij
Modern Data w/ Cloudflare + Friends · Luma
Come talk about modern data formats, streaming ingestion, query engines and how you feel about Iceberg at Cloudflare's HQ. We'll be running a series of…
lu.ma
May 27, 2025 at 8:21 PM
Reposted by Micah Wylde
Ok, y'all. This took me several weeks and a ton of help from @frankmcsherry.bsky.social and @lalithsuresh.bsky.social. I dug into timely dataflow, differential dataflow, and DBSP to get you up to speed on IVM engines and materialized views. Enjoy!
Everything You Need to Know About Incremental View Maintenance
An overview of incremental view maintenance, why it’s useful, and how you can implement it.
materializedview.io
April 18, 2025 at 6:30 PM
I’m only a week into life at @cloudflare-dev.bsky.social but already amazed by how much of Cloudflare is built _on_ Cloudflare. I’d never have guessed you could get so far with just workers + durable objects!
April 16, 2025 at 3:02 PM
Arroyo is joining @cloudflare.social! We're bringing Arroyo to the Developer Platform as a serverless stream processing system, and will also remain open-source and self-hostable. www.arroyo.dev/blog/arroyo-...
Arroyo is joining Cloudflare
Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-sour...
www.arroyo.dev
April 10, 2025 at 3:05 PM
Reposted by Micah Wylde
Couple of big announcements from @cloudflare.social today for folk in #dataBS:

* Acquisition of Arroyo, launch of Pipelines for streaming ingestion: blog.cloudflare.com/cloudflare-a...
* Launch of R2 Data Catalog—a managed Apache Iceberg catalog for R2 blog.cloudflare.com/r2-data-cata...
Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines
We’ve just shipped our new streaming ingestion service, Pipelines — and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2.
blog.cloudflare.com
April 10, 2025 at 2:50 PM
Arroyo 0.14.0 is now available, including new lookup joins, support for nested updating aggregates, struct types, new syntax, and a bunch of improvements and fixes: www.arroyo.dev/blog/arroyo-...
Announcing Arroyo 0.14.0
Arroyo 0.14 is now available! This release introduces support for lookup joins, more powerful updating SQL, new syntax, structs in DDL, and more!
www.arroyo.dev
March 26, 2025 at 4:59 PM
I know by month 2 we're all inured to this stuff, but this is a beyond crazy mix of incompetence and illegality www.theatlantic.com/politics/arc...
The Trump Administration Accidentally Texted Me Its War Plans
U.S. national-security leaders included me in a group chat about upcoming military strikes in Yemen. I didn’t think it could be real. Then the bombs started falling.
www.theatlantic.com
March 24, 2025 at 5:49 PM
Arroyo is sitting at 3,999 stars... who's going to put us over the top github.com/ArroyoSystem...
March 1, 2025 at 12:05 AM
You'd think that the key to being a fast streaming engine is like clever join algorithms, but it's mostly just being really good at JSON. Arroyo uses Arrow and the arrow-rs JSON decoder along with some streaming extensions. I think it's pretty cool, so I wrote up a long explanation of how it works
Fast columnar JSON decoding with arrow-rs
JSON is the most common serialization format used in streaming pipelines, so it pays to be able to deserialize it fast. This post covers in detail how the arrow-json library works to perform very effi...
www.arroyo.dev
February 25, 2025 at 5:42 PM
Our team at Arroyo recently needed to rebuild our (very ad-hoc) analytics infra to account for our growth. We spent some time working out the best way to set up a near-real-time data lake today, and ended up with a pretty sweet approach we're calling the LOAD stack: www.arroyo.dev/blog/buildin...
Building a near-real-time data lake with the LOAD stack
The LOAD stack (log storage/object storage/Arroyo/DuckDB) makes it easy to build an affordable real-time data lake with minimal operational overhead. This tutorial will guide you through the process o...
www.arroyo.dev
January 23, 2025 at 5:43 PM
Arroyo 0.13.0 is now available! This one includes some big improvements to the core engine (including the operator chaining work I wrote about previously: bsky.app/profile/mica...) and a bunch of other features. All the details on our blog: www.arroyo.dev/blog/arroyo-...
December 19, 2024 at 5:23 PM
Is #DataBS interested in the internals of streaming engines? The next release of arroyo.dev (0.13) has a new feature in the core dataflow called operator chaining, which gets at some of the interesting details of how these systems work. So let’s dive into streaming dataflow 🧵
Arroyo — Cloud-native stream processing
Arroyo is the easiest way to run SQL queries against your streaming data
arroyo.dev
December 5, 2024 at 9:29 PM
nothing like a potential natural disaster to bring us all together here 🤗
December 5, 2024 at 7:35 PM
Sad to see Redis Labs burning whatever shreds of credibility they still had with the open source community. Making money as an open source co is hard, but there have to be better ways than this: github.com/redis-rs/red...
Future Crate Maintenance and Redis Inc. Relationship · Issue #1419 · redis-rs/redis-rs
Hello users. I haven't actively maintained this library in a very long time as you probably noticed. I am still controlling the entry on crates.io for it alongside the redis release team and @badbo...
github.com
November 26, 2024 at 7:08 PM
Happy new Jepsen Report Day to all who celebrate! jepsen.io/analyses/buf...

Confirms my priors that almost no one should be directly calling the Kafka client today—use a stream processing engine for data use cases or a durable execution engine for applications.
Jepsen: Bufstream 0.1.0
jepsen.io
November 12, 2024 at 6:41 PM
If you missed #p99conf last week, talks are now available to stream on YouTube. I spoke about the design decisions that went into Arroyo's incredible performance: youtube.com/watch?v=7H4C...

Come for the Rust hot takes, stay for my terrible hand-drawn architecture diagrams 😅
YouTube
Share your videos with friends, family, and the world
youtube.com
October 30, 2024 at 5:37 PM
Guess we're all here now 👋
October 28, 2024 at 10:27 PM