Matt Green
mgreen.bsky.social
Data, Streaming, coding, (maybe) AI?

building a real-time streaming engine.

https://www.denormalized.io/
Pinned
Excited to finally release the python bindings for our embeddable stream processing engine 🎉

`pip install denormalized`

Check out the API docs at probably-nothing-labs.github.io/denormalized...

And the Code over at github.com/probably-not...
denormalized/py-denormalized at main · probably-nothing-labs/denormalized
Embeddable stream processing engine based on Apache DataFusion - probably-nothing-labs/denormalized
github.com
the amount of coffee I'm drinking while agents write my code is giving me the jitters
May 23, 2025 at 6:56 PM
Just launched a free AI powered Spark Optimizer that examines spark logs to make cost and efficiency recommendations!

Looking for feedback from Spark users #databs

datasre.ai
DataSRE.ai - Intelligent Spark Infrastructure Management
Save up to 50% on your Spark compute bills with AI-driven optimization, intelligent autoscaling, and automated infrastructure management.
datasre.ai
April 9, 2025 at 9:49 PM
do people here still care about tech or are we just focused on the tariff news too?
April 4, 2025 at 7:34 PM
Ever wish your AI agent had more external dependencies?

announcing mcp-leftpad!

Using the power of MCP, AI can now easily leftpad strings

www.npmjs.com/package/mcp-...
mcp-leftpad
An MCP server that exposes left-pad as a tool. Latest version: 1.0.0, last published: 6 days ago. Start using mcp-leftpad in your project by running `npm i mcp-leftpad`. There are no other projects in...
www.npmjs.com
March 18, 2025 at 4:47 PM
spent a few hours building an mcp server to connect ollama models to mcp clients (like claude desktop)
github.com/emgeee/mcp-o...
GitHub - emgeee/mcp-ollama: Query model running with Ollama from within Claude Desktop or other MCP clients
Query model running with Ollama from within Claude Desktop or other MCP clients - emgeee/mcp-ollama
github.com
February 5, 2025 at 6:14 PM
been getting into agents a lot more recently and I've been struggling to define just what they mean. Best I can come up with is LLMs + tool usage executing in a dynamic compute graph. Sound right?
January 8, 2025 at 9:35 PM
Just published a new example of using the Denormalized Stream processing engine to compute real-time fraud features and sink to a Feast datastore - blog post coming out soon! github.com/feast-dev/fe...
GitHub - feast-dev/feast-denormalized-tutorial: Feast + Denormalized
Feast + Denormalized. Contribute to feast-dev/feast-denormalized-tutorial development by creating an account on GitHub.
github.com
December 10, 2024 at 10:03 PM
@mozilla.ai builders day has begun!
December 5, 2024 at 5:51 PM
Finally got around to playing with @anthropic.com MCP protocol and it's very well done. Seems like a step in the right direction and I hope it continues to gain traction
November 27, 2024 at 5:15 PM
anytime someone asks me how they should get started developing with LLMs
November 27, 2024 at 4:43 PM
just dropped denormalized 0.0.10
- (fix): engine no longer panics when handling really late data
- (feat): default to using the kafka timestamp if no timestamp column is specified

pypi.org/project/deno...
denormalized
Embeddable stream processing engine
pypi.org
November 25, 2024 at 8:42 PM
Merkle Trees are such a powerful data structure for distributed systems. They let you save a lot of bandwidth for the small cost of re-computing hashes of data. They are also what allows you to trust data retrieved from untrusted sources www.baeldung.com/cs/merkle-tr...
How Do Merkle Trees Work? | Baeldung on Computer Science
A quick and practical guide to Merkle trees.
www.baeldung.com
November 25, 2024 at 8:00 PM
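A minimal sketch of the idea in the Merkle tree post above, assuming SHA-256 and the "duplicate the last node on odd levels" convention (one common choice, not the only one). The function names `merkle_root`, `merkle_proof`, and `verify` are made up for illustration. The point it tries to show: a client holding only the trusted root can check a single chunk from an untrusted peer using a handful of sibling hashes instead of downloading everything.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    # Hash adjacent pairs to build the parent level.
    return [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every leaf, then hash pairs level by level until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pad odd levels by duplicating the last node
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Collect the sibling hashes needed to rebuild the root from leaves[index].

    Each entry is (sibling_hash, sibling_is_left).
    """
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # flip the low bit to get the neighbor in the pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from one leaf up to the root and compare."""
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
root = merkle_root(chunks)             # the only value that must come from a trusted source
proof = merkle_proof(chunks, 2)        # supplied by the untrusted peer along with the chunk
print(verify(b"chunk-2", proof, root))   # genuine chunk
print(verify(b"tampered", proof, root))  # modified chunk
```

The proof grows with the logarithm of the number of chunks, which is where the bandwidth saving comes from.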
Fantastic post.

When I started getting involved with the DataFusion project I never understood why it would take a month for my contributed changes to be released. It meant I basically had to maintain my own fork and build my project against that -- a process that became rather burdensome
Xuanwo @xuanwo.io · Nov 21
Publishing a post "What Did ASF Do Wrong?" to explore ways to attract more contributors—feel free to join the discussion with me.

xuanwo.io/2024/09-what...
What did ASF do wrong?
An infrastructure engineer, focused on distributed storage system
xuanwo.io
November 22, 2024 at 9:54 PM
playing with the bsky firehose and it seems that record timestamps are in an inconsistent format. Is there some library that can properly parse these into unix time automatically?
November 21, 2024 at 7:08 PM
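One possible approach to the timestamp question above, sketched with only the standard library. The post doesn't say which formats actually show up, so the variants handled here are assumptions: ISO 8601 style strings with or without fractional seconds (any number of digits), with `Z`, a numeric offset, or no zone at all. Strings with no zone are treated as UTC, which is also an assumption. `to_unix` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Date, time, optional fraction of any length, optional zone (Z or +HH:MM / +HHMM).
_TS = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt ](\d{2}:\d{2}:\d{2})"
    r"(?:\.(\d+))?(Z|z|[+-]\d{2}:?\d{2})?$"
)


def to_unix(ts: str) -> float:
    """Normalize a loosely formatted ISO 8601 timestamp to unix seconds."""
    m = _TS.match(ts.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {ts!r}")
    date, time, frac, tz = m.groups()
    # datetime only keeps microseconds, so truncate or pad the fraction to 6 digits.
    frac = (frac or "0")[:6].ljust(6, "0")
    if tz is None or tz in ("Z", "z"):
        tz = "+00:00"  # assumption: a missing zone means UTC
    elif ":" not in tz:
        tz = tz[:3] + ":" + tz[3:]
    return datetime.fromisoformat(f"{date}T{time}.{frac}{tz}").timestamp()


samples = [
    "2024-11-21T19:08:00Z",
    "2024-11-21T19:08:00.000+00:00",
    "2024-11-21T14:08:00.000000000-0500",
    "2024-11-21 19:08:00",
]
print({to_unix(s) for s in samples})  # all four should collapse to a single value
```

If pulling in a dependency is fine, `dateutil.parser.parse` from python-dateutil is more forgiving about formats than this regex.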
Underrated use case for AI: formatting text that seems structured-ish
November 20, 2024 at 3:24 AM
the dream
It appears to be a run-time linter rather than an interactive one, just like the python library it replaces. For an interactive linter you need the parser to be both fast and fault tolerant. Guess who’s dropping that :)

Also, this is the second sqlfluff rust port. github.com/quarylabs/sq...
GitHub - quarylabs/sqruff: Fast SQL formatter/linter
Fast SQL formatter/linter. Contribute to quarylabs/sqruff development by creating an account on GitHub.
github.com
November 19, 2024 at 11:26 PM
YAML stops being "human readable" after like 100 lines
November 19, 2024 at 4:44 PM
PyTorch is no longer publishing nightly builds to conda. Seems like a pretty interesting development in the python packaging ecosystem dev-discuss.pytorch.org/t/pytorch-de...
PyTorch Deprecation of Conda Nightly Builds
Please see: [Announcement] Deprecating PyTorch’s official Anaconda channel · Issue #138506 · pytorch/pytorch · GitHub PyTorch will stop publishing Anaconda packages that depend on Anaconda’s default ...
dev-discuss.pytorch.org
November 17, 2024 at 1:43 AM
the more time I spend on bsky the more it's starting to feel like the early days of twitter. Also, custom feeds are really cool
November 15, 2024 at 7:40 PM
Excited to finally release the python bindings for our embeddable stream processing engine 🎉

`pip install denormalized`

Check out the API docs at probably-nothing-labs.github.io/denormalized...

And the Code over at github.com/probably-not...
denormalized/py-denormalized at main · probably-nothing-labs/denormalized
Embeddable stream processing engine based on Apache DataFusion - probably-nothing-labs/denormalized
github.com
November 15, 2024 at 6:40 PM
Follow Friday: @franciscojarceo.bsky.social is the maintainer of the feast feature store and a fine data lad
November 15, 2024 at 4:26 PM
watched some demos the other night that made me realize just how close we are to the future of AIs talking to other AIs. Things are going to get messy
November 15, 2024 at 12:13 AM
With the recent release of uv 0.5 I looked into it again. I didn't realize that the eventual plan is to have uv subsume rye as a full-blown python env management tool github.com/astral-sh/ry...
How will `rye` and `uv` coexist in the future? · astral-sh rye · Discussion #1164
I have been following the two projects and needless to say the progress so far has been great, but now that both uv and rye fall under the astral umbrella, what does the future hold for both these ...
github.com
November 13, 2024 at 10:34 PM
had a lot of success last night using claude to generate a dockerfile that runs both Kafka and a custom Rust script for generating fake data.

Highly recommend!
November 12, 2024 at 9:18 PM
Any hardcore users of Flink out there? what are your experiences?
November 5, 2024 at 8:43 PM