Lightnews — Scholar-powered news

Aflah 🍉🕊️

@aflah02101.bsky.social

Research Software Engineer @MPI-SWS • OSS @EleutherAI • Prev @Goldman Sachs, @LCS2, GSoC @TensorFlow • IIIT Delhi '24 • #CEASEFIRENOW 🕊️


Aflah 🍉🕊️

@aflah02101.bsky.social

Fun Story:
The project started while I was playing with Stas Bekman's MAMF script and looking at the logs. I realized the script logs a ton of VERY USEFUL data, and since my Modal credits were expiring soon, I decided to spend them all on getting as many numbers as possible!

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

I'm also on the job market, looking for research-scientist-style roles. If that is something you're hiring for, feel free to reach out!

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

I'm looking for feedback to make this better, both on the analysis in the web app and on what additional data would be useful for practitioners.

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

So far, I've tracked over 1.5 million shapes across 7 different GPUs, with the majority being for Blackwell and Hopper GPUs.

There is still a lot to be done, and I'd love to cover more shapes, dtypes, and PyTorch versions in the future, given access to more GPUs.

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

I'd also like to thank @modal-labs.bsky.social for their generous GPU grants. These measurements were run on GPUs paid for with credits I had left over from previous grants.

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

The benchmark sweeps a large space of shapes and records the best achieved TFLOPS per shape, producing a map of where hardware performs well and where it doesn’t.

The measurements are made using Stas Bekman's super helpful MAMF finder script (couldn't find his bsky handle to tag)

January 8, 2026 at 7:18 PM
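The sweep described in the post above can be sketched roughly as follows. This is a minimal illustration, not the MAMF finder script itself: `naive_matmul`, `best_tflops`, and `sweep` are hypothetical names, and a pure-Python matmul stands in for the GPU kernel so the example runs anywhere.

```python
import itertools
import time


def matmul_flops(m, n, k):
    """FLOPs in an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def naive_matmul(a, b):
    """Pure-Python stand-in for a GPU matmul kernel (assumption, illustrative only)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def best_tflops(m, n, k, trials=3):
    """Time the matmul several times and keep the best achieved TFLOPS."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = 0.0
    for _ in range(trials):
        start = time.perf_counter()
        naive_matmul(a, b)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, matmul_flops(m, n, k) / elapsed / 1e12)
    return best


def sweep(m_values, n_values, k_values, trials=3):
    """Build a map from each (m, n, k) shape to its best achieved TFLOPS."""
    return {
        (m, n, k): best_tflops(m, n, k, trials)
        for m, n, k in itertools.product(m_values, n_values, k_values)
    }


# Tiny hypothetical shape grid; a real sweep would cover far more shapes.
results = sweep([8, 16], [8, 16], [8, 16])
best_shape = max(results, key=results.get)
print(f"best shape: {best_shape}, {results[best_shape]:.2e} TFLOPS")
```

On a real GPU the timing loop would also need to synchronize the device before reading the clock, since kernel launches are asynchronous.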

Aflah 🍉🕊️

@aflah02101.bsky.social

Some use cases:
• Capacity planning for real workloads
• Comparing GPUs on identical shapes/dtypes
• Identifying performance cliffs and bottlenecks
• Tracking regressions across PyTorch versions (WIP)

January 8, 2026 at 7:18 PM

Aflah 🍉🕊️

@aflah02101.bsky.social

Meet MAMF Explorer 🚀
It lets you explore Maximum Achievable Matmul FLOPS (MAMF) across matrix shapes, dtypes, and hardware.

MAMF is a practical upper bound on matmul throughput for a given GPU + software stack, not a theoretical peak.

January 8, 2026 at 7:18 PM
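To make the distinction between achieved throughput and theoretical peak concrete, here is a small worked calculation. The timing and the peak figure are made-up numbers chosen for illustration, not measurements from MAMF Explorer, and the function names are hypothetical.

```python
def achieved_tflops(m, n, k, seconds):
    """Achieved TFLOPS for an (m x k) @ (k x n) matmul that took `seconds`."""
    flops = 2 * m * n * k  # one multiply and one add per accumulated term
    return flops / seconds / 1e12


def fraction_of_peak(achieved, theoretical_peak):
    """Share of the vendor-quoted theoretical peak that was actually reached."""
    return achieved / theoretical_peak


# Hypothetical example: a 4096 x 4096 x 4096 matmul finishing in 1 ms
# on a GPU whose datasheet peak is assumed to be 200 TFLOPS.
tflops = achieved_tflops(4096, 4096, 4096, 1e-3)
ratio = fraction_of_peak(tflops, 200.0)
print(f"{tflops:.2f} TFLOPS achieved, {ratio:.1%} of the assumed peak")
```

The best such value found across all shapes for a given GPU and software stack is what the post calls MAMF, which is why it sits below the datasheet peak.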
