Aflah 🍉🕊️
aflah02101.bsky.social
Research Software Engineer @MPI-SWS • OSS @EleutherAI • Prev @Goldman Sachs, @LCS2, GSoC @TensorFlow • IIIT Delhi '24 • #CEASEFIRENOW 🕊️
Fun Story:
The project started as I was playing with Stas Bekman's MAMF script and looking at the logs. I realized the script logs a ton of VERY USEFUL data, and since my Modal credits were expiring soon, I decided to spend them all on getting as many numbers as possible!
January 8, 2026 at 7:18 PM
I'm also on the job market looking for research scientist style roles. If that is something you're hiring for, feel free to reach out!
January 8, 2026 at 7:18 PM
I'd love feedback on how to make this better, both the analysis in the web app and which additional data would be most useful for practitioners.
January 8, 2026 at 7:18 PM
So far, I've tracked over 1.5 million shapes across 7 different GPUs, with the majority on Blackwell and Hopper GPUs.

There is still a lot to do, and I'd love to cover more shapes, dtypes, and PyTorch versions in the future, given access to more GPUs.
January 8, 2026 at 7:18 PM
I'd also like to thank @modal-labs.bsky.social for their generous GPU grants. These measurements were run on GPUs paid for with credits I had left over from previous grants.
January 8, 2026 at 7:18 PM
The benchmark sweeps a large space of shapes and records the best achieved TFLOPS per shape, producing a map of where hardware performs well and where it doesn’t.

The measurements are made using Stas Bekman's super helpful MAMF finder script (couldn't find his bsky handle to tag)
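
As a rough illustration of the sweep described above (this is not the MAMF finder script itself), the sketch below times a matmul for each shape and keeps the best TFLOPS seen per shape. The names `matmul_tflops`, `sweep_best_tflops`, and `fake_timer` are hypothetical, and the timer is a stand-in: a real run would time an actual GPU matmul instead.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) for an (M x K) @ (K x N) matmul


def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """TFLOPS for one matmul, counting 2*M*N*K floating point operations."""
    return (2 * m * n * k) / seconds / 1e12


def sweep_best_tflops(
    ms: Iterable[int],
    ns: Iterable[int],
    ks: Iterable[int],
    time_matmul: Callable[[int, int, int], float],
    trials: int = 3,
) -> Dict[Shape, float]:
    """Time every (M, N, K) combination `trials` times and keep the best TFLOPS."""
    best: Dict[Shape, float] = {}
    for m, n, k in product(ms, ns, ks):
        for _ in range(trials):
            tflops = matmul_tflops(m, n, k, time_matmul(m, n, k))
            if tflops > best.get((m, n, k), 0.0):
                best[(m, n, k)] = tflops
    return best


def fake_timer(m: int, n: int, k: int) -> float:
    """Stand-in timer (assumption): pretends the device sustains 100 TFLOPS."""
    return (2 * m * n * k) / 100e12


if __name__ == "__main__":
    results = sweep_best_tflops([1024, 2048], [1024], [4096], fake_timer)
    for shape, tflops in sorted(results.items()):
        print(shape, f"{tflops:.1f} TFLOPS")
```

Keeping the maximum rather than the mean matches the "best achieved" framing: the goal is the highest throughput a shape can reach, not its typical throughput.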
January 8, 2026 at 7:18 PM
Some use cases:
• Capacity planning for real workloads
• Comparing GPUs on identical shapes/dtypes
• Identifying performance cliffs and bottlenecks
• Tracking regressions across PyTorch versions (WIP)
January 8, 2026 at 7:18 PM
Meet MAMF Explorer 🚀
It lets you explore Maximum Achievable Matmul FLOPS (MAMF) across matrix shapes, dtypes, and hardware.

MAMF is a practical upper bound on matmul throughput for a given GPU + software stack, not a theoretical peak.
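
One way to read that distinction is as a ratio of measured MAMF to the vendor's theoretical peak. The helper below is a minimal sketch with a hypothetical name (`mamf_efficiency`), and the numbers in the usage lines are made-up placeholders, not measurements from MAMF Explorer or any real GPU.

```python
def mamf_efficiency(mamf_tflops: float, theoretical_peak_tflops: float) -> float:
    """Fraction of the theoretical peak that the measured MAMF reaches."""
    if theoretical_peak_tflops <= 0:
        raise ValueError("theoretical_peak_tflops must be positive")
    return mamf_tflops / theoretical_peak_tflops


if __name__ == "__main__":
    # Placeholder values for illustration only (not real measurements).
    measured = 750.0
    peak = 1000.0
    print(f"{mamf_efficiency(measured, peak):.0%} of theoretical peak")
```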
January 8, 2026 at 7:18 PM