Suvinay
@suvinay.bsky.social
👨‍💻 Building AI systems (TPUs) at Google | 🎙️ Co-host the Computer Architecture Podcast | 🎓 EECS Ph.D. @ MIT, B.Tech @ IIT Madras | Views my own | suvinay.com
Pinned
suvinay.bsky.social
Starting with this exciting line of work from @tjin.bsky.social and colleagues at MIT. We tackle the question: can we train LLMs to parallelize autoregressive decoding automatically, backed by a performant runtime that exploits this parallelism for faster inference?
tjin.bsky.social
Excited to share our work with friends from MIT/Google on Learned Asynchronous Decoding! LLM responses often contain chunks of tokens that are semantically independent. What if we can train LLMs to identify such chunks and decode them in parallel, thereby speeding up inference? 1/N
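To make the idea concrete, here is a minimal sketch of decoding semantically independent chunks concurrently. The `decode_chunk` function is a hypothetical stand-in (not from the paper): in the actual work the LLM itself learns to identify independent chunks, and a specialized runtime decodes them in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(chunk_prompt: str) -> str:
    # Hypothetical stand-in for autoregressively decoding one chunk.
    # A real implementation would run the model here.
    return f"<decoded:{chunk_prompt}>"

def decode_parallel(chunks: list[str]) -> list[str]:
    # Semantically independent chunks need not be decoded strictly
    # left-to-right; decode them concurrently and stitch the results
    # back together in their original order (map preserves order).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_chunk, chunks))

print(decode_parallel(["item 1", "item 2", "item 3"]))
```

The speedup comes from overlapping the sequential decode passes that a vanilla autoregressive loop would run one after another.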
suvinay.bsky.social
Scaling laws provide a valuable lens for guiding model design and compute budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead author @tjin.bsky.social summarizing our findings.
tjin.bsky.social
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
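A toy illustration of the headline finding: if quality tracks the *average* (non-zero) parameter count over training rather than the final size, you can estimate it from the sparsity schedule. The linear ramp-to-50%-sparse schedule below is my own assumption for illustration, not the schedule from the paper.

```python
def avg_param_count(dense_params, sparsity_at_step, num_steps):
    # Average effective (non-zero) parameter count across training --
    # the quantity reported to be predictive of final model quality.
    total = sum(dense_params * (1.0 - sparsity_at_step(t))
                for t in range(num_steps))
    return total / num_steps

dense = 1_000_000_000   # 1B dense parameters (toy number)
steps = 1000
# Assumed schedule: ramp linearly from fully dense to 50% sparse.
schedule = lambda t: 0.5 * t / (steps - 1)

print(avg_param_count(dense, schedule, steps))  # ≈ 7.5e8
```

Under this schedule the model averages ~750M effective parameters, so its quality would be predicted to match a smaller dense model of roughly that size, even though the final checkpoint has only 500M non-zero weights.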
suvinay.bsky.social
And finally, for those interested in more technical details, and in codesign across multiple layers of the stack, from hardware and circuits to software and all the way up to the datacenter: youtu.be/RyV9xQDpO6U?...
TPUs: Codesigning Computing Systems for Artificial Intelligence | MIT | 2023.11.07
YouTube video by Suvinay Subramanian
suvinay.bsky.social
A couple of fun videos that provide a sneak peek into TPUs and how they plug into our datacenters: [1] youtu.be/FsxthdQ_sL4?... [2] youtu.be/9i1ZM0dPyRo?...
suvinay.bsky.social
For a historical account of the journey of developing TPUs, check out Norm Jouppi's talk at Supercomputing'24: youtu.be/a-1xJmfYxyU?...
SC24 IEEE-CS Seymour Cray Computer Engineering Award
YouTube video by SC Conference Series
suvinay.bsky.social
At Google, we announced the latest generation of our AI supercomputers (TPUs) -- Ironwood -- this week. Check out the quoted blog post for the highlights. blog.google/products/googl…

Pointers to deep-dives and more technical details in thread. [contd...👇]
suvinay.bsky.social
Together with Lisa Hsu (Meta), I have been hosting the Computer Architecture Podcast -- we recently crossed 50K downloads. Check out our latest episode with Prof. Arka Basu: comparchpodcast.podbean.com -- we discuss GPUs, but from a different vantage point than AI, which is all the rage.
Computer Architecture Podcast | comparchpodcast
A show that brings you closer to the cutting edge in computer architecture and the remarkable people behind it. Hosted by Dr. Suvinay Subramanian, who is a computer architect at Google in the Systems ...
suvinay.bsky.social
Hello world! Dipping my toes into social media. The excellent interns I have had the pleasure of working with at Google were kind enough to nudge me to help signal-boost their work. I will also try to share updates on TPUs, AI chips and systems, and computer architecture.