Tian Jin
@tjin.bsky.social
PhD student @MIT CSAIL
Reposted by Tian Jin
Excited to share our research on what matters in sparse LLM pre-training. Stop by our poster @ ICLR 🗓️ April 24th session #2.
April 22, 2025 at 10:09 AM
Reposted by Tian Jin
Scaling laws provide a valuable lens for guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead author @tjin.bsky.social summarizing our findings.
April 22, 2025 at 1:32 AM
Reposted by Tian Jin
Tian, Karolina, and the team are at ICLR. Come say hi.
April 21, 2025 at 1:00 PM
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
April 21, 2025 at 7:15 AM
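The finding above — that the *average* parameter count over the course of sparse pre-training predicts quality, not the final size — can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's actual schedule or code: it assumes a cubic gradual-pruning schedule (in the style of Zhu & Gupta) and simply averages the non-zero parameter count across training steps.

```python
def sparsity_at(step, total_steps, final_sparsity):
    """Cubic gradual-pruning schedule: sparsity ramps from 0 to
    final_sparsity over training. An assumed schedule for illustration,
    not necessarily the one used in the paper."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1 - (1 - frac) ** 3)

def average_param_count(dense_params, total_steps, final_sparsity):
    """Average number of non-zero parameters over the pre-training run.
    Per the thread's finding, this quantity (not the final count) is
    what predicts model quality under sparse pre-training."""
    counts = [
        dense_params * (1 - sparsity_at(s, total_steps, final_sparsity))
        for s in range(total_steps + 1)
    ]
    return sum(counts) / len(counts)
```

For example, pruning a 1M-parameter model to 50% sparsity on this schedule gives an average count of roughly 625K parameters, noticeably more than the final 500K — which is why a model that ends at half size can behave like a somewhat larger one.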
Excited to share our work with friends from MIT/Google on Learned Asynchronous Decoding! LLM responses often contain chunks of tokens that are semantically independent. What if we could train LLMs to identify such chunks and decode them in parallel, speeding up inference? 1/N
February 27, 2025 at 12:38 AM
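The idea in the post above can be sketched as follows. This is a toy illustration, not the actual system: `decode_chunk` is a hypothetical stand-in for running autoregressive decoding on one chunk, and the sketch assumes the model has already marked the chunks as semantically independent so they can be decoded concurrently and concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(prompt, chunk_id):
    # Stand-in for autoregressively decoding one chunk of the response.
    # A real system would run the LLM here; this just returns a label.
    return f"chunk-{chunk_id} decoded"

def parallel_decode(prompt, num_independent_chunks):
    # Once chunks are identified as semantically independent, each can be
    # decoded in its own stream; results are gathered back in order, so
    # wall-clock latency is bounded by the longest chunk, not the sum.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(decode_chunk, prompt, i)
            for i in range(num_independent_chunks)
        ]
        return [f.result() for f in futures]
```

Collecting `f.result()` in submission order keeps the final response ordered even though the chunks finish asynchronously.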