Tian Jin
@tjin.bsky.social
PhD student @MIT CSAIL
Reposted by Tian Jin
Excited to share our research on what matters in sparse LLM pre-training. Stop by our poster @ ICLR 🗓️ April 24th session #2.
April 22, 2025 at 10:09 AM
Reposted by Tian Jin
Scaling laws provide a valuable lens for guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead author @tjin.bsky.social summarizing our findings.
April 22, 2025 at 1:32 AM
Reposted by Tian Jin
Tian, Karolina, and the team are at ICLR. Come say hi.
April 21, 2025 at 1:00 PM
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
April 21, 2025 at 7:15 AM
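The finding above — that the *average* parameter count over the course of sparse pre-training predicts quality, not the final size — can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's actual schedule or code: it assumes a cubic gradual-pruning schedule (in the style of Zhu & Gupta) and simply averages the non-zero parameter count across training steps.

```python
def sparsity_at(step, total_steps, final_sparsity):
    """Cubic gradual-pruning schedule: sparsity ramps from 0 to
    final_sparsity over training. An assumed schedule for illustration,
    not necessarily the one used in the paper."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1 - (1 - frac) ** 3)

def average_param_count(dense_params, total_steps, final_sparsity):
    """Average number of non-zero parameters over the pre-training run.
    Per the thread's finding, this quantity (not the final count) is
    what predicts model quality under sparse pre-training."""
    counts = [
        dense_params * (1 - sparsity_at(s, total_steps, final_sparsity))
        for s in range(total_steps + 1)
    ]
    return sum(counts) / len(counts)
```

For example, pruning a 1M-parameter model to 50% sparsity on this schedule gives an average count of roughly 625K parameters, noticeably more than the final 500K — which is why a model that ends at half size can behave like a somewhat larger one.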
Excited to share our work with friends from MIT/Google on Learned Asynchronous Decoding! LLM responses often contain chunks of tokens that are semantically independent. What if we could train LLMs to identify such chunks and decode them in parallel, speeding up inference? 1/N
February 27, 2025 at 12:38 AM
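The idea in the post above can be sketched as follows. This is a toy illustration, not the actual system: `decode_chunk` is a hypothetical stand-in for running autoregressive decoding on one chunk, and the sketch assumes the model has already marked the chunks as semantically independent so they can be decoded concurrently and concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(prompt, chunk_id):
    # Stand-in for autoregressively decoding one chunk of the response.
    # A real system would run the LLM here; this just returns a label.
    return f"chunk-{chunk_id} decoded"

def parallel_decode(prompt, num_independent_chunks):
    # Once chunks are identified as semantically independent, each can be
    # decoded in its own stream; results are gathered back in order, so
    # wall-clock latency is bounded by the longest chunk, not the sum.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(decode_chunk, prompt, i)
            for i in range(num_independent_chunks)
        ]
        return [f.result() for f in futures]
```

Collecting `f.result()` in submission order keeps the final response ordered even though the chunks finish asynchronously.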