@davidgrangier.bsky.social
#NeurIPS2025. Mixing different datasets to train your LLM?
✨ We can help you find the perfect blend!
📈 A few small-model experiments → a scaling-law fit → your optimal mixture.
🎯 Easy + efficient.

Chat with us 💬 Poster #3414, Thu, Dec 4, 11am. arxiv.org/abs/2507.09404
Scaling Laws for Optimal Data Mixtures
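For the curious, a minimal sketch of the recipe in this post, under stated assumptions: three domains, a handful of small-run (mixture, loss) observations with made-up numbers, and an illustrative power-law form that is NOT the exact law from the paper.

# Hypothetical sketch: fit a simple parametric law to small-run losses,
# then optimize the mixture weights on the simplex. The functional form
# and all numbers below are illustrative, not from arxiv.org/abs/2507.09404.
import numpy as np
from scipy.optimize import minimize

# Observed (mixture weights, validation loss) pairs from small runs
# over 3 domains; losses are made-up illustrative values.
H = np.array([
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
    [1/3, 1/3, 1/3],
    [0.5, 0.3, 0.2],
])
L = np.array([2.10, 2.25, 2.40, 2.05, 2.02])

def predicted_loss(params, h):
    # Illustrative form: L(h) = c + sum_i a_i * h_i^{-b}
    # (each domain's loss contribution shrinks as its weight grows).
    a = params[:3]
    b, c = params[3], params[4]
    return c + (a * np.clip(h, 1e-6, None) ** (-b)).sum(axis=-1)

def fit_objective(params):
    return ((predicted_loss(params, H) - L) ** 2).sum()

# Step 1: fit the law's parameters to the small-run observations.
fit = minimize(fit_objective, x0=np.array([0.1, 0.1, 0.1, 1.0, 1.5]),
               method="Nelder-Mead")
params = fit.x

# Step 2: find the mixture on the simplex minimizing the fitted law.
cons = ({"type": "eq", "fun": lambda h: h.sum() - 1.0},)
bounds = [(1e-3, 1.0)] * 3
opt = minimize(lambda h: predicted_loss(params, h), x0=np.full(3, 1/3),
               method="SLSQP", bounds=bounds, constraints=cons)
print("fitted optimal mixture:", np.round(opt.x, 3))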
December 2, 2025 at 6:47 PM
#ICLR #TrainBetterLM I am at ICLR; come to our posters to hear about improved language model training!

Recycle gradients for faster neural net training with AdEMAMix: iclr.cc/virtual/2025... (Fri Apr 25, 10 am).
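As a rough illustration, a minimal numpy sketch of the core AdEMAMix update as I read it from the paper: a fast gradient EMA (beta1) plus a slow one (beta3), normalized Adam-style. The alpha/beta3 warmup schedulers from the paper are omitted, and the hyperparameter values are only illustrative.

# Hedged sketch of one AdEMAMix step; schedulers and values are simplified.
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update: combine a fast EMA (beta1) and a slow EMA
    (beta3) of the gradients, normalized by the second-moment EMA."""
    state["t"] += 1
    t = state["t"]
    # Fast EMA (bias-corrected, as in Adam) and slow EMA (uncorrected).
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    state["nu"] = beta2 * state["nu"] + (1 - beta2) * grad**2
    m1_hat = state["m1"] / (1 - beta1**t)
    nu_hat = state["nu"] / (1 - beta2**t)
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(nu_hat) + eps)
    return theta - lr * (update + weight_decay * theta)

# Usage: keep one state dict per parameter tensor.
theta = np.zeros(4)
state = {"t": 0, "m1": np.zeros(4), "m2": np.zeros(4), "nu": np.zeros(4)}
theta = ademamix_step(theta, grad=np.ones(4), state=state)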

1/3
April 21, 2025 at 11:55 PM