@davidgrangier.bsky.social
#NeurIPS2025. Mixing different datasets to train your LLM?
✨ We can help you find the perfect blend!
📈 A few small-model experiments → a scaling-law fit → your optimal mixture.
🎯 Easy + efficient.

Chat with us 💬 Poster #3414, Thu, Dec 4, 11am. arxiv.org/abs/2507.09404
Scaling Laws for Optimal Data Mixtures
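For the curious, a minimal sketch of the recipe in this post, under stated assumptions: three domains, a handful of small-run (mixture, loss) observations with made-up numbers, and an illustrative power-law form that is NOT the exact law from the paper.

# Hypothetical sketch: fit a simple parametric law to small-run losses,
# then optimize the mixture weights on the simplex. The functional form
# and all numbers below are illustrative, not from arxiv.org/abs/2507.09404.
import numpy as np
from scipy.optimize import minimize

# Observed (mixture weights, validation loss) pairs from small runs
# over 3 domains; losses are made-up illustrative values.
H = np.array([
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
    [1/3, 1/3, 1/3],
    [0.5, 0.3, 0.2],
])
L = np.array([2.10, 2.25, 2.40, 2.05, 2.02])

def predicted_loss(params, h):
    # Illustrative form: L(h) = c + sum_i a_i * h_i^{-b}
    # (each domain's loss contribution shrinks as its weight grows).
    a = params[:3]
    b, c = params[3], params[4]
    return c + (a * np.clip(h, 1e-6, None) ** (-b)).sum(axis=-1)

def fit_objective(params):
    return ((predicted_loss(params, H) - L) ** 2).sum()

# Step 1: fit the law's parameters to the small-run observations.
fit = minimize(fit_objective, x0=np.array([0.1, 0.1, 0.1, 1.0, 1.5]),
               method="Nelder-Mead")
params = fit.x

# Step 2: find the mixture on the simplex minimizing the fitted law.
cons = ({"type": "eq", "fun": lambda h: h.sum() - 1.0},)
bounds = [(1e-3, 1.0)] * 3
opt = minimize(lambda h: predicted_loss(params, h), x0=np.full(3, 1/3),
               method="SLSQP", bounds=bounds, constraints=cons)
print("fitted optimal mixture:", np.round(opt.x, 3))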
December 2, 2025 at 6:47 PM
#ICLR #TrainBetterLM I am at ICLR; come to our posters to hear about improved language model training!

Recycle gradients for faster neural net training with AdEMAMix: iclr.cc/virtual/2025... (Fri Apr 25, 10 am).
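As a rough illustration, a minimal numpy sketch of the core AdEMAMix update as I read it from the paper: a fast gradient EMA (beta1) plus a slow one (beta3), normalized Adam-style. The alpha/beta3 warmup schedulers from the paper are omitted, and the hyperparameter values are only illustrative.

# Hedged sketch of one AdEMAMix step; schedulers and values are simplified.
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update: combine a fast EMA (beta1) and a slow EMA
    (beta3) of the gradients, normalized by the second-moment EMA."""
    state["t"] += 1
    t = state["t"]
    # Fast EMA (bias-corrected, as in Adam) and slow EMA (uncorrected).
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    state["nu"] = beta2 * state["nu"] + (1 - beta2) * grad**2
    m1_hat = state["m1"] / (1 - beta1**t)
    nu_hat = state["nu"] / (1 - beta2**t)
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(nu_hat) + eps)
    return theta - lr * (update + weight_decay * theta)

# Usage: keep one state dict per parameter tensor.
theta = np.zeros(4)
state = {"t": 0, "m1": np.zeros(4), "m2": np.zeros(4), "nu": np.zeros(4)}
theta = ademamix_step(theta, grad=np.ones(4), state=state)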

1/3
April 21, 2025 at 11:55 PM