@davidgrangier.bsky.social
davidgrangier.bsky.social
3/3

Mixture of experts on high-latency networks with No Need to Talk iclr.cc/virtual/2025... (Thu Apr 24, 3 pm).

Joint work with @matpagliardini.bsky.social, Anastasiia Filippova, @pierreablin.bsky.social, Simin Fan, Skyler Seto, Angelos Katharopoulos, and Ronan Collobert.
ICLR Poster: No Need to Talk: Asynchronous Mixture of Language Models (ICLR 2025)
iclr.cc
davidgrangier.bsky.social
#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training!

Recycle gradients for faster neural net training with AdEMAMix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).

1/3