John Ryan
@johnpryan.bsky.social
ML @ Isomorphic Labs
Thank you!
December 20, 2024 at 2:57 PM
The normalizing flow architecture they cite is from a paper that isn't published yet. Excited to read it, though, because JetFormer seems to have the largest normalizing flows I've seen, and they claim to have minimal issues with training stability 👀
December 3, 2024 at 10:20 PM
Reposted by John Ryan
DeMo was created in March 2024 by Bowen Peng and Jeffrey Quesnelle and has been published on arXiv in collaboration with Diederik P. Kingma, co-founder of OpenAI and inventor of the Adam optimizer and VAEs.

The paper is available here: arxiv.org/abs/2411.19870

And code: github.com/bloc97/DeMo
DeMo: Decoupled Momentum Optimization
Training large neural networks typically requires sharing gradients between accelerators through specialized high-speed interconnects. Drawing from the signal processing principles of frequency decomp...
arxiv.org
December 2, 2024 at 4:46 PM
The moderation lists you can subscribe to are an interesting mechanism. Making a good one will probably require a decent amount of resources, so I hope Bluesky will provide some solid ones or fund people who will build and maintain community-trusted lists.
November 24, 2024 at 8:36 PM
November 18, 2024 at 7:27 AM