Quanquan Gu
@quanquangu.bsky.social
1.8K followers 560 following 70 posts
Professor @UCLA, Research Scientist @ByteDance | Recent work: SPIN, SPPO, DPLM 1/2, GPM, MARS | Opinions are my own
quanquangu.bsky.social
Pretraining will only end once we find the optimal scaling law.
quanquangu.bsky.social
To better interpret the plot, draw a horizontal line representing a specific target validation loss. Find the points where this line intersects the curves for AdamW and MARS, which will allow you to determine how much speedup, in terms of training tokens, MARS achieves compared to AdamW.
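The horizontal-line reading described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `tokens_to_reach` and `token_speedup` are hypothetical, the loss values are made-up placeholders rather than results from the MARS plot, and linear interpolation between logged points is an assumption about how to locate the intersection.

```python
import numpy as np

def tokens_to_reach(tokens, losses, target_loss):
    """Token count at which a loss curve first reaches target_loss.

    Uses linear interpolation between the two logged points that
    bracket the target. Returns None if the target is never reached.
    """
    tokens = np.asarray(tokens, dtype=float)
    losses = np.asarray(losses, dtype=float)
    reached = np.nonzero(losses <= target_loss)[0]
    if reached.size == 0:
        return None
    i = reached[0]
    if i == 0:
        return float(tokens[0])
    # The curve crosses the horizontal line between points i-1 and i.
    t0, t1 = tokens[i - 1], tokens[i]
    l0, l1 = losses[i - 1], losses[i]
    frac = (l0 - target_loss) / (l0 - l1)
    return float(t0 + frac * (t1 - t0))

def token_speedup(tokens, baseline_losses, candidate_losses, target_loss):
    """Ratio of tokens the baseline needs to tokens the candidate needs."""
    base = tokens_to_reach(tokens, baseline_losses, target_loss)
    cand = tokens_to_reach(tokens, candidate_losses, target_loss)
    if base is None or cand is None:
        return None
    return base / cand

# Placeholder curves (NOT real measurements), tokens in billions.
tokens = [10, 20, 30, 40, 50]
adamw_loss = [3.60, 3.40, 3.30, 3.20, 3.10]
mars_loss = [3.50, 3.30, 3.20, 3.10, 3.00]

speedup = token_speedup(tokens, adamw_loss, mars_loss, target_loss=3.25)
```

A speedup above 1 means the second curve reaches the target loss with fewer training tokens than the first. The ratio depends on which target loss is chosen, so it is worth checking more than one horizontal line.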
quanquangu.bsky.social
Just added you! Welcome!
quanquangu.bsky.social
This Thanksgiving, I want to express my heartfelt gratitude to all the students, colleagues, and collaborators who have contributed to the success of SPIN, SPPO, DPLM, GPM, MARS, and many other projects. Your hard work and dedication continue to be truly inspiring.
quanquangu.bsky.social
Anyone using their real name and interested is welcome!
quanquangu.bsky.social
Just added you. Welcome!
quanquangu.bsky.social
MARS is a unified framework that can be integrated with various preconditioning techniques, so it can be applied to PSGD. I believe @hessianfree.bsky.social has implemented MARS-PSGD.
quanquangu.bsky.social
Please reply to this message or DM me if you’d like to be added!
quanquangu.bsky.social
Just put together a starter pack for Deep Learning Theory. Let me know if you'd like to be included or want to suggest someone to add to the list!

go.bsky.app/2qnppia