Jörg Franke
@jfranke.bsky.social
PhD student in the Machine Learning Lab at the University of Freiburg - Core Deep Learning Research with some applications in bio.
jfranke.bsky.social
🧵4/5 - For example, when pretraining GPT-2 models, AdamCPR outperforms AdamW at the same budget, or requires only 2/3 of the budget to reach the same score.
jfranke.bsky.social
🧵3/5 - CPR can be used with any gradient-based optimization algorithm, e.g., Adam. You can find our AdamCPR implementation at github.com/automl/CPR or install it via pip install pytorch-cpr
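To illustrate why the constraint step can sit on top of any gradient-based optimizer, here is a minimal NumPy sketch. It is not the pytorch-cpr implementation and does not use its API. It assumes a Lagrange-multiplier formulation in which each parameter group gets an upper bound kappa on a regularization function R (here the squared L2 norm) and a multiplier updated with rate mu. The function name `cpr_sgd_step` and all hyperparameter values are hypothetical, and plain SGD stands in for the base optimizer.

```python
import numpy as np


def cpr_sgd_step(theta, lam, grad, lr=0.1, mu=0.1, kappa=1.0):
    """One CPR-style update for a single parameter group (hypothetical sketch).

    theta: parameters of the group
    lam:   Lagrange multiplier for the constraint R(theta) <= kappa
    grad:  gradient of the training loss w.r.t. theta
    """
    # Assumed regularization function R: squared L2 norm of the group.
    r = float(np.sum(theta ** 2))
    # Multiplier update: grows while R(theta) > kappa, shrinks otherwise,
    # and is clipped so it never becomes negative.
    lam = max(0.0, lam + mu * (r - kappa))
    # Base optimizer step (plain SGD here; any gradient-based update could
    # take its place), plus a decoupled step along grad R = 2 * theta.
    theta = theta - lr * grad - lam * 2.0 * theta
    return theta, lam


# Toy problem: the loss 0.5 * ||theta - target||^2 pulls theta towards a
# point with squared norm 25, while the constraint asks for squared norm <= 1.
target = np.array([3.0, 4.0])
theta = np.zeros(2)
lam = 0.0
for _ in range(500):
    grad = theta - target  # gradient of the toy loss
    theta, lam = cpr_sgd_step(theta, lam, grad)

print(theta, np.sum(theta ** 2), lam)
```

Unconstrained SGD would converge to `target` itself. With the constraint, theta instead settles on the boundary (squared norm close to 1) while still pointing towards `target`. The multiplier is never tuned by hand; it grows only as much as needed to hold the bound.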
jfranke.bsky.social
🧵2/5 - We reformulate regularization as an inequality-constrained optimization problem, which leads to several benefits:
✅ Individual and dynamic weight regularization
✅ Outperforms weight decay
✅ No additional hyperparameters, in some cases even fewer
✅ Minor or no runtime overhead
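A hedged sketch of the reformulation, written as math. The symbols are chosen here, not taken from the post, and the update rules are assumptions: R is a per-group regularization function such as the squared L2 norm, κ_j an upper bound for parameter group θ_j, μ a multiplier update rate, η the learning rate, and Δ_j the base optimizer's update direction (for example, Adam's).

```latex
\begin{align}
\text{weight decay (L2-style penalty):}\quad & \min_{\theta}\; \mathcal{L}(\theta) + \tfrac{\gamma}{2}\,\lVert\theta\rVert_2^2 \\
\text{CPR:}\quad & \min_{\theta}\; \mathcal{L}(\theta) \quad \text{s.t.} \quad R(\theta_j) \le \kappa_j \quad \forall j \\
& \lambda_j \leftarrow \max\bigl(0,\; \lambda_j + \mu\,(R(\theta_j) - \kappa_j)\bigr) \\
& \theta_j \leftarrow \theta_j - \eta\,\Delta_j - \lambda_j \nabla R(\theta_j)
\end{align}
```

Instead of one global strength γ, each group gets its own multiplier λ_j that changes during training, which is one way to read "individual and dynamic" above. Starting from λ_j = 0, a group is only regularized once R(θ_j) exceeds κ_j.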
jfranke.bsky.social
Thrilled to present our work on Constrained Parameter Regularization (CPR) at #NeurIPS2024!
Our novel regularization technique for deep learning outperforms weight decay across various tasks. neurips.cc/virtual/2024...
This is joint work with Michael Hefenbrock, Gregor Köhler, and Frank Hutter
🧵👇
NeurIPS Poster: Improving Deep Learning Optimization through Constrained Parameter Regularization, NeurIPS 2024 (neurips.cc)