Mikail Khona
@khonamikail.bsky.social
Chronic research nomad. Likes all kinds of neural networks and math
The easiest experiment would be to graft Shampoo onto the step-size schedule of whatever optimizer you are currently using
March 29, 2025 at 5:20 PM
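A minimal sketch of the grafting idea, under the usual interpretation: keep the direction of the Shampoo update but rescale it to the norm of the update your current optimizer (say Adam) would have taken, so the existing step-size schedule carries over. The graft helper and the stand-in update tensors below are illustrative assumptions, not part of any particular Shampoo implementation.

import torch

def graft(direction_update, magnitude_update, eps=1e-12):
    # Keep the direction of `direction_update` (e.g. the Shampoo step) but
    # rescale it to the norm of `magnitude_update` (e.g. the Adam step), so
    # the baseline step-size schedule is preserved layer by layer.
    return direction_update * (magnitude_update.norm() / (direction_update.norm() + eps))

torch.manual_seed(0)
shampoo_step = torch.randn(64, 64)        # stand-in for a Shampoo update
adam_step = 1e-3 * torch.randn(64, 64)    # stand-in for the current optimizer's update
grafted = graft(shampoo_step, adam_step)
print(grafted.norm().item(), adam_step.norm().item())  # norms match; direction is Shampoo's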
Train your model with an optimizer that orthogonalizes gradients (like Muon or Shampoo) and maybe everything gets cleaned up even more
March 29, 2025 at 5:14 PM
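For context, "orthogonalizes gradients" here refers to Muon-style optimizers replacing a matrix gradient G = U S V^T with (approximately) U V^T, so every singular direction gets a comparably sized step. Below is a rough sketch using the Newton-Schulz iteration commonly used for this; the quintic coefficients and step count are the commonly quoted values and should be read as assumptions, not the poster's exact setup.

import torch

def orthogonalize(grad, steps=5, eps=1e-7):
    # Approximate U V^T for grad = U S V^T with a quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = grad / (grad.norm() + eps)        # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                           # iterate on the wide orientation
    for _ in range(steps):
        xxt = x @ x.T
        x = a * x + (b * xxt + c * xxt @ xxt) @ x
    return x.T if transposed else x

torch.manual_seed(0)
g = torch.randn(128, 256)
print(torch.linalg.svdvals(orthogonalize(g))[:5])  # singular values pushed toward 1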