Mikail Khona
@khonamikail.bsky.social
Chronic research nomad. Likes all kinds of neural networks and math
The easiest experiment would be to graft Shampoo onto the step-size schedule of whatever optimizer you are currently using
March 29, 2025 at 5:20 PM
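A minimal sketch of the grafting idea, under the usual interpretation: keep the direction of the Shampoo update but rescale it to the norm of the update your current optimizer (say Adam) would have taken, so the existing step-size schedule carries over. The graft helper and the stand-in update tensors below are illustrative assumptions, not part of any particular Shampoo implementation.

import torch

def graft(direction_update, magnitude_update, eps=1e-12):
    # Keep the direction of `direction_update` (e.g. the Shampoo step) but
    # rescale it to the norm of `magnitude_update` (e.g. the Adam step), so
    # the baseline step-size schedule is preserved layer by layer.
    return direction_update * (magnitude_update.norm() / (direction_update.norm() + eps))

torch.manual_seed(0)
shampoo_step = torch.randn(64, 64)        # stand-in for a Shampoo update
adam_step = 1e-3 * torch.randn(64, 64)    # stand-in for the current optimizer's update
grafted = graft(shampoo_step, adam_step)
print(grafted.norm().item(), adam_step.norm().item())  # norms match; direction is Shampoo's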
Train your model with an optimizer that orthogonalizes gradients (like Muon or Shampoo) and maybe everything gets cleaned up even more
March 29, 2025 at 5:14 PM
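For context, "orthogonalizes gradients" here refers to Muon-style optimizers replacing a matrix gradient G = U S V^T with (approximately) U V^T, so every singular direction gets a comparably sized step. Below is a rough sketch using the Newton-Schulz iteration commonly used for this; the quintic coefficients and step count are the commonly quoted values and should be read as assumptions, not the poster's exact setup.

import torch

def orthogonalize(grad, steps=5, eps=1e-7):
    # Approximate U V^T for grad = U S V^T with a quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = grad / (grad.norm() + eps)        # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                           # iterate on the wide orientation
    for _ in range(steps):
        xxt = x @ x.T
        x = a * x + (b * xxt + c * xxt @ xxt) @ x
    return x.T if transposed else x

torch.manual_seed(0)
g = torch.randn(128, 256)
print(torch.linalg.svdvals(orthogonalize(g))[:5])  # singular values pushed toward 1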