Whether it’s Mistral LoRAs or Vision Transformers, the variance drops off a cliff. A tiny handful of 'unique' directions capture nearly ALL the information.
It’s not random. It’s a signature.
Whether it’s Mistral LoRAs or Vision Transformers, the variance drops off a cliff. A tiny handful of 'unique' directions capture nearly ALL the information.
It’s not random. It’s a signature.
We found the opposite.
Using spectral decomposition, we saw that weights from disjoint tasks converge to the same principal directions.
We found the opposite.
Using spectral decomposition, we saw that weights from disjoint tasks converge to the same principal directions.