Researchers have hypothesized "Universality" - that different models learn similar circuits to solve tasks.
Our work provides the strongest evidence to date for this hypothesis, and it extends beyond features.
1️⃣ Model Merging: Combine models analytically without hyperparameter tuning.
2️⃣ Efficient Training: Adapt to new tasks by learning tiny coefficients, not full weights.
3️⃣ Compression: 100x+ memory reduction.
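To make the three points above concrete, here is a minimal NumPy sketch of the idea (the SVD-based basis construction, the synthetic stand-in weights, and all variable names are illustrative assumptions, not the paper's actual method): task weights collapse to a handful of coefficients over a shared basis, merging becomes a closed-form average in that basis, and adapting to a new task only means fitting those coefficients.

```python
import numpy as np

# Minimal sketch with synthetic stand-in weights (not real checkpoints):
# build a shared spectral basis from several task-specific weight matrices,
# then merge, adapt, and compress via tiny per-task coefficient vectors.

rng = np.random.default_rng(0)
d_out, d_in, n_tasks, k = 64, 32, 12, 4   # k = number of shared directions kept

# Stand-ins for task-specific weight updates (e.g. LoRA deltas), one per task.
task_weights = [rng.standard_normal((d_out, d_in)) for _ in range(n_tasks)]

# Shared basis: top right singular vectors of the stacked, flattened weights.
stacked = np.stack([W.ravel() for W in task_weights])   # (n_tasks, d_out * d_in)
_, _, Vt = np.linalg.svd(stacked, full_matrices=False)
basis = Vt[:k]                                           # (k, d_out * d_in)

# Compression: each task is now k coefficients instead of a full matrix.
coeffs = stacked @ basis.T                               # (n_tasks, k)

# Merging: average the coefficients in the shared basis -- closed form,
# no hyperparameter tuning loop.
merged = (coeffs.mean(axis=0) @ basis).reshape(d_out, d_in)

# Efficient adaptation: a new task only needs its own k-dim coefficient
# vector (here fit in closed form against a stand-in target update).
target = rng.standard_normal((d_out, d_in)).ravel()
new_coeffs, *_ = np.linalg.lstsq(basis.T, target, rcond=None)

print(coeffs.shape, merged.shape, new_coeffs.shape)   # (12, 4) (64, 32) (4,)
```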
Whether it’s Mistral LoRAs or Vision Transformers, the variance drops off a cliff. A tiny handful of 'unique' directions capture nearly ALL the information.
It’s not random. It’s a signature.
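For intuition, here is what that check looks like in a toy setting (the data below is synthetic, standing in for real Mistral-LoRA or ViT checkpoints): stack the flattened weight updates, take their singular values, and look at the cumulative explained variance.

```python
import numpy as np

# Toy reproduction of the "variance cliff" on synthetic stand-in weights
# (the real analysis would use actual LoRA / ViT checkpoints instead).

rng = np.random.default_rng(1)
n_runs, dim, n_shared = 40, 4096, 5

# Weights that mostly live in a small shared subspace, plus noise.
shared = rng.standard_normal((n_shared, dim))
weights = rng.standard_normal((n_runs, n_shared)) @ shared \
          + 0.05 * rng.standard_normal((n_runs, dim))

# Singular values of the stacked weights -> cumulative explained variance.
S = np.linalg.svd(weights, compute_uv=False)
explained = np.cumsum(S**2) / np.sum(S**2)

# A handful of directions already explains nearly everything.
print(np.round(explained[:8], 3))
```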
We found the opposite: rather than each task carving out its own directions, the weights line up.
Using spectral decomposition, we saw that weights from disjoint tasks converge to the same principal directions.
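A hedged sketch of that comparison (synthetic weights and illustrative names only, not the paper's pipeline): take the top-k principal directions of weight matrices from two disjoint tasks and measure how well the two subspaces align via the cosines of their principal angles.

```python
import numpy as np

# Sketch of the cross-task comparison on synthetic stand-in weights:
# compare the top-k principal directions of two "disjoint task" matrices.

def top_directions(W, k):
    """Top-k left singular vectors (principal directions) of a weight matrix."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(2)
d_out, d_in, k = 256, 64, 8

# Stand-ins for weights from two disjoint tasks that happen to share the
# same low-dimensional structure (the convergence described above).
shared = rng.standard_normal((d_out, k))
W_a = shared @ rng.standard_normal((k, d_in)) + 0.05 * rng.standard_normal((d_out, d_in))
W_b = shared @ rng.standard_normal((k, d_in)) + 0.05 * rng.standard_normal((d_out, d_in))

# Singular values of U_a^T U_b are the cosines of the principal angles
# between the two subspaces; values near 1 mean the directions coincide.
U_a, U_b = top_directions(W_a, k), top_directions(W_b, k)
cosines = np.linalg.svd(U_a.T @ U_b, compute_uv=False)
print(cosines.round(3))
```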