Somin W
sominw.bsky.social
cs phd @ northeastern.
🔊 New work w/ @silvioamir.bsky.social & @byron.bsky.social! We show you can distill a model’s mechanism, not just its answers -- teaching a small LM to run its circuit the same way as a larger teacher model. We call it Circuit Distillation. (1/4)
September 30, 2025 at 11:32 PM
📢 Can we trace a small distilled model back to its teacher? 🤔New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6]

🔗 Full paper: arxiv.org/abs/2502.06659
Who Taught You That? Tracing Teachers in Model Distillation
February 11, 2025 at 5:16 PM