Martina Vilas
martinagvilas.bsky.social
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.

https://martinagvilas.github.io/
Hidden states have distinctive temporal patterns for correct paths. They show:

✴️ Larger overall representational change (Net ↑)
✴️ Less wandering in latent space (Cumulative ↓)
✴️ More direct progress toward final state (Aligned ↑)
October 22, 2025 at 3:38 PM
Across 3 reasoning models (DeepSeek-R1, Phi-4-Reasoning-Plus, Qwen3) and diverse domains (GPQA, AIME, TSP), LT signals:

✅ Significantly predict correctness
✅ Outperform output-based confidence measures and cross-layer signals
October 22, 2025 at 3:38 PM
We track how representations evolve through the trace and extract 3 complementary signals:

📊 Net Change: Overall shift (start → end)
🔄 Cumulative Change: Total movement
🎯 Aligned Change: Progress toward final state
October 22, 2025 at 3:38 PM
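A rough sketch of how the three signals above could be computed from a sequence of hidden states, one vector per reasoning step. The post does not give the formulas, so everything here is an assumption made for illustration: the function name `trajectory_signals`, Euclidean distance as the measure of change, and mean cosine similarity between each step's update and the overall start-to-end direction as the "aligned" signal.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def trajectory_signals(states):
    """Compute illustrative net, cumulative, and aligned change.

    `states` is a list of hidden-state vectors (lists of floats),
    ordered from the start of the trace to the end. These definitions
    are assumptions, not the authors' exact formulas.
    """
    if len(states) < 2:
        raise ValueError("need at least two hidden states")

    # Update made at each step, and the overall start -> end shift.
    steps = [_sub(b, a) for a, b in zip(states, states[1:])]
    total = _sub(states[-1], states[0])

    # Net change: distance between the first and last state.
    net = _norm(total)

    # Cumulative change: total path length travelled.
    cumulative = sum(_norm(s) for s in steps)

    # Aligned change: mean cosine similarity between each step's
    # update and the overall direction (0.0 for degenerate steps).
    cosines = []
    for s in steps:
        denom = _norm(s) * net
        cosines.append(_dot(s, total) / denom if denom > 0 else 0.0)
    aligned = sum(cosines) / len(cosines)

    return {"net": net, "cumulative": cumulative, "aligned": aligned}


# A direct path versus a wandering path with the same endpoints.
direct = trajectory_signals([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
wandering = trajectory_signals([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
```

Under these assumed definitions, both toy paths have the same net change, but the wandering one accumulates more total movement and scores lower on alignment.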
Identifying trace quality is critical: it enables more reliable predictions, improves efficiency by avoiding wasted compute, and can be used to guide models toward productive reasoning strategies.

Our solution: Look inside the temporal evolution of the model's latent space! 🔍
October 22, 2025 at 3:38 PM
On December 5th, our ML theory group at Cohere For AI is hosting @mathildepapillon.bsky.social to discuss their recent review arxiv.org/abs/2407.09468 on geometric/topological/algebraic ML.

Join us online 💫
December 2, 2024 at 1:14 PM
[1/2] Position paper at #ICML2024 “An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience”
November 17, 2024 at 2:06 PM