Martina Vilas
@martinagvilas.bsky.social
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.
https://martinagvilas.github.io/
Pinned
Martina Vilas
@martinagvilas.bsky.social
· Nov 17
Hi BlueSky! 🦋 I’m a computer science PhD student with a background in cognitive neuroscience. Working at the intersection of these topics, my research focuses on reverse engineering the cognitive capacities of AI models 🧠💻
Some recent examples 👇
Reposted by Martina Vilas
When to call it quits in LLM reasoning? 🛑
Martina's internship project suggests trace-monitoring metrics and classifiers that can detect when an LLM reasoning trace is going to fail midway. The approach saves up to 70% of token usage and even improves accuracy by 2-3%.
Can we predict which reasoning paths will succeed before seeing the answer? 🤔
Our new paper (arxiv.org/abs/2510.10494) proposes latent-trajectory signals from LLMs' hidden states to identify high-quality reasoning, cutting inference costs by up to 70% while maintaining accuracy
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remain...
arxiv.org
October 22, 2025 at 10:39 PM
Can we predict which reasoning paths will succeed before seeing the answer? 🤔
Our new paper (arxiv.org/abs/2510.10494) proposes latent-trajectory signals from LLMs' hidden states to identify high-quality reasoning, cutting inference costs by up to 70% while maintaining accuracy
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remain...
arxiv.org
October 22, 2025 at 3:38 PM
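A minimal sketch of the general idea (my own illustration, not the paper's actual implementation; the features, probe, and threshold below are assumptions): summarize how hidden states evolve over a reasoning trace, fit a lightweight probe on traces labeled by answer correctness, and stop generating when the predicted success probability drops.

# Illustrative sketch only: latent-trajectory features + a simple success probe.
import numpy as np
from sklearn.linear_model import LogisticRegression

def trajectory_features(hidden_states: np.ndarray) -> np.ndarray:
    """Summarize a (num_steps, hidden_dim) hidden-state trajectory.

    Assumed features: mean/variance of step-to-step cosine similarity and the
    relative drift of the hidden-state norm over the trace.
    """
    unit = hidden_states / (np.linalg.norm(hidden_states, axis=1, keepdims=True) + 1e-8)
    cos = np.sum(unit[1:] * unit[:-1], axis=1)
    norms = np.linalg.norm(hidden_states, axis=1)
    drift = (norms[-1] - norms[0]) / (norms[0] + 1e-8)
    return np.array([cos.mean(), cos.var(), drift])

def fit_trace_probe(trajectories, labels):
    """Fit a probe on traces labeled 1 if they ended in a correct answer."""
    X = np.stack([trajectory_features(t) for t in trajectories])
    return LogisticRegression(max_iter=1000).fit(X, labels)

def should_stop_early(probe, partial_trajectory, threshold=0.2):
    """Abort generation when the predicted success probability falls below threshold."""
    p_success = probe.predict_proba(trajectory_features(partial_trajectory)[None, :])[0, 1]
    return p_success < threshold

# Toy usage with synthetic data, just to show the shapes involved.
rng = np.random.default_rng(0)
train_trajs = [rng.normal(size=(int(rng.integers(20, 60)), 64)) for _ in range(100)]
train_labels = rng.integers(0, 2, size=100)
probe = fit_trace_probe(train_trajs, train_labels)
print(should_stop_early(probe, rng.normal(size=(25, 64))))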
Reposted by Martina Vilas
All Eureka inference-time scaling insights are now available here: www.microsoft.com/en-us/resear... It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.
Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead - Microsoft Research
Understanding and measuring the potential of inference-time scaling for reasoning. The new Eureka study tests nine state-of-the-art models on eight diverse reasoning tasks.
www.microsoft.com
April 29, 2025 at 3:36 PM
Looking forward to presenting this work next week at #ICLR2025! DM me if you are attending and want to grab a coffee to discuss these topics 💫
I will be presenting this ✨ spotlight 💫 paper at #ICLR2025 with @martinagvilas.bsky.social. Come say hi if you're interested in DNN circuits, complexity and #interpretability
📆 Poster Session 4 (#530)
🕰️ Fri 25 Apr. 3:00-5:30 PM
📝 openreview.net/forum?id=Qog...
📊 iclr.cc/virtual/2025...
April 18, 2025 at 6:55 PM
On December 5th, our ML theory group at Cohere For AI is hosting @mathildepapillon.bsky.social to discuss their recent review arxiv.org/abs/2407.09468 on geometric/topological/algebraic ML.
Join us online 💫
December 2, 2024 at 1:14 PM
Reposted by Martina Vilas
I’m putting together a starter pack for researchers working on human-centered AI evaluation. Reply or DM me if you’d like to be added, or if you have suggestions! Thank you!
(It looks NLP-centric at the moment, but that’s due to the current limits of my own knowledge 🙈)
go.bsky.app/G3w9LpE
November 21, 2024 at 3:56 PM
Reposted by Martina Vilas
I tried to find everyone who works in the area but I certainly missed some folks so please lmk...
go.bsky.app/BYkRryU
November 23, 2024 at 5:11 AM
Reposted by Martina Vilas
Does anyone know of any feeds (or similar) for student internship opportunities in ML/CV/NLP?
November 22, 2024 at 7:19 AM
Reposted by Martina Vilas
I've found starter packs on NLP, vision, graphics, etc. But personally, I would love to know and hear from researchers working on vision-language. So, let me know if you'd like to join this starter pack, would be happy to add!
go.bsky.app/TENRRBb
November 19, 2024 at 9:56 PM
Reposted by Martina Vilas
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:
Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢
🧵⬇️
November 20, 2024 at 4:35 PM
Reposted by Martina Vilas
LLMs tend to match problem-solving strategies based on textual similarity rather than truly understanding the underlying principles of mathematical problems.
Paper: Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology
November 18, 2024 at 9:29 PM
Reposted by Martina Vilas
A starter pack of people working on interpretability / explainability of all kinds, using theoretical and/or empirical approaches.
Reply or DM if you want to be added, and help me reach others!
go.bsky.app/DZv6TSS
November 14, 2024 at 5:00 PM
Reposted by Martina Vilas
If you’re interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social !). Excited to have a mech interp community on bluesky 🎉
go.bsky.app/LisK3CP
November 19, 2024 at 12:28 AM
Reposted by Martina Vilas
I forgot who in my feed I got this from, but anyway, this network analyzer is crazy efficient. It gives you ideas for accounts to follow based on your own followees. I just added 50 accounts or so.
bsky-follow-finder.theo.io
Bluesky Network Analyzer
Find accounts that you don't follow (yet) but are followed by lots of accounts that you do follow.
bsky-follow-finder.theo.io
November 18, 2024 at 9:32 PM
Reposted by Martina Vilas
there are many smart speakers and thinkers around AI/ML and/or NLP. but i find almost everything to be kinda predictable by now, minor stylistic variations on the same story. who are some *interesting* speakers i should listen/read? i want things that may surprise or inspire me.
November 16, 2024 at 8:41 PM
Reposted by Martina Vilas
Any Latin Americans here working in Cognitive Science, very broadly construed? (Neuroscience, Psychology, Artificial Intelligence, Anthropology, Linguistics, Economics, Ethics, Philosophy, and more…)
I thought I’d create a starter pack but I could only find a handful of us. Say hi?
November 17, 2024 at 1:37 PM
Reposted by Martina Vilas
It is intuitive to observe some complex-looking model behavior (e.g., the classification of images of different animals using an abstract category) and infer an interesting capacity of the model (e.g., the ability to build rich representations that abstract away from particular animals).
November 17, 2024 at 2:34 PM
Hi BlueSky! 🦋 I’m a computer science PhD student with a background in cognitive neuroscience. Working at the intersection of these topics, my research focuses on reverse engineering the cognitive capacities of AI models 🧠💻
Some recent examples 👇
November 17, 2024 at 2:06 PM
Reposted by Martina Vilas
I made a starter pack with the people doing something related to Neurosymbolic AI that I could find.
Let me know if I missed you!
go.bsky.app/RMJ8q3i
November 11, 2024 at 3:27 PM
Reposted by Martina Vilas
New here? Interested in AI/ML? Check out these great starter packs!
AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS
You can also search all starter packs here: blueskydirectory.com/starter-pack...
November 9, 2024 at 9:13 AM