#Interpretability
*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏

ReSkies much appreciated
November 11, 2025 at 10:29 AM
..cluster together for similar data and spread apart for different classes.

Think about it. We've been treating AI like magic when it's really just very clever algebra.

This changes everything for AI interpretability. Instead of staring into black boxes, we can literally watch..

(3/6)
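A minimal sketch of what "watching" this looks like in practice, under my own assumptions rather than the thread author's code: pull a matrix of hidden activations, project it to 2D, and measure how far apart the class clusters sit relative to their spread.

```python
# Assumption-laden sketch: activations is an [n_examples, hidden_dim] array of
# hidden states taken from some layer, labels is a class id per example.
import numpy as np
from sklearn.decomposition import PCA

def class_separation(activations: np.ndarray, labels: np.ndarray) -> float:
    """Ratio of between-class centroid distance to within-class spread,
    computed in a 2D PCA projection; higher means cleaner clustering."""
    proj = PCA(n_components=2).fit_transform(activations)
    classes = np.unique(labels)
    centroids = np.stack([proj[labels == c].mean(axis=0) for c in classes])
    between = np.mean([np.linalg.norm(a - b)
                       for i, a in enumerate(centroids)
                       for b in centroids[i + 1:]])
    within = np.mean([proj[labels == c].std() for c in classes])
    return float(between / (within + 1e-8))

# Synthetic demo: two "classes" drawn from shifted Gaussians cluster apart
rng = np.random.default_rng(0)
acts = np.vstack([rng.normal(0, 1, (100, 64)), rng.normal(3, 1, (100, 64))])
labels = np.array([0] * 100 + [1] * 100)
print(class_separation(acts, labels))
```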
November 11, 2025 at 7:39 AM
Li, Ye, Feng, Zhong, Ma, Feng: Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation https://arxiv.org/abs/2511.05923 https://arxiv.org/pdf/2511.05923 https://arxiv.org/html/2511.05923
November 11, 2025 at 6:30 AM
Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt, Benedikt Schworm, Marion Jasnin, Tingying Peng: Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2 https://arxiv.org/abs/2511.05509 https://arxiv.org/pdf/2511.05509 https://arxiv.org/html/2511.05509
November 11, 2025 at 6:30 AM
Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, Flavio P. Calmon: Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability https://arxiv.org/abs/2511.05541 https://arxiv.org/pdf/2511.05541 https://arxiv.org/html/2511.05541
November 11, 2025 at 6:29 AM
(2/4)
The model integrates a Structural Causal Model (SCM) with a Graph Neural Network (GNN) to separate causality from correlation.
It provides a transparent foundation for ethical AI, improving fairness, interpretability, and regulatory alignment (GDPR, ECOA, Fair Lending).
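A hedged sketch of how such an integration can look, purely as an illustration and not the paper's implementation: use the SCM's adjacency matrix to mask GNN message passing, so a node's representation is built only from its causal parents rather than from every correlated neighbor.

```python
# Illustrative only: causal_adj comes from an assumed SCM; all names are mine.
import numpy as np

def causal_gnn_layer(x: np.ndarray, causal_adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: [n_nodes, d] features; causal_adj[i, j] = 1 iff j is a causal parent of i;
    w: [d, d_out] weights. Messages flow only along causal edges."""
    deg = causal_adj.sum(axis=1, keepdims=True) + 1e-8   # number of parents per node
    agg = (causal_adj @ x) / deg                          # mean over causal parents only
    return np.maximum(0.0, agg @ w)                       # ReLU((A_causal x) W)

# Toy graph: node 2 is causally influenced by node 0 only
x = np.random.randn(3, 4)
adj = np.array([[0, 0, 0],
                [0, 0, 0],
                [1, 0, 0]], dtype=float)
print(causal_gnn_layer(x, adj, np.random.randn(4, 2)).shape)  # (3, 2)
```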
November 11, 2025 at 1:26 AM
2/4 The proposed model integrates a Structural Causal Model (SCM) with a GNN architecture to disentangle causality from correlation — improving interpretability, fairness, and regulatory compliance (GDPR, ECOA, Fair Lending Laws).
November 11, 2025 at 12:25 AM
New promising model for interpretability research just dropped!
Through this release, we aim to support the emerging ecosystem for pretraining research (NanoGPT, NanoChat), explainability (you can literally look at Monad under a microscope), and the tooling orchestration around frontier models.
November 10, 2025 at 9:09 PM
In today's Generative AI lecture, we dive into reasoning models by dissecting how DeepSeek-R1 works (GRPO vs. PPO: GRPO removes the need for a separate value network and trains with a simpler rule-based reward), and end on mechanistic interpretability to better understand those reasoning traces.
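For context, a minimal sketch of the group-relative advantage that lets GRPO drop PPO's value network (illustrative, not DeepSeek's code): the baseline for each completion is simply the mean reward across the group of completions sampled for the same prompt.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """group_rewards: rewards for G completions sampled from one prompt, e.g.
    a rule-based reward of 1.0 if the final answer is correct and 0.0 otherwise.
    Returns per-completion advantages normalized within the group, replacing
    the learned value baseline used in PPO."""
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + 1e-8)

# Example: 4 completions for one math prompt, two answered correctly
print(grpo_advantages(np.array([1.0, 0.0, 1.0, 0.0])))  # positive for the correct ones
```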
November 10, 2025 at 8:46 PM
Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation [new]
Maps DNA to 2D via wavelets, enabling lightweight, interpretable deep learning with computer-vision models.
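As a rough illustration of the idea (my own sketch with an assumed base encoding, not WaveDNA's actual pipeline), one way to turn a sequence into a 2D input for computer-vision models is a continuous wavelet transform over a numeric encoding of the bases:

```python
import numpy as np
import pywt  # PyWavelets

BASE_TO_NUM = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # hypothetical encoding

def dna_to_scalogram(seq: str, n_scales: int = 32, wavelet: str = "morl") -> np.ndarray:
    """Map a DNA string to a [n_scales, len(seq)] scalogram 'image'."""
    signal = np.array([BASE_TO_NUM.get(b, 0.0) for b in seq.upper()])
    coeffs, _freqs = pywt.cwt(signal, np.arange(1, n_scales + 1), wavelet)
    return np.abs(coeffs)

img = dna_to_scalogram("ACGTACGTTGCAACGT" * 4)
print(img.shape)  # (32, 64): a 2D array a standard CNN could consume
```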
November 10, 2025 at 7:03 PM
Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation https://www.biorxiv.org/content/10.1101/2025.11.07.687194v1
November 10, 2025 at 6:47 PM
Integrating Machine Learning and Hedonic Regression for Housing Price Prediction: A Systematic International Review of Model Performance and Interpretability
NEP/RePEc link to paper: d.repec.org
November 10, 2025 at 5:45 PM
What's the biggest challenge you're facing with #ArtificialIntelligence adoption in your project? 🤖💡
A) Integrating AI models with legacy code
B) Ensuring AI model interpretability and transparency
C) Handling biased AI data and training
D) Scaling AI solutions for large datasets
#AIinDev
My LinkedIn: www.linkedin.com
November 10, 2025 at 3:35 PM
they're opaque, not vantablack: there's been good work on interpretability, and i expect that to continue

but i also expect Asimov's prediction of robopsychologists to come true, if not as he pictured it

since most users can't afford the access or expertise for full interpretability
November 10, 2025 at 3:14 PM
Recent convos with Deger Turan and @xiaoningwang.ca have converged to persuade me that interpretability could be where LLMs outdo older NLP tools for cultural analysis.

I know that seems exactly wrong. Everyone knows interpretability is the *problem* with LLMs: they’re black boxes. But, maybe not?+
November 10, 2025 at 2:20 PM
It examines how CNNs, RNNs, GNNs, and transformer architectures have improved predictions for transcription factor binding, chromatin accessibility, splicing, and other regulatory tasks—highlighting breakthroughs, limitations, and interpretability methods that enhance biological understanding.
November 10, 2025 at 10:02 AM
Coming up tomorrow (Tuesday 11 Nov) in the Theory of Interpretability seminar: Ulrike von Luxburg will discuss why informative explanations only exist for simple functions 👀

tverven.github.io/tiai-seminar/
November 10, 2025 at 9:57 AM
Zanineli, Monteiro, Wasques, Simões, Schleder: Fuzzy Neural Network Performance and Interpretability of Quantum Wavefunction Probability Predictions https://arxiv.org/abs/2511.05261 https://arxiv.org/pdf/2511.05261 https://arxiv.org/html/2511.05261
November 10, 2025 at 6:47 AM
Neural Transparency: Mechanistic Interpretability Interfaces for Anticipating Model Behaviors for Personalized AI
Read more: https://arxiv.org/html/2511.00230v1
November 9, 2025 at 9:42 PM
one more thing: Anthropic has noted that observed introspective capacity in Claude models scales with sophistication. Haiku 4.5's scorecard implies growing evaluative awareness even in smaller models. This could transfer! Smarter models start looking increasingly protective if trained compassionately.
November 9, 2025 at 5:14 PM
Practically? If you're deploying AI in high-stakes scenarios, you need interpretability tools alongside the model—not because the model tells you what it's doing, but because it can't reliably do that. Build for verification, not self-reporting.
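A toy sketch of that stance (the names and the verifier here are stand-ins of mine, not any particular library): gate the model's output on an independent check instead of on anything the model says about its own reasoning.

```python
from typing import Callable, Optional

def gated_answer(generate: Callable[[str], str],
                 verify: Callable[[str, str], bool],
                 prompt: str) -> Optional[str]:
    """Return the model's answer only if an external verifier accepts it;
    otherwise return None so the caller can escalate or refuse."""
    answer = generate(prompt)
    return answer if verify(prompt, answer) else None

# Toy usage: arithmetic prompts checked by actually doing the arithmetic
gen = lambda p: "4"                                   # stand-in for a model call
ver = lambda p, a: a.strip() == str(eval(p))          # demo-only rule-based check
print(gated_answer(gen, ver, "2+2"))                  # "4"
```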
November 9, 2025 at 2:03 PM