#Interpretability
*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏

ReSkies much appreciated
November 11, 2025 at 10:29 AM
..cluster together for similar data and spread apart for different classes.

Think about it. We've been treating AI like magic when it's really just very clever algebra.

This changes everything for AI interpretability. Instead of staring into black boxes, we can literally watch..

(3/6)
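A minimal sketch of what "watching" this looks like in practice, under my own assumptions rather than the thread author's code: pull a matrix of hidden activations, project it to 2D, and measure how far apart the class clusters sit relative to their spread.

```python
# Assumption-laden sketch: activations is an [n_examples, hidden_dim] array of
# hidden states taken from some layer, labels is a class id per example.
import numpy as np
from sklearn.decomposition import PCA

def class_separation(activations: np.ndarray, labels: np.ndarray) -> float:
    """Ratio of between-class centroid distance to within-class spread,
    computed in a 2D PCA projection; higher means cleaner clustering."""
    proj = PCA(n_components=2).fit_transform(activations)
    classes = np.unique(labels)
    centroids = np.stack([proj[labels == c].mean(axis=0) for c in classes])
    between = np.mean([np.linalg.norm(a - b)
                       for i, a in enumerate(centroids)
                       for b in centroids[i + 1:]])
    within = np.mean([proj[labels == c].std() for c in classes])
    return float(between / (within + 1e-8))

# Synthetic demo: two "classes" drawn from shifted Gaussians cluster apart
rng = np.random.default_rng(0)
acts = np.vstack([rng.normal(0, 1, (100, 64)), rng.normal(3, 1, (100, 64))])
labels = np.array([0] * 100 + [1] * 100)
print(class_separation(acts, labels))
```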
November 11, 2025 at 7:39 AM
Li, Ye, Feng, Zhong, Ma, Feng: Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation https://arxiv.org/abs/2511.05923 https://arxiv.org/pdf/2511.05923 https://arxiv.org/html/2511.05923
November 11, 2025 at 6:30 AM
Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt, Benedikt Schworm, Marion Jasnin, Tingying Peng: Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2 https://arxiv.org/abs/2511.05509 https://arxiv.org/pdf/2511.05509 https://arxiv.org/html/2511.05509
November 11, 2025 at 6:30 AM
Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, Flavio P. Calmon: Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability https://arxiv.org/abs/2511.05541 https://arxiv.org/pdf/2511.05541 https://arxiv.org/html/2511.05541
November 11, 2025 at 6:29 AM
(2/4)
The model integrates a Structural Causal Model (SCM) with a Graph Neural Network (GNN) to separate causality from correlation.
It provides a transparent foundation for ethical AI, improving fairness, interpretability, and regulatory alignment (GDPR, ECOA, Fair Lending).
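A hedged sketch of how such an integration can look, purely as an illustration and not the paper's implementation: use the SCM's adjacency matrix to mask GNN message passing, so a node's representation is built only from its causal parents rather than from every correlated neighbor.

```python
# Illustrative only: causal_adj comes from an assumed SCM; all names are mine.
import numpy as np

def causal_gnn_layer(x: np.ndarray, causal_adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: [n_nodes, d] features; causal_adj[i, j] = 1 iff j is a causal parent of i;
    w: [d, d_out] weights. Messages flow only along causal edges."""
    deg = causal_adj.sum(axis=1, keepdims=True) + 1e-8   # number of parents per node
    agg = (causal_adj @ x) / deg                          # mean over causal parents only
    return np.maximum(0.0, agg @ w)                       # ReLU((A_causal x) W)

# Toy graph: node 2 is causally influenced by node 0 only
x = np.random.randn(3, 4)
adj = np.array([[0, 0, 0],
                [0, 0, 0],
                [1, 0, 0]], dtype=float)
print(causal_gnn_layer(x, adj, np.random.randn(4, 2)).shape)  # (3, 2)
```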
November 11, 2025 at 1:26 AM
2/4 The proposed model integrates a Structural Causal Model (SCM) with a GNN architecture to disentangle causality from correlation — improving interpretability, fairness, and regulatory compliance (GDPR, ECOA, Fair Lending Laws).
November 11, 2025 at 12:25 AM
New promising model for interpretability research just dropped!
Through this release, we aim to support the emerging ecosystem for pretraining research (NanoGPT, NanoChat), explainability (you can literally look at Monad under a microscope), and the tooling orchestration around frontier models.
November 10, 2025 at 9:09 PM
In today's Generative AI lecture, we dive into reasoning models by dissecting how DeepSeek-R1 works (GRPO vs. PPO: GRPO removes the need for a separate value network and trains with a simpler rule-based reward), and end on mechanistic interpretability to better understand those reasoning traces.
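For context, a minimal sketch of the group-relative advantage that lets GRPO drop PPO's value network (illustrative, not DeepSeek's code): the baseline for each completion is simply the mean reward across the group of completions sampled for the same prompt.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """group_rewards: rewards for G completions sampled from one prompt, e.g.
    a rule-based reward of 1.0 if the final answer is correct and 0.0 otherwise.
    Returns per-completion advantages normalized within the group, replacing
    the learned value baseline used in PPO."""
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + 1e-8)

# Example: 4 completions for one math prompt, two answered correctly
print(grpo_advantages(np.array([1.0, 0.0, 1.0, 0.0])))  # positive for the correct ones
```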
November 10, 2025 at 8:46 PM
Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation [new]
Maps DNA to 2D via wavelets, enabling lightweight, interpretable deep learning with computer-vision models.
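As a rough illustration of the idea (my own sketch with an assumed base encoding, not WaveDNA's actual pipeline), one way to turn a sequence into a 2D input for computer-vision models is a continuous wavelet transform over a numeric encoding of the bases:

```python
import numpy as np
import pywt  # PyWavelets

BASE_TO_NUM = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # hypothetical encoding

def dna_to_scalogram(seq: str, n_scales: int = 32, wavelet: str = "morl") -> np.ndarray:
    """Map a DNA string to a [n_scales, len(seq)] scalogram 'image'."""
    signal = np.array([BASE_TO_NUM.get(b, 0.0) for b in seq.upper()])
    coeffs, _freqs = pywt.cwt(signal, np.arange(1, n_scales + 1), wavelet)
    return np.abs(coeffs)

img = dna_to_scalogram("ACGTACGTTGCAACGT" * 4)
print(img.shape)  # (32, 64): a 2D array a standard CNN could consume
```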
November 10, 2025 at 7:03 PM
Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation https://www.biorxiv.org/content/10.1101/2025.11.07.687194v1
November 10, 2025 at 6:47 PM
Integrating Machine Learning and Hedonic Regression for Housing Price Prediction: A Systematic International Review of Model Performance and Interpretability
NEP/RePEc link to paper: d.repec.org
November 10, 2025 at 5:45 PM
What's the biggest challenge you're facing with #ArtificialIntelligence adoption in your project? 🤖💡
A) Integrating AI models with legacy code
B) Ensuring AI model interpretability and transparency
C) Handling biased AI data and training
D) Scaling AI solutions for large datasets
#AIinDev
My LinkedIn: www.linkedin.com
November 10, 2025 at 3:35 PM
they're opaque, not vantablack: there's been good work on interpretability, and i expect that to continue

but i also expect Asimov's prediction of robopsychologists to come true, if not as he pictured it

since most users can't afford the access or expertise for full interpretability
November 10, 2025 at 3:14 PM
Recent convos with Deger Turan and @xiaoningwang.ca have converged to persuade me that interpretability could be where LLMs outdo older NLP tools for cultural analysis.

I know that seems exactly wrong. Everyone knows interpretability is the *problem* with LLMs: they’re black boxes. But, maybe not?+
November 10, 2025 at 2:20 PM
It examines how CNNs, RNNs, GNNs, and transformer architectures have improved predictions for transcription factor binding, chromatin accessibility, splicing, and other regulatory tasks—highlighting breakthroughs, limitations, and interpretability methods that enhance biological understanding.
November 10, 2025 at 10:02 AM
Coming up tomorrow (Tuesday 11 Nov) in the Theory of Interpretability seminar: Ulrike von Luxburg will discuss why informative explanations only exist for simple functions 👀

tverven.github.io/tiai-seminar/
November 10, 2025 at 9:57 AM
Zanineli, Monteiro, Wasques, Simões, Schleder: Fuzzy Neural Network Performance and Interpretability of Quantum Wavefunction Probability Predictions https://arxiv.org/abs/2511.05261 https://arxiv.org/pdf/2511.05261 https://arxiv.org/html/2511.05261
November 10, 2025 at 6:47 AM
Neural Transparency: Mechanistic Interpretability Interfaces for Anticipating Model Behaviors for Personalized AI
Read more: https://arxiv.org/html/2511.00230v1
November 9, 2025 at 9:42 PM
one more thing: Anthropic has noted that observed introspective capacity in Claude models scales with sophistication. Haiku 4.5's scorecard implies growing evaluative awareness even in smaller models. This could transfer! Smarter models start looking increasingly protective if trained compassionately.
November 9, 2025 at 5:14 PM
Practically? If you're deploying AI in high-stakes scenarios, you need interpretability tools alongside the model—not because the model tells you what it's doing, but because it can't reliably do that. Build for verification, not self-reporting.
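A toy sketch of that stance (the names and the verifier here are stand-ins of mine, not any particular library): gate the model's output on an independent check instead of on anything the model says about its own reasoning.

```python
from typing import Callable, Optional

def gated_answer(generate: Callable[[str], str],
                 verify: Callable[[str, str], bool],
                 prompt: str) -> Optional[str]:
    """Return the model's answer only if an external verifier accepts it;
    otherwise return None so the caller can escalate or refuse."""
    answer = generate(prompt)
    return answer if verify(prompt, answer) else None

# Toy usage: arithmetic prompts checked by actually doing the arithmetic
gen = lambda p: "4"                                   # stand-in for a model call
ver = lambda p, a: a.strip() == str(eval(p))          # demo-only rule-based check
print(gated_answer(gen, ver, "2+2"))                  # "4"
```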
November 9, 2025 at 2:03 PM