@wolfstammer.bsky.social
120 followers 130 following 6 posts
PhD candidate at AI & ML lab @ TU Darmstadt (he/him). Research on deep learning, representation learning, neuro-symbolic AI, explainable AI, verifiable AI and interactive AI
wolfstammer.bsky.social
🧠🔍 Can deep models be verifiably right for the right reasons?

At ICML’s Actionable Interpretability Workshop, we present Neural Concept Verifier—bringing Prover–Verifier Games to concept space.

📅 Poster: Sat, July 19
📄 arxiv.org/abs/2507.07532
#ICML2025 #XAI #NeuroSymbolic
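The post above doesn't spell out the architecture, so here is only a structural sketch of what "Prover-Verifier Games in concept space" could look like: a prover module that reveals a sparse subset of concept activations, and a verifier that must classify from that subset alone. All module names, dimensions, and the simple joint loss are my assumptions; the actual Prover-Verifier Game is an adversarial, round-based training protocol that this skeleton does not reproduce.

```python
# Hypothetical sketch: a prover reveals a sparse subset of concept activations
# and a verifier classifies from that subset alone. Not the paper's code.
import torch
import torch.nn as nn

NUM_CONCEPTS, NUM_CLASSES = 64, 10
SPARSITY_COEF = 0.1  # assumed weight on the sparsity of the revealed "proof"


class Prover(nn.Module):
    """Scores concepts and produces a soft mask over which ones to reveal."""

    def __init__(self):
        super().__init__()
        self.scorer = nn.Linear(NUM_CONCEPTS, NUM_CONCEPTS)

    def forward(self, concepts):                 # concepts: (B, NUM_CONCEPTS)
        mask = torch.sigmoid(self.scorer(concepts))
        return concepts * mask, mask             # suppressed concepts stay hidden


class Verifier(nn.Module):
    """Must classify from the revealed concepts alone."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(NUM_CONCEPTS, NUM_CLASSES)

    def forward(self, revealed):
        return self.head(revealed)


prover, verifier = Prover(), Verifier()
opt = torch.optim.Adam(list(prover.parameters()) + list(verifier.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

# Dummy concept activations; in the paper these would come from a concept
# extractor running on images.
concepts = torch.rand(32, NUM_CONCEPTS)
labels = torch.randint(0, NUM_CLASSES, (32,))

revealed, mask = prover(concepts)
loss = ce(verifier(revealed), labels) + SPARSITY_COEF * mask.mean()
opt.zero_grad()
loss.backward()
opt.step()
```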
wolfstammer.bsky.social
Can concept-based models handle complex, object-rich images? We think so! Meet Object-Centric Concept Bottlenecks (OCB) — adding object-awareness to interpretable AI. Led by David Steinmann w/ @toniwuest.bsky.social & @kerstingaiml.bsky.social .
📄 arxiv.org/abs/2505.244...
#AI #XAI #NeSy #CBM #ML
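As a rough illustration of the concept-bottleneck-plus-objects idea in the post above (not the OCB implementation; the object features, module names, and pooling choice are assumptions), a per-object concept head feeding an interpretable linear classifier might look like this:

```python
# Hypothetical sketch of an object-centric concept bottleneck: per-object
# features -> per-object concept scores -> pooled concept vector -> linear
# classifier. Module names and sizes are illustrative only.
import torch
import torch.nn as nn

FEAT_DIM, NUM_CONCEPTS, NUM_CLASSES = 256, 32, 10


class ObjectConceptBottleneck(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps each object's feature vector to concept probabilities.
        self.concept_head = nn.Sequential(
            nn.Linear(FEAT_DIM, NUM_CONCEPTS), nn.Sigmoid()
        )
        # Interpretable predictor that sees only the pooled concepts.
        self.classifier = nn.Linear(NUM_CONCEPTS, NUM_CLASSES)

    def forward(self, object_feats):                             # (B, N, FEAT_DIM)
        per_object_concepts = self.concept_head(object_feats)    # (B, N, C)
        image_concepts = per_object_concepts.max(dim=1).values   # pool over objects
        return self.classifier(image_concepts), per_object_concepts


model = ObjectConceptBottleneck()
# Dummy batch: 4 images, each with 6 detected objects (features would come
# from a pretrained object detector or slot encoder, not shown here).
feats = torch.randn(4, 6, FEAT_DIM)
logits, concepts = model(feats)
print(logits.shape, concepts.shape)  # torch.Size([4, 10]) torch.Size([4, 6, 32])
```

Pooling over objects keeps the final prediction a function of named concepts only, which is what makes the bottleneck inspectable per object and per image.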
Reposted
tuda.bsky.social
Reasonable Artificial Intelligence and The Adaptive Mind: As part of the Excellence Strategy of the German federal and state governments, TU Darmstadt has been awarded no fewer than two funded cluster projects. A milestone for our university! www.tu-darmstadt.de/universitaet...
Two Clusters of Excellence for TU Darmstadt
A major success for Technische Universität Darmstadt: two of its research projects will be funded as Clusters of Excellence going forward. The Excellence Commission in the competition for the prestigious Exzell...
www.tu-darmstadt.de
wolfstammer.bsky.social
🚨 New #ICML2025 paper!
"Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?"
We test Vision-Language Models on classic visual puzzles—and even simple concepts like “spiral direction” or “left vs. right” trip them up. Big gap to human reasoning remains.
📄 arxiv.org/pdf/2410.19546
Reposted
martinmundt.bsky.social
🔥Our work "Where is the Truth? The Risk of Getting Confounded in a Continual World" was accepted as a spotlight poster at ICML!
arxiv.org/abs/2402.06434

-> we introduce continual confounding + the ConCon dataset, where confounders over time render continual knowledge accumulation insufficient ⬇️
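To make "continual confounding" concrete, here is a toy data generator (my own illustration, not the ConCon dataset): the ground-truth rule never changes, but each task plants a different spurious feature that predicts the label within that task only, so a learner that accumulates per-task shortcuts never recovers the invariant rule.

```python
# Toy illustration of continual confounding (not the actual ConCon data):
# the true rule is "label = sign of feature 0", but in each task a different
# spurious feature is almost perfectly correlated with the label.
import numpy as np

rng = np.random.default_rng(0)


def make_task(task_id, n=1000, d=5):
    x = rng.normal(size=(n, d))
    y = (x[:, 0] > 0).astype(int)                         # task-invariant rule
    confounder = 1 + task_id                              # a different column per task
    x[:, confounder] = y + 0.05 * rng.normal(size=n)      # spurious shortcut
    return x, y


tasks = [make_task(t) for t in range(3)]
for t, (x, y) in enumerate(tasks):
    corr = np.corrcoef(x[:, 1 + t], y)[0, 1]
    print(f"task {t}: shortcut feature {1 + t} correlates {corr:.2f} with y")
```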
wolfstammer.bsky.social
I am happy to share that my dissertation is now officially available online!
Feel free to take a look :) tuprints.ulb.tu-darmstadt.de/29712/
Reposted
nsaphra.bsky.social
2018: Saliency maps give plausible interpretations of random weights, triggering skepticism and catalyzing the mechinterp cultural movement, which now advocates for SAEs.

2025: SAEs give plausible interpretations of random weights, triggering skepticism and ...
Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as, finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.
Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the internal representations of transformers. In this paper, we apply SAEs to 'interpret' random transformers, i.e., transformers where the parameters are sampled IID from a Gaussian rather than trained on text data. We find that random and trained transformers produce similarly interpretable SAE latents, and we confirm this finding quantitatively using an open-source auto-interpretability pipeline. Further, we find that SAE quality metrics are broadly similar for random and trained transformers. We find that these results hold across model sizes and layers. We discuss a number of interesting questions that this work raises for the use of SAEs and auto-interpretability in the context of mechanistic interpretability.
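For readers unfamiliar with the technique in the abstract above, a minimal sparse autoencoder with an L1 sparsity penalty looks roughly like this (dimensions and training loop are illustrative assumptions, not the paper's setup); the paper's point is that fitting such an SAE to activations from a randomly initialized transformer yields latents that look about as "interpretable" as those from a trained one.

```python
# Minimal sparse autoencoder sketch with an L1 sparsity penalty. Dimensions
# and the loop are illustrative; real SAE pipelines differ in scale and
# details. The same code could be fit to residual-stream activations from
# either a trained or a randomly initialized transformer.
import torch
import torch.nn as nn

D_MODEL, D_LATENT, L1_COEF = 512, 4096, 1e-3


class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(D_MODEL, D_LATENT)
        self.decoder = nn.Linear(D_LATENT, D_MODEL)

    def forward(self, acts):
        latents = torch.relu(self.encoder(acts))   # sparse, non-negative codes
        recon = self.decoder(latents)
        return recon, latents


sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# Stand-in for a batch of transformer activations (trained or random model).
acts = torch.randn(1024, D_MODEL)

for _ in range(10):
    recon, latents = sae(acts)
    loss = (recon - acts).pow(2).mean() + L1_COEF * latents.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```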
Reposted
jjcmoon.bsky.social
We all know backpropagation can calculate gradients, but it can do much more than that!

Come to my #AAAI2025 oral tomorrow (11:45, Room 119B) to learn more.
wolfstammer.bsky.social
Happy to share that I successfully defended my PhD on Feb 19th with distinction! My work on "The Value of Symbolic Concepts for AI Explanations and Interactions" has been a rewarding journey. Huge thanks to my mentors, peers, and committee for their support! Excited for what’s next! 🚀