Reposted by: André Panisson, Nichola Raihani, Alison Powell, Jae-Young Son
The way you conceptualize AI systems affects how you interact with them, do science on them, and create policy and apply laws to them.
Hope you will check it out!
www.science.org/doi/full/10....
#LLMs #AI #Interpretability
Reposted by: André Panisson
Presents a framework categorizing MLLM explainability across data, model, and training perspectives to enhance transparency and trustworthiness.
📝 arxiv.org/abs/2412.02104
Reposted by: André Panisson
#ERCCoG award for #RUNES. For the next five years, I will be working on the mathematical, computational, and experimental (!!) sides to understand how higher-order interactions change how we think and coordinate.
arxiv.org/abs/2411.14257
arxiv.org/abs/2406.04093
Reposted by: André Panisson
by @norabelrose.bsky.social et al.
An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social.
arxiv.org/abs/2410.13928
📍 “A True-to-the-Model Axiomatic Benchmark for Graph-based Explainers”
🗓️ Tuesday 4–6 PM CET
📌 Poster Session 2, GatherTown
Join us to discuss graph ML explainability and benchmarks.
#ExplainableAI #GraphML
openreview.net/forum?id=HSQTv3R8Iz
Reposted by: André Panisson
-NeurIPS2024 Communication Chairs
by Stefan M. Herzog — Reposted by: André Panisson
How can AI *boost* human decision-making instead of replacing it? We talk about this in our new paper.
doi.org/10.1037/dec0...
#AI #XAI #InterpretableAI #IAI #boosting #competences
🧵👇
Reposted by: André Panisson
But then I found the paper "Mechanistic?" by @nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.
Reposted by: André Panisson, Rachel Killean
💙, Mar🐫
openreview.net/forum?id=WCR...
They simplify tuning with k-sparse autoencoders, and the results show many improvements in explainability. Code, models (not all!), and a visualizer are included. openreview.net/forum?id=tcs...