@hadasorgad.bsky.social
hadasorgad.bsky.social
Deadline extended! ⏳

The Actionable Interpretability Workshop at #ICML2025 has moved its submission deadline to May 19th. More time to submit your work 🔍🧠✨ Don’t miss out!
Reposted
amuuueller.bsky.social
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
Logo for MIB: A Mechanistic Interpretability Benchmark
hadasorgad.bsky.social
• Model Innovation – Designs and training inspired by interpretability.
• Impact Measurement – Benchmarks for real-world effectiveness.
• Critical Perspectives – Feasibility, limits, and future directions.

Website >>> actionable-interpretability.github.io
General Information
ICML 2025 - Vancouver
hadasorgad.bsky.social
• Real-world Applications – Tackling bias, hallucinations, adversarial threats, and use in critical domains like healthcare, finance and cybersecurity.
• Method Comparison – Interpretability vs. alternative methods such as fine-tuning and prompting.
hadasorgad.bsky.social
We aim to foster discussions on how interpretability research can inform concrete improvements in model design, safety, and robustness.

Topics of interest: ⬇️