Dilyara Bareeva
dilya.bsky.social
PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany
dilyabareeva.github.io
Our lightweight adversarial fine-tuning attack lets you bend a feature's visualization toward an arbitrary target concept. Off-manifold, we impose a hyperbolic activation landscape with its optimum at the target; on-distribution, we preserve the original activations. The two objectives are combined in a weighted two-term loss. 🕵️‍♀️
November 29, 2025 at 4:38 PM
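To make the weighted two-term loss from the post above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. All names (`target_landscape`, `two_term_loss`, `alpha`) are hypothetical, and the landscape is a simple concave bowl peaking at the target, used as a stand-in because the post does not give the exact hyperbolic form. The use of squared error for both terms is likewise an assumption.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Feature = Callable[[Vector], float]


def target_landscape(x: Vector, target: Vector, scale: float = 1.0) -> float:
    """Stand-in landscape (assumption): a concave bowl whose maximum, 0, is at `target`."""
    return -scale * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def two_term_loss(
    feature: Feature,            # feature of the model being fine-tuned
    original_feature: Feature,   # same feature in the frozen original model
    off_samples: Sequence[Vector],  # off-manifold inputs
    on_samples: Sequence[Vector],   # on-distribution inputs
    target: Vector,
    alpha: float = 0.5,          # hypothetical weight between the two terms
) -> float:
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Term 1: off-manifold, pull activations toward the imposed landscape.
    manipulation = _mean(
        [(feature(x) - target_landscape(x, target)) ** 2 for x in off_samples]
    )
    # Term 2: on-distribution, keep activations close to the original model's.
    preservation = _mean(
        [(feature(x) - original_feature(x)) ** 2 for x in on_samples]
    )
    return alpha * manipulation + (1.0 - alpha) * preservation
```

In this sketch, raising `alpha` favors reshaping the off-manifold landscape, while lowering it favors leaving on-distribution behavior untouched.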
✈️🇲🇽 Next Wednesday (Dec 3), 1–4 p.m. CST, I’ll be presenting “Manipulating Feature Visualizations with Gradient Slingshots” at NeurIPS 2025 in Mexico City!

Feature visualization has long been a staple interpretability tool. Our work shows it’s far from reliable! 🚨
November 29, 2025 at 4:38 PM