Dilyara Bareeva
dilya.bsky.social
PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany
dilyabareeva.github.io
Huge thanks to my fantastic co-authors Marina MC Höhne, Alexander Warnecke, @lpirch.bsky.social, Klaus-Robert Müller, @rieck.mlsec.org, @slapuschkin.bsky.social, @kirillbykov.bsky.social, and to the UMI Lab, @aifraunhoferhhi.bsky.social, @xai-berlin.bsky.social and @bifold.berlin for the support!
November 29, 2025 at 4:38 PM
Our lightweight adversarial fine-tuning attack lets you bend a feature's visualization toward an arbitrary concept. Off-manifold, we impose a hyperbolic activation landscape with its optimum at the target, while a weighted two-term loss preserves on-distribution activations. 🕵️‍♀️
November 29, 2025 at 4:38 PM
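A rough sketch of what a weighted two-term loss of this kind could look like, in plain NumPy. This is not the authors' implementation: the function names, the 1 / (1 + gamma * distance) landscape, the mean-squared-error terms, and the `alpha` and `gamma` parameters are all assumptions chosen for illustration.

```python
import numpy as np


def hyperbolic_target(x, target, gamma=1.0):
    """Hypothetical off-manifold landscape (an assumption, not the post's formula).

    Equals 1.0 exactly at the target and decays like 1 / distance away from it.
    x has shape (n, d), target has shape (d,).
    """
    dist = np.linalg.norm(x - target, axis=-1)
    return 1.0 / (1.0 + gamma * dist)


def two_term_loss(act_off, x_off, act_on, act_on_ref, target, alpha=0.5, gamma=1.0):
    """Weighted sum of a manipulation term and a preservation term.

    act_off:    fine-tuned feature activations at off-manifold inputs x_off
    act_on:     fine-tuned feature activations on natural (on-distribution) data
    act_on_ref: original model's activations on the same natural data
    alpha:      assumed trade-off weight between the two terms
    """
    # Manipulation: pull off-manifold activations toward the imposed landscape.
    manipulation = np.mean((act_off - hyperbolic_target(x_off, target, gamma)) ** 2)
    # Preservation: keep on-distribution activations close to the original model.
    preservation = np.mean((act_on - act_on_ref) ** 2)
    return alpha * manipulation + (1.0 - alpha) * preservation
```

In this sketch, `alpha` near 1 would prioritize reshaping the off-manifold landscape, while `alpha` near 0 would prioritize leaving the model's behavior on natural data unchanged.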