Alexander Panfilov
@kotekjedi.bsky.social
PhD student @ Tübingen. Adversarial ML, AI Safety.
Reposted by Alexander Panfilov
🚀 We’ve released the source code for 𝗔𝗦𝗜𝗗𝗘 (presented as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop)!

🔍 ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference.

👇 code & docs👇
June 24, 2025 at 1:47 PM
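
For readers who want a concrete picture of "rotating embeddings of marked tokens by 90°", here is a minimal sketch. It is not taken from the released ASIDE code. It assumes the rotation is applied to each consecutive pair of coordinates, which is one way to get a 90° rotation in an even-dimensional space. The names `rotate_90`, `embed_tokens`, `table`, and `is_data` are hypothetical, and plain lists are used to keep the example dependency-free.

```python
from typing import List, Sequence


def rotate_90(vec: Sequence[float]) -> List[float]:
    """Rotate each consecutive coordinate pair (x, y) to (-y, x).

    Assumption: a pairwise 90-degree rotation; the real method may differ.
    """
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out: List[float] = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        out.extend((-y, x))
    return out


def embed_tokens(
    token_ids: Sequence[int],
    is_data: Sequence[bool],
    table: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Look up embeddings; rotate only the tokens marked as data."""
    if len(token_ids) != len(is_data):
        raise ValueError("token_ids and is_data must have equal length")
    return [
        rotate_90(table[t]) if marked else list(table[t])
        for t, marked in zip(token_ids, is_data)
    ]


# Hypothetical 2-token vocabulary with 4-dimensional embeddings.
table = [[1, 0, 0, 2], [3, 4, 5, 6]]
# The same token id appears once as an instruction and once as data.
vectors = embed_tokens([0, 0], [False, True], table)
```

In this sketch the same token gets two different vectors depending on its role, and the rotated copy is orthogonal to the original while keeping the same norm. The same marking would have to be applied during both instruction-tuning and inference, as the post describes.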
Reposted by Alexander Panfilov
I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45
April 23, 2025 at 7:53 AM
Reposted by Alexander Panfilov
🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!

We’re building cutting-edge, open-source AI tutoring models for high-quality, adaptive learning for all pupils with support from the Hector Foundation.

👉 forms.gle/sxvXbJhZSccr...
February 11, 2025 at 4:34 PM
Reposted by Alexander Panfilov
🚨Excited to share our new work!

1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.

2. Very large models, particularly with >100B parameters, have memorized significantly more.

🧵1/n
December 9, 2024 at 10:01 PM