Lightnews — Scholar-powered news

Alexander Panfilov

@kotekjedi.bsky.social

16 followers 19 following 0 posts

PhD student @ Tübingen. Adversarial ML, AI Safety.


Reposted by Alexander Panfilov

Egor Zverev

@egorzverev.bsky.social

🚀 We’ve released the source code for 𝗔𝗦𝗜𝗗𝗘 (presented as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop)!

🔍 ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference.

👇 code & docs👇

June 24, 2025 at 1:47 PM
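
The rotation mentioned in the post above can be illustrated with a minimal sketch. It assumes the 90° rotation is applied independently in each consecutive pair of embedding coordinates; whether this matches the released ASIDE code is an assumption, and the names `rotate_90`, `embed_sequence`, and `is_data` are hypothetical, not taken from the ASIDE repository.

```python
from typing import List, Sequence


def rotate_90(embedding: Sequence[float]) -> List[float]:
    """Rotate a vector by 90 degrees in each coordinate pair (2i, 2i+1).

    Assumption: a pairwise rotation; the real method may pair
    dimensions differently.
    """
    if len(embedding) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    rotated: List[float] = []
    for i in range(0, len(embedding), 2):
        x, y = embedding[i], embedding[i + 1]
        # A 2D rotation by 90 degrees maps (x, y) to (-y, x).
        rotated.extend([-y, x])
    return rotated


def embed_sequence(
    embeddings: Sequence[Sequence[float]],
    is_data: Sequence[bool],
) -> List[List[float]]:
    """Rotate only the embeddings of tokens marked as data.

    `is_data` is a hypothetical per-token flag; instruction tokens
    keep their original embeddings.
    """
    return [
        rotate_90(vec) if flag else list(vec)
        for vec, flag in zip(embeddings, is_data)
    ]
```

In this sketch each rotated vector keeps its norm and is orthogonal to the original, so the same token gets a clearly different representation depending on whether it is marked as instruction or data.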

Reposted by Alexander Panfilov

Egor Zverev

@egorzverev.bsky.social

I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45

April 23, 2025 at 7:53 AM

Reposted by Alexander Panfilov

Wieland Brendel

@wielandbrendel.bsky.social

🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!

We’re building cutting-edge, open-source AI tutoring models for high-quality, adaptive learning for all pupils with support from the Hector Foundation.

👉 forms.gle/sxvXbJhZSccr...

February 11, 2025 at 4:34 PM

Reposted by Alexander Panfilov

Maksym Andriushchenko

@maksym-andr.bsky.social

🚨Excited to share our new work!

1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.

2. Very large models, particularly with >100B parameters, have memorized significantly more.

🧵1/n

December 9, 2024 at 10:01 PM
