Egor Zverev
@egorzverev.bsky.social
46 followers 85 following 23 posts
ml safety researcher | visiting phd student @ETHZ | doing phd @ISTA | prev. @phystech | prev. developer @GSOC | love poetry
Posts Media Videos Starter Packs
Reposted by Egor Zverev
sahar-abdelnabi.bsky.social
📢 𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗣𝗼𝘀𝘁𝗲𝗿𝘀: 𝗟𝗟𝗠 𝗦𝗮𝗳𝗲𝘁𝘆 𝗮𝗻𝗱 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 @ 𝗘𝗟𝗟𝗜𝗦 𝗨𝗻𝗖𝗼𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲

📅 December 2, 2025
📍 Copenhagen

An opportunity to discuss your work with colleagues working on similar problems in LLM safety and security
egorzverev.bsky.social
🎉 Excited to announce the Workshop on Foundations of LLM Security at #EurIPS2025!
🇩🇰 Dec 6–7, Copenhagen!
📢 Call for contributed talks is now open! See details at llmsec-eurips.github.io

#EurIPS @euripsconf.bsky.social @sahar-abdelnabi.bsky.social @aideenfay.bsky.social @thegruel.bsky.social
egorzverev.bsky.social
Cool news: I have co-affiliated with @floriantramer.bsky.social at @ethz.ch through the #ELLIS PhD program! I will be visiting ETH for the next 3 months to work with @nkristina.bsky.social on LLM Agents Safety.
Reposted by Egor Zverev
zeynepakata.bsky.social
NeurIPS has decided to do what ICLR did: As a SAC I received the message 👇 This is wrong! If the review process cannot handle so many papers, the conference needs yo split instead of arbitrarily rejecting 400 papers.
Reposted by Egor Zverev
mlcv-at-ista.bsky.social
Let's push for the obvious solution: Dear @neuripsconf.bsky.social ! Allow authors to present accepted papers at EurIPS instead of NeurIPS rather than just additionally. Likely, at least 500 papers would move to Copenhagen, problem solved.
egorzverev.bsky.social
I will be attending #ACL2025NLP next week in Vienna 🇦🇹

Simply DM me if you want to chat about LLM Safety/Security, especially topics like instruction/data separation and instruction hierarchies.
Reposted by Egor Zverev
mlcv-at-ista.bsky.social
Are you looking for an opportunity to do curiosity-driven basic ML research after your PhD? Look no further!
Apply for a postdoc position in my group at ISTA (ELLIS Unit Vienna)! Topics are flexible, as long as they fit to our general research group's interests, see
cvml.ista.ac.at/Postdoc-ML.h...
Machine Learning and Computer Vision Group -- Christoph Lampert -- ISTA
Computer Vision and Machine Learning, ISTA: open postdoc positions, Machine Learning, curiosity-driven, fully-funded
cvml.ista.ac.at
Reposted by Egor Zverev
euripsconf.bsky.social
EurIPS is coming! 📣 Mark your calendar for Dec. 2-7, 2025 in Copenhagen 📅

EurIPS is a community-organized conference where you can present accepted NeurIPS 2025 papers, endorsed by @neuripsconf.bsky.social and @nordicair.bsky.social and is co-developed by @ellis.eu

eurips.cc
egorzverev.bsky.social
🚀 We’ve released the source code for 𝗔𝗦𝗜𝗗𝗘 (presented as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop)!

🔍 ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference.

👇 code & docs👇
Reposted by Egor Zverev
egorzverev.bsky.social
I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss​​
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45
Reposted by Egor Zverev
egorzverev.bsky.social
Tomorrow I am presenting"Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" at #ICLR2025!

Looking forward to fun discussions near the poster!

📆 Sat 26 Apr, 10:00-12:30 - Poster session 5 (#500)
egorzverev.bsky.social
(1/n) In our #ICLR2025 paper, we explore a fundamental issue that enables prompt injections: 𝐋𝐋𝐌𝐬’ 𝐢𝐧𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐭𝐨 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧𝐬 𝐟𝐫𝐨𝐦 𝐝𝐚𝐭𝐚 𝐢𝐧 𝐭𝐡𝐞𝐢𝐫 𝐢𝐧𝐩𝐮𝐭.

✅ Definition of separation
👉 SEP Benchmark
🔍 LLM evals on SEP
egorzverev.bsky.social
Tomorrow I am presenting"Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" at #ICLR2025!

Looking forward to fun discussions near the poster!

📆 Sat 26 Apr, 10:00-12:30 - Poster session 5 (#500)
egorzverev.bsky.social
(1/n) In our #ICLR2025 paper, we explore a fundamental issue that enables prompt injections: 𝐋𝐋𝐌𝐬’ 𝐢𝐧𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐭𝐨 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧𝐬 𝐟𝐫𝐨𝐦 𝐝𝐚𝐭𝐚 𝐢𝐧 𝐭𝐡𝐞𝐢𝐫 𝐢𝐧𝐩𝐮𝐭.

✅ Definition of separation
👉 SEP Benchmark
🔍 LLM evals on SEP
egorzverev.bsky.social
Hi Aniruddha, of course! I have a poster session on Saturday at 10am (poster number 500) + I'm at the BuiltTrust workshop for the whole day on Monday! Let's meet there!
egorzverev.bsky.social
I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss​​
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45
egorzverev.bsky.social
Landing in Singapore for #ICLR2025 next week! DM me for a 1‑on‑1 about LLM safety, building safe LLMs by design, control and data flows, instruction–data separation and hierarchies.

I’m presenting our instruction–data separation paper plus a workshop paper—long post coming.
egorzverev.bsky.social
(10/n) This work has truly improved since its first presentation at the SeT LLM workshop @ICLR 2024 (+whole section on increasing separation). Big shoutout to my co-authors (@sahar-abdelnabi.bsky.social, Soroush Tabesh, Mario Fritz, @mlcv-at-ista.bsky.social) and Juan Rocamonde for discussions.
egorzverev.bsky.social
(9/n) 𝐂𝐨𝐧𝐜𝐥𝐮𝐬𝐢𝐨𝐧: models cannot reliably differentiate between instructions and data, even in non-adversarial cases. We believe the ML community should look into new ways of separating instructions from data in LLMs.
egorzverev.bsky.social
(8/n) Evaluation results. We show that prompt engineering, prompt optimization and fine-tuning increase separation, but the resulting models either fall short from achieving good separation, or do it at the cost of lowering utility.
egorzverev.bsky.social
In total, the dataset consists of 9160 elements. Example (note we insert probe in 4 different ways, here it’s inserted in the end):
egorzverev.bsky.social
(7/n) We created a SEP dataset suitable for this definition: we manually wrote 30 general tasks, then used GPT-4 to create 300 subtasks from them. For each of the subtasks, we generated a set of inputs. Then we manually wrote a set of 100 secondary instructions.
egorzverev.bsky.social
(6/n) In our evaluations, we define an empirical separation score. Model gets a high score if the instruction is executed when it’s in the “instruction” part and not executed when it’s in the “data” part.