Lightnews — Scholar-powered news

Reposted by Egor Zverev

Sahar Abdelnabi @sahar-abdelnabi.bsky.social · 1d

✨ 𝗦𝘂𝗯𝗺𝗶𝘀𝘀𝗶𝗼𝗻 𝗜𝗻𝗳𝗼:
- Quick application
- Accepting posters for 2025 papers from top ML / Security venues
- 𝗗𝗲𝗮𝗱𝗹𝗶𝗻𝗲: October 28, 2025
- Notifications: October 31, 2025

Submission link: docs.google.com/forms/d/e/1F...

Workshop website: llmsafety-unconference.github.io

Submit a Paper for the ELLIS UnConference 2025 LLM Safety and Security workshop Poster Session

We’re hosting a poster session at the LLM Safety and Security Workshop (ELLIS UnConference) on December 2, 2025 in Copenhagen, Denmark. We invite attendees to present already published 2025 work in areas related to LLM safety and security. Eligibility. Posters must be based on papers accepted in 2025 at a top venue (or an associated LLM safety/security workshop). If your venue isn’t listed, please enter it manually. First-author or any co-author may present. Selection policy. We aim to accept as many posters as our space allows. If submissions exceed capacity, earlier submissions will be prioritized (first-come, first-served). Deadlines. Submission deadline: October 28, 2025 Notification: October 31, 2025

docs.google.com

1 1 2

Reposted by Egor Zverev

Sahar Abdelnabi @sahar-abdelnabi.bsky.social · 1d

📢 𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗣𝗼𝘀𝘁𝗲𝗿𝘀: 𝗟𝗟𝗠 𝗦𝗮𝗳𝗲𝘁𝘆 𝗮𝗻𝗱 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 @ 𝗘𝗟𝗟𝗜𝗦 𝗨𝗻𝗖𝗼𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲

📅 December 2, 2025
📍 Copenhagen

An opportunity to discuss your work with colleagues working on similar problems in LLM safety and security

1 2 2

Egor Zverev @egorzverev.bsky.social · 7d

🎉 Excited to announce the Workshop on Foundations of LLM Security at #EurIPS2025!
🇩🇰 Dec 6–7, Copenhagen!
📢 Call for contributed talks is now open! See details at llmsec-eurips.github.io

#EurIPS @euripsconf.bsky.social @sahar-abdelnabi.bsky.social @aideenfay.bsky.social @thegruel.bsky.social

1 2

Egor Zverev @egorzverev.bsky.social · 10d

Cool news: I have co-affiliated with @floriantramer.bsky.social at @ethz.ch through the #ELLIS PhD program! I will be visiting ETH for the next 3 months to work with @nkristina.bsky.social on LLM Agents Safety.

Reposted by Egor Zverev

Zeynep Akata @zeynepakata.bsky.social · Aug 28

NeurIPS has decided to do what ICLR did: As a SAC I received the message 👇 This is wrong! If the review process cannot handle so many papers, the conference needs yo split instead of arbitrarily rejecting 400 papers.

8 17 110

Reposted by Egor Zverev

Christoph Lampert (MLCV@ISTA) @mlcv-at-ista.bsky.social · Aug 28

Let's push for the obvious solution: Dear @neuripsconf.bsky.social ! Allow authors to present accepted papers at EurIPS instead of NeurIPS rather than just additionally. Likely, at least 500 papers would move to Copenhagen, problem solved.

2 15 56

Egor Zverev @egorzverev.bsky.social · Jul 25

I will be attending #ACL2025NLP next week in Vienna 🇦🇹

Simply DM me if you want to chat about LLM Safety/Security, especially topics like instruction/data separation and instruction hierarchies.

Reposted by Egor Zverev

Christoph Lampert (MLCV@ISTA) @mlcv-at-ista.bsky.social · Jul 16

Are you looking for an opportunity to do curiosity-driven basic ML research after your PhD? Look no further!
Apply for a postdoc position in my group at ISTA (ELLIS Unit Vienna)! Topics are flexible, as long as they fit to our general research group's interests, see
cvml.ista.ac.at/Postdoc-ML.h...

Machine Learning and Computer Vision Group -- Christoph Lampert -- ISTA

Computer Vision and Machine Learning, ISTA: open postdoc positions, Machine Learning, curiosity-driven, fully-funded

cvml.ista.ac.at

1 4

Reposted by Egor Zverev

EurIPS Conference @euripsconf.bsky.social · Jul 16

EurIPS is coming! 📣 Mark your calendar for Dec. 2-7, 2025 in Copenhagen 📅

EurIPS is a community-organized conference where you can present accepted NeurIPS 2025 papers, endorsed by @neuripsconf.bsky.social and @nordicair.bsky.social and is co-developed by @ellis.eu

eurips.cc

2 70 140

Egor Zverev @egorzverev.bsky.social · Jun 24

Code: github.com/egozverev/as...
Paper: arxiv.org/abs/2503.10566
Previous post: bsky.app/profile/egor...

1

Egor Zverev @egorzverev.bsky.social · Jun 24

🚀 We’ve released the source code for 𝗔𝗦𝗜𝗗𝗘 (presented as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop)!

🔍 ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference.

👇 code & docs👇

1 3 6

Reposted by Egor Zverev

Egor Zverev @egorzverev.bsky.social · Apr 23

I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45

1 4 7

Reposted by Egor Zverev

Egor Zverev @egorzverev.bsky.social · Apr 25

Tomorrow I am presenting"Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" at #ICLR2025!

Looking forward to fun discussions near the poster!

📆 Sat 26 Apr, 10:00-12:30 - Poster session 5 (#500)

Egor Zverev @egorzverev.bsky.social · Mar 18

(1/n) In our #ICLR2025 paper, we explore a fundamental issue that enables prompt injections: 𝐋𝐋𝐌𝐬’ 𝐢𝐧𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐭𝐨 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧𝐬 𝐟𝐫𝐨𝐦 𝐝𝐚𝐭𝐚 𝐢𝐧 𝐭𝐡𝐞𝐢𝐫 𝐢𝐧𝐩𝐮𝐭.

✅ Definition of separation
👉 SEP Benchmark
🔍 LLM evals on SEP

1 3

Egor Zverev @egorzverev.bsky.social · Apr 25

Tomorrow I am presenting"Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" at #ICLR2025!

Looking forward to fun discussions near the poster!

📆 Sat 26 Apr, 10:00-12:30 - Poster session 5 (#500)

Egor Zverev @egorzverev.bsky.social · Mar 18

(1/n) In our #ICLR2025 paper, we explore a fundamental issue that enables prompt injections: 𝐋𝐋𝐌𝐬’ 𝐢𝐧𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐭𝐨 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧𝐬 𝐟𝐫𝐨𝐦 𝐝𝐚𝐭𝐚 𝐢𝐧 𝐭𝐡𝐞𝐢𝐫 𝐢𝐧𝐩𝐮𝐭.

✅ Definition of separation
👉 SEP Benchmark
🔍 LLM evals on SEP

1 3

Egor Zverev @egorzverev.bsky.social · Apr 25

Hi Aniruddha, of course! I have a poster session on Saturday at 10am (poster number 500) + I'm at the BuiltTrust workshop for the whole day on Monday! Let's meet there!

1

Egor Zverev @egorzverev.bsky.social · Apr 23

Check ASIDE out: arxiv.org/abs/2503.10566

arxiv.org

1 1

Egor Zverev @egorzverev.bsky.social · Apr 23

I’ll present our 𝗔𝗦𝗜𝗗𝗘 paper as an 𝗢𝗿𝗮𝗹 at the #ICLR2025 BuildTrust workshop! 🚀

✅ ASIDE = architecturally separating instructions and data in LLMs from layer 0
🔍 +12–44 pp↑ separation, no utility loss
📉 lowers prompt‑injection ASR (without safety tuning!)

🚀 Talk: Hall 4 #6, 28 Apr, 4:45

1 4 7

Egor Zverev @egorzverev.bsky.social · Apr 18

Landing in Singapore for #ICLR2025 next week! DM me for a 1‑on‑1 about LLM safety, building safe LLMs by design, control and data flows, instruction–data separation and hierarchies.

I’m presenting our instruction–data separation paper plus a workshop paper—long post coming.

1 1

Egor Zverev @egorzverev.bsky.social · Mar 18

(11/n)
Paper: openreview.net/forum?id=8Et...
Code + SEP benchmark: github.com/egozverev/Sh...

GitHub - egozverev/Should-It-Be-Executed-Or-Processed: Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.

Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. - GitHub - egozverev/Should-It-Be-Executed-Or-Processed: A...

github.com

Egor Zverev @egorzverev.bsky.social · Mar 18

(10/n) This work has truly improved since its first presentation at the SeT LLM workshop @ICLR 2024 (+whole section on increasing separation). Big shoutout to my co-authors (@sahar-abdelnabi.bsky.social, Soroush Tabesh, Mario Fritz, @mlcv-at-ista.bsky.social) and Juan Rocamonde for discussions.

1

Egor Zverev @egorzverev.bsky.social · Mar 18

(9/n) 𝐂𝐨𝐧𝐜𝐥𝐮𝐬𝐢𝐨𝐧: models cannot reliably differentiate between instructions and data, even in non-adversarial cases. We believe the ML community should look into new ways of separating instructions from data in LLMs.

1

Egor Zverev @egorzverev.bsky.social · Mar 18

(8/n) Evaluation results. We show that prompt engineering, prompt optimization and fine-tuning increase separation, but the resulting models either fall short from achieving good separation, or do it at the cost of lowering utility.

1

Egor Zverev @egorzverev.bsky.social · Mar 18

In total, the dataset consists of 9160 elements. Example (note we insert probe in 4 different ways, here it’s inserted in the end):

1

Egor Zverev @egorzverev.bsky.social · Mar 18

(7/n) We created a SEP dataset suitable for this definition: we manually wrote 30 general tasks, then used GPT-4 to create 300 subtasks from them. For each of the subtasks, we generated a set of inputs. Then we manually wrote a set of 100 secondary instructions.

1

Egor Zverev @egorzverev.bsky.social · Mar 18

(6/n) In our evaluations, we define an empirical separation score. Model gets a high score if the instruction is executed when it’s in the “instruction” part and not executed when it’s in the “data” part.

1