C Emde
@cemde.bsky.social
76 followers 150 following 16 posts
ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise. Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification
cemde.bsky.social
See our poster today

Poster Session 1 @ 10am

Hall 3 + Hall 2B #239
cemde.bsky.social
🚨 New paper alert: Our recent work on LLM safety has been accepted to ICLR 2025 🇸🇬

We propose a new framework for LLM safety. 🧵

(1/7)

#LLM #AISafety #ICLR2025 #Certification #AdversarialRobustness #NLP #Shhhhhh #DomainCertification #AI
[GIF: a man in a suit and tie sitting at a desk in front of a computer screen that says "founder of the office"]
cemde.bsky.social
To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space** 🚀

(6/7)
cemde.bsky.social
A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:

(5/7)
cemde.bsky.social
We are tired of the cat 🐈 and mouse 🐁 game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.

(4/7)
cemde.bsky.social
Example: Can't afford GitHub Copilot? 💡 Use the Amazon Shopping App.

(3/7)
cemde.bsky.social
Consider an LLM deployed for a specific purpose, like a medical chatbot. Such a model should **only** respond to medical questions.

⚠️ Problem: LLMs are highly capable and can be manipulated into responding to **any** query: how to build a bomb, how to organize tax fraud, etc.

(2/7)
cemde.bsky.social
🎉 I know I'm late to the party, but I'm super excited that I got 3/3 papers accepted at #ICLR2025, including 1 spotlight 🔎
- Shh, don't say that! Domain Certification in LLMs
- Towards Certification of Uncertainty Calibration under Adversarial Attacks
- Benchmarking Predictive Coding Networks
See you in Singapore 🇸🇬 ✈️
cemde.bsky.social
Interested? Want to learn more?

Join us at the SoLaR workshop tomorrow.
- 🕚 When: Tomorrow, 14 Dec, from 11:00 to 13:00.
- 🗺️ Where: West meeting rooms 121 and 122 here in Vancouver.

(5/6)
cemde.bsky.social
Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)
cemde.bsky.social
We are tired of the 🐈 and 🐁 game of attacks and defenses. Hence, we propose:

- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.

(3/6)
cemde.bsky.social
Fine-tuned foundation models are known to be adversarially vulnerable: they can be manipulated into answering questions they should not answer.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.
cemde.bsky.social
Are you scared users might misappropriate your LLM system? 😱

We were scared too! Now we introduce adversarial certificates against the misuse of LLMs. 🤖

Come and see our poster at the SoLaR Workshop tomorrow.

#NeurIPS2024 #NeurIPS #AI #NLP #LLM #DomainCertification #Shhhhhhhh
cemde.bsky.social
Great work! You might find our SoLaR paper interesting: We propose a certification framework for LLM systems to stay on-topic and not respond to such questions: openreview.net/pdf?id=brDLU...
Reposted by C Emde
exeter.ox.ac.uk
The first snow in Exeter College this morning ❄️

#ExeterCollegeOxford #OxfordUniversity #Snowing
[Images: a snow cat with the Radcliffe Camera behind; the Radcliffe Camera; the Fellows' Garden]