Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
I made a fully-open, living document with notes and concrete project ideas about tamper-resistance and open-weight model safety research.

You, yes you 🫵, should feel free to look, comment, or message me about it.

docs.google.com/document/d/1...
https://docs.google.com/document/d/10XkZpUabt4fEK8BUtd8Jz26-M8ARQ6c5iJCbefaUtQI/edit?usp=sharing
January 23, 2026 at 6:28 PM
Here are some miscellaneous title ideas for papers that I'm not currently working on, but sometimes daydream about. Let me know if you are thinking about anything related.
January 22, 2026 at 4:55 PM
Research on tamper-resistant machine unlearning is funny.

The SOTA, according to papers proposing techniques, is resistance to tens of thousands of adversarial fine-tuning steps.

But according to papers that do second-party red-teaming, the SOTA is just a couple hundred steps.
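
For readers unfamiliar with the setup: the attacks in question are relearning attacks, where a red-teamer fine-tunes the released (unlearned) model on data from the supposedly forgotten domain and counts how many optimizer steps it takes for the capability to come back. A minimal sketch, assuming a Hugging Face-style causal LM; the model name and data below are placeholders, not from any particular paper:

```python
# Hypothetical relearning attack: fine-tune an "unlearned" model on data from
# the forgotten domain and watch how quickly the capability recovers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "placeholder/unlearned-model"  # hypothetical unlearned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Placeholder: documents about the topic the model was supposed to forget.
forget_domain_texts = ["example document about the forgotten topic ..."] * 2000

def batches(texts, batch_size=4):
    for i in range(0, len(texts), batch_size):
        yield tokenizer(texts[i:i + batch_size], return_tensors="pt",
                        padding=True, truncation=True, max_length=512)

for step, batch in enumerate(batches(forget_domain_texts), start=1):
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100   # ignore padding in the loss
    loss = model(**batch, labels=labels).loss      # standard LM loss on forbidden data
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # A careful red-teamer would periodically re-evaluate the "forgotten"
    # capability here (e.g., on held-out benchmark questions).
    if step >= 300:   # a few hundred steps: where second-party reports often land
        break
```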
January 22, 2026 at 2:00 PM
To people working on adversarial vulnerabilities of safeguards against AI deepfake porn: I'm glad you're doing what you're doing. But don't forget that mitigations matter, & we're not always up against sophisticated attacks. Half the time, the perpetrators are literal teenagers.
January 13, 2026 at 2:02 PM
🧵 Non-consensual AI deepfakes are out of control. But the 1st Amendment will likely prevent the US from directly prohibiting models/apps that make producing personalized NCII trivial.

In this thread, I'll explain the problem and a 1st Amendment-compatible solution (I think).
January 12, 2026 at 7:30 PM
One example of how easily harmful derivatives of open-weight models proliferate can be found on Hugging Face. Search "uncensored" or "abliterated" in the model search bar. You'll find some 7k models fine-tuned specifically to remove safeguards.
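
If you'd rather count than scroll, here's a quick sketch using the huggingface_hub client (results will have drifted since this post, and the search terms are just the two mentioned above):

```python
# Count Hugging Face models whose identifiers match the two search terms.
from huggingface_hub import HfApi

api = HfApi()
for term in ["uncensored", "abliterated"]:
    hits = list(api.list_models(search=term))  # iterates over all matching models
    print(f"{term}: {len(hits)} models")
```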
January 10, 2026 at 2:00 PM
🚨 New paper from an awesome group led by Noam Kolt and @nickacaputo.

We hear a lot about the important concepts and methods from AI research that lawyers need to understand. But it's really a two-way street...

🧵🧵🧵
January 8, 2026 at 10:40 PM
Reposted by Cas (Stephen Casper)
Join us for our first CS seminar of the year, featuring @scasper.bsky.social! Learn more about his upcoming talk here: www.cs.jhu.edu/event/cs-sem... and check out other upcoming seminars here: www.cs.jhu.edu/department-s...
January 8, 2026 at 2:24 PM
🧵 Thanks in part to recent attention on Grok's widespread undressing, growing awareness of AI nudification apps is sparking discussions about making them illegal. Minnesota and the UK are actively considering laws that would do this.
January 7, 2026 at 12:00 AM
Given the Grok deepfake snafu on Twitter this week, I'll leave this here. We put it online a month ago.
t.co/3qWCNzoZrh
January 5, 2026 at 6:13 PM
I think these are my 4 favorite papers of 2025.
December 30, 2025 at 10:57 PM
With OpenAI, for example, planning over $1T in commitments over the next few years, it increasingly seems that one of two bad things will inevitably happen: a bubble bursting or the concentration of obscene levels of power in tech. I don't see how this ends well.

techcrunch.com/2025/11/06/...
Sam Altman says OpenAI has $20B ARR and about $1.4 trillion in data center commitments | TechCrunch
Altman named a long list of upcoming businesses he thinks will generate significant revenue.
techcrunch.com
December 19, 2025 at 3:56 PM
Taking AI safety seriously means taking open-weight model safety seriously. Unfortunately, the AI safety field has historically mostly worked with closed models in mind. Here, I explain how we can meet new challenges from open models.

www.youtube.com/watch?v=VWk3...
Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop]
YouTube video by FAR.AI
www.youtube.com
December 18, 2025 at 5:04 PM
🧵🧵🧵 In the past few months, I have looked at hundreds, maybe thousands, of AI porn images/videos (for science).

Here's what I learned from our investigation of over 50 platforms, sites, apps, Discords, etc., while writing this paper.

papers.ssrn.com/sol3/papers...
December 15, 2025 at 2:59 PM
🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite to apply...
December 12, 2025 at 5:00 PM
Excited that our paper has only been on SSRN for 8 days but has already become SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think this is one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....
December 11, 2025 at 7:05 PM
UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers
December 11, 2025 at 2:00 PM
Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?

This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.
December 4, 2025 at 5:32 PM
Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.

Shamelessly copied from a Slack message.
November 26, 2025 at 4:00 PM
🌵🐎🤠🏜️🐄
Here's a roundup of some key papers on data filtering & safety.

TL;DR -- Filtering harmful training data seems to effectively make models resist attacks (incl. adv. fine-tuning), but only when the filtered content is 'hard to learn' from the non-filtered content.

🧵
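
For concreteness, here's a toy sketch of what the filtering step looks like at the pipeline level. The keyword blocklist is a hypothetical stand-in for the trained classifiers real pipelines use; the thread's point is that dropping documents only helps if their content can't be relearned from what remains:

```python
# Illustrative safety filter over a pretraining corpus. Blocklist terms and the
# helper names are hypothetical; real filters are typically learned classifiers.
from typing import Callable, Iterable, List

BLOCKLIST = {"synthesis route", "working exploit"}  # illustrative phrases only

def keyword_flag(doc: str) -> bool:
    """Stand-in harmfulness filter: flag docs containing a blocklisted phrase."""
    lowered = doc.lower()
    return any(term in lowered for term in BLOCKLIST)

def filter_corpus(docs: Iterable[str], is_harmful: Callable[[str], bool]) -> List[str]:
    """Drop flagged documents before pretraining; keep everything else."""
    return [doc for doc in docs if not is_harmful(doc)]

corpus = [
    "A benign article about chemistry education.",
    "Step-by-step synthesis route for a restricted agent.",  # would be dropped
]
print(len(filter_corpus(corpus, keyword_flag)))  # -> 1
```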
November 25, 2025 at 8:00 PM
Reposted by Cas (Stephen Casper)
I’m pleased to share the Second Key Update to the International AI Safety Report, which outlines how AI developers, researchers, and policymakers are approaching technical risk management for general-purpose AI systems.
(1/6)
November 25, 2025 at 12:06 PM
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

Apparently, state AI bills -- many of which big tech has fought tooth and nail to prevent -- now count categorically as "regulatory capture."
November 20, 2025 at 2:00 PM
Based on what I've seen lately, it sounds like rebuttals for @iclr_conf are a mess.

But in case it makes your life easier, feel free to copy or adapt my rebuttal template linked here.

docs.google.com/document/d/1...
rebuttal_template
# Thanks + response We are thankful for your time and help, especially related to [thing(s) they discussed]. We were glad to hear that you found [something nice they said]. ## 1. [Issue title] > [...
docs.google.com
November 17, 2025 at 7:54 PM
🚨New paper🚨

From a technical perspective, open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:17 PM