Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
The 2026 International AI Safety Report has 221 pages of cutting-edge research from a team of 36 writers and over 100 contributors, with 1,451 citations...

...And also this cheeky riddle that I wrote on page 23 (original content).

internationalaisafetyreport.org/publication/...
February 3, 2026 at 2:48 PM
Turns out, there are a TON of image/video AI models hosted on CivitAI with dogwhistles for NCII (non-consensual intimate imagery) and/or CSAM (child sexual abuse material) in their names. 👀

Max Kamachee and I just updated our "Video Deepfake Abuse" paper with this new fig:

🔗 papers.ssrn.com/sol3/papers....
January 30, 2026 at 9:43 PM
This is not a new report (it's from last summer). But it's now finally available on SSRN, where it's more accessible than before. Great working with Claire Short on this.

papers.ssrn.com/sol3/papers....
January 27, 2026 at 12:21 PM
One example of how easily harmful derivatives of open-weight models proliferate can be found on Hugging Face. Search "uncensored" or "abliterated" in the model search bar. You'll find some 7k models fine-tuned specifically to remove safeguards.
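
(Not part of the original post: a minimal sketch, using the huggingface_hub library, of how one might run the same kind of search programmatically rather than through the web UI. Exact counts will vary with when the query is run and how Hugging Face matches search terms.)

# Sketch (not from the original post): reproducing the same kind of search
# with the huggingface_hub Python client instead of the website's search bar.
from huggingface_hub import HfApi

api = HfApi()

for term in ["uncensored", "abliterated"]:
    # list_models(search=...) matches the term against model repo names,
    # roughly like typing it into the model search bar on huggingface.co.
    matches = list(api.list_models(search=term))
    print(f"{term}: {len(matches)} models")
    for model in matches[:5]:  # print a few example repo IDs
        print("   ", model.id)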
January 10, 2026 at 2:00 PM
🚨 New paper from an awesome group led by Noam Kolt and @nickacaputo.

We hear a lot about the concepts and methods from AI research that lawyers need to understand. But it's really a two-way street...

🧵🧵🧵
January 8, 2026 at 10:40 PM
🧵 Thanks in part to recent attention on Grok's widespread use for undressing images, growing awareness of AI nudification apps is sparking discussions about making them illegal. Minnesota and the UK are both actively considering laws that would do this.
January 7, 2026 at 12:00 AM
Given the current Grok deepfake snafu on Twitter this week, I'll leave this here. We put it online a month ago.
t.co/3qWCNzoZrh
January 5, 2026 at 6:13 PM
I think these are my 4 favorite papers of 2025.
December 30, 2025 at 10:57 PM
10/16: Here is a snap of a quote from a Reddit user that we found unsettling enough to include in the paper.
December 15, 2025 at 2:59 PM
🧵🧵🧵 In the past few months, I have looked at hundreds, maybe thousands, of AI porn images/videos (for science).

Here's what I learned from our investigation of over 50 platforms, sites, apps, Discords, etc., while writing this paper.

papers.ssrn.com/sol3/papers...
December 15, 2025 at 2:59 PM
🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite will often be true...
December 12, 2025 at 5:00 PM
Excited that our paper has only been on SSRN for 8 days but has already become SSRN's most downloaded paper of the past 60 days in two ejournal categories. I'm glad about this -- I think it's one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....
December 11, 2025 at 7:05 PM
UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers
December 11, 2025 at 2:00 PM
Unfortunately, though, the relationships between developer choices and downstream harms can be difficult to study because prominent open-weight AI video model developers rarely report on safeguards against harmful adaptation of their models.
December 4, 2025 at 5:32 PM
...For example, Stable Diffusion 1.x models were trained on lots of porn, while Stable Diffusion 2.x models were trained on filtered SFW data. This seems to have made a big empirical difference. SD 1.x models are responsible for >1,000x more NSFW content on CivitAI than SD 2.x.
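
(Not from the paper: a minimal sketch of what SFW pretraining-data filtering can look like in practice, assuming each image-caption record carries a precomputed NSFW probability, loosely modeled on LAION-style "punsafe" scores. The field name and threshold here are hypothetical.)

# Illustrative sketch of SFW data filtering (hypothetical field name and threshold).
from typing import Iterable, Iterator

PUNSAFE_THRESHOLD = 0.1  # keep only records the classifier rates as very likely SFW

def filter_sfw(records: Iterable[dict], threshold: float = PUNSAFE_THRESHOLD) -> Iterator[dict]:
    """Yield only image-caption records whose NSFW score is below the threshold."""
    for rec in records:
        if rec.get("punsafe", 1.0) < threshold:  # treat missing scores as unsafe
            yield rec

# Toy usage:
toy_data = [
    {"url": "https://example.com/a.jpg", "caption": "a dog on a beach", "punsafe": 0.02},
    {"url": "https://example.com/b.jpg", "caption": "(filtered out)", "punsafe": 0.97},
]
print([r["caption"] for r in filter_sfw(toy_data)])  # -> ['a dog on a beach']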
December 4, 2025 at 5:32 PM
Currently, a small number of open-weight models, including Wan2.x, dominate NSFW video generation. Variants of these models specialized for NSFW videos are widely shared across a small number of key online distribution platforms, including CivitAI.
December 4, 2025 at 5:32 PM
We study the supply chain behind video-NCII capabilities, showing how model developers and distribution platforms serve as critical bottlenecks.
December 4, 2025 at 5:32 PM
For example, ActiveFence found a 400% increase in web threads related to AI-generated non-consensual intimate imagery (NCII) between 2022 and 2023.
December 4, 2025 at 5:32 PM
Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?

This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.
December 4, 2025 at 5:32 PM
Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.

Shamelessly copied from a Slack message.
November 26, 2025 at 4:00 PM
For more thoughts, see our agenda paper.

t.co/CVkAKNXZme
November 25, 2025 at 8:00 PM
In general, it's still hard to study the impacts of data filtering because the experiments are expensive, & developers don't generally report much about what they do. For example, we found very limited/inconsistent reporting in some recent analysis.
t.co/CVkAKNXZme
November 25, 2025 at 8:00 PM
5. Biorisk evals paper (Nov 2025)

They tested filtering out species/genus-level data against adversarial fine-tuning. It didn't work well. This suggests filtering may work better when applied to entire tasks/domains rather than specific instances.

arxiv.org/abs/2510.27629
November 25, 2025 at 8:00 PM
4. Deep ignorance paper (August 2025) @kyletokens.bsky.social

We showed that filtering biothreat-related pretraining data is SOTA for making models resist adversarial fine-tuning. We proposed an amendment to the hypothesis from papers 1 and 2 above.

deepignorance.ai
November 25, 2025 at 8:00 PM
3. Estimating worst-case open-weight risks paper (Aug 2025)

They reported an instance where filtering biothreat data didn't have a big impact. But without more info on how and how much they filtered, it's hard to draw strong conclusions.

arxiv.org/abs/2508.03153
November 25, 2025 at 8:00 PM