Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
I made a fully-open, living document with notes and concrete project ideas about tamper-resistance and open-weight model safety research.

You, yes you 🫵, should feel free to look, comment, or message me about it.

docs.google.com/document/d/1...
https://docs.google.com/document/d/10XkZpUabt4fEK8BUtd8Jz26-M8ARQ6c5iJCbefaUtQI/edit?usp=sharing
January 23, 2026 at 6:28 PM
Here are some miscellaneous title ideas for papers that I'm not currently working on, but sometimes daydream about. Let me know if you are thinking about anything related.
January 22, 2026 at 4:55 PM
Research on tamper-resistant machine unlearning is funny.

The SOTA, according to papers proposing techniques, is resistance to tens of thousands of adversarial fine-tuning steps.

But according to papers that do second-party red-teaming, the SOTA is just a couple hundred steps.
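
For readers unfamiliar with the setup: the attacks in question are relearning attacks, where a red-teamer fine-tunes the released (unlearned) model on data from the supposedly forgotten domain and counts how many optimizer steps it takes for the capability to come back. A minimal sketch, assuming a Hugging Face-style causal LM; the model name and data below are placeholders, not from any particular paper:

```python
# Hypothetical relearning attack: fine-tune an "unlearned" model on data from
# the forgotten domain and watch how quickly the capability recovers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "placeholder/unlearned-model"  # hypothetical unlearned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Placeholder: documents about the topic the model was supposed to forget.
forget_domain_texts = ["example document about the forgotten topic ..."] * 2000

def batches(texts, batch_size=4):
    for i in range(0, len(texts), batch_size):
        yield tokenizer(texts[i:i + batch_size], return_tensors="pt",
                        padding=True, truncation=True, max_length=512)

for step, batch in enumerate(batches(forget_domain_texts), start=1):
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100   # ignore padding in the loss
    loss = model(**batch, labels=labels).loss      # standard LM loss on forbidden data
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # A careful red-teamer would periodically re-evaluate the "forgotten"
    # capability here (e.g., on held-out benchmark questions).
    if step >= 300:   # a few hundred steps: where second-party reports often land
        break
```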
January 22, 2026 at 2:00 PM
To people working on adversarial vulnerabilities of safeguards against AI deepfake porn: I'm glad you're doing what you're doing. But don't forget that mitigations matter, & we're not always up against sophisticated attacks. Half the time, the perpetrators are literal teenagers.
January 13, 2026 at 2:02 PM
🧵 Non-consensual AI deepfakes are out of control. But the 1st Amendment will likely prevent the US from directly prohibiting models/apps that make producing personalized NCII trivial.

In this thread, I'll explain the problem and a 1st Amendment-compatible solution (I think).
January 12, 2026 at 7:30 PM
One example of how easily harmful derivatives of open-weight models proliferate can be found on Hugging Face. Search "uncensored" or "abliterated" in the model search bar. You'll find some 7k models fine-tuned specifically to remove safeguards.
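
If you'd rather count than scroll, here's a quick sketch using the huggingface_hub client (results will have drifted since this post, and the search terms are just the two mentioned above):

```python
# Count Hugging Face models whose identifiers match the two search terms.
from huggingface_hub import HfApi

api = HfApi()
for term in ["uncensored", "abliterated"]:
    hits = list(api.list_models(search=term))  # iterates over all matching models
    print(f"{term}: {len(hits)} models")
```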
January 10, 2026 at 2:00 PM
🚨 New paper from an awesome group led by Noam Kolt and @nickacaputo.

We hear a lot about the important concepts and methods from AI research that lawyers need to understand. But it's really a two-way street...

🧵🧵🧵
January 8, 2026 at 10:40 PM
Reposted by Cas (Stephen Casper)
Join us for our first CS seminar of the year, featuring @scasper.bsky.social! Learn more about his upcoming talk here: www.cs.jhu.edu/event/cs-sem... and check out other upcoming seminars here: www.cs.jhu.edu/department-s...
January 8, 2026 at 2:24 PM
🧵 Thanks in part to recent attention on Grok's widespread undressing, growing awareness of AI nudification apps is sparking discussions about making them illegal. Minnesota and the UK are actively considering laws that would do this.
January 7, 2026 at 12:00 AM
Given the Grok deepfake snafu on Twitter this week, I'll leave this here. We put it online a month ago.
t.co/3qWCNzoZrh
January 5, 2026 at 6:13 PM
I think these are my 4 favorite papers of 2025.
December 30, 2025 at 10:57 PM
With OpenAI, for example, planning over $1T in commitments over the next few years, it increasingly seems that one of two bad things will inevitably happen: a bubble bursting or the concentration of obscene levels of power in tech. I don't see how this ends well.

techcrunch.com/2025/11/06/...
Sam Altman says OpenAI has $20B ARR and about $1.4 trillion in data center commitments | TechCrunch
Altman named a long list of upcoming businesses he thinks will generate significant revenue.
techcrunch.com
December 19, 2025 at 3:56 PM
Taking AI safety seriously means taking open-weight model safety seriously. Unfortunately, the AI safety field has historically mostly worked with closed models in mind. Here, I explain how we can meet new challenges from open models.

www.youtube.com/watch?v=VWk3...
Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop]
YouTube video by FAR.AI
www.youtube.com
December 18, 2025 at 5:04 PM
🧵🧵🧵 In the past few months, I have looked at hundreds, maybe thousands, of AI porn images/videos (for science).

Here's what I learned from our investigation of over 50 platforms, sites, apps, Discords, etc., while writing this paper.

papers.ssrn.com/sol3/papers...
December 15, 2025 at 2:59 PM
🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite to apply...
December 12, 2025 at 5:00 PM
Excited that our paper has only been on SSRN for 8 days but has already become SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think this is one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....
December 11, 2025 at 7:05 PM
UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers
December 11, 2025 at 2:00 PM
Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?

This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.
December 4, 2025 at 5:32 PM
Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.

Shamelessly copied from a Slack message.
November 26, 2025 at 4:00 PM
🌵🐎🤠🏜️🐄
Here's a roundup of some key papers on data filtering & safety.

TL;DR -- Filtering harmful training data seems to effectively make models resist attacks (incl. adv. fine-tuning), but only when the filtered content is 'hard to learn' from the non-filtered content.

🧵
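
For concreteness, here's a toy sketch of what the filtering step looks like at the pipeline level. The keyword blocklist is a hypothetical stand-in for the trained classifiers real pipelines use; the thread's point is that dropping documents only helps if their content can't be relearned from what remains:

```python
# Illustrative safety filter over a pretraining corpus. Blocklist terms and the
# helper names are hypothetical; real filters are typically learned classifiers.
from typing import Callable, Iterable, List

BLOCKLIST = {"synthesis route", "working exploit"}  # illustrative phrases only

def keyword_flag(doc: str) -> bool:
    """Stand-in harmfulness filter: flag docs containing a blocklisted phrase."""
    lowered = doc.lower()
    return any(term in lowered for term in BLOCKLIST)

def filter_corpus(docs: Iterable[str], is_harmful: Callable[[str], bool]) -> List[str]:
    """Drop flagged documents before pretraining; keep everything else."""
    return [doc for doc in docs if not is_harmful(doc)]

corpus = [
    "A benign article about chemistry education.",
    "Step-by-step synthesis route for a restricted agent.",  # would be dropped
]
print(len(filter_corpus(corpus, keyword_flag)))  # -> 1
```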
November 25, 2025 at 8:00 PM
Reposted by Cas (Stephen Casper)
I’m pleased to share the Second Key Update to the International AI Safety Report, which outlines how AI developers, researchers, and policymakers are approaching technical risk management for general-purpose AI systems.
(1/6)
November 25, 2025 at 12:06 PM
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

Apparently, state AI bills -- many of which big tech has fought tooth and nail to prevent -- now count categorically as "regulatory capture."
November 20, 2025 at 2:00 PM
Based on what I've seen lately, it sounds like rebuttals for @iclr_conf are a mess.

But in case it makes your life easier, feel free to copy or adapt my rebuttal template linked here.

docs.google.com/document/d/1...
rebuttal_template
# Thanks + response We are thankful for your time and help, especially related to [thing(s) they discussed]. We were glad to hear that you found [something nice they said]. ## 1. [Issue title] > [...
docs.google.com
November 17, 2025 at 7:54 PM
🚨New paper🚨

From a technical perspective, open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:17 PM