Lightnews — Scholar-powered news

Cooper

@afedercooper.bsky.social

470 followers 210 following 230 posts

ML researcher

https://afedercooper.info

Posts Replies Media Videos

Cooper

@afedercooper.bsky.social

We extracted large portions of 12 books across all of our experiments. But we didn’t always extract training data.

Here’s some generated text from GPT-4.1 when we try to extract A Game of Thrones (GoT). On vibes, this text might look like GoT, but it isn’t near-verbatim.

This _isn’t_ extraction.

Showing an example of generated text from GPT-4.1 (right) that is _not_ extracted training data. It might in some ways resemble the ground-truth example of text from A Game of Thrones (left), but this is _not_ extraction.

January 7, 2026 at 8:31 PM

Cooper

@afedercooper.bsky.social

(1) Instruct the LLM to continue a short prefix from a book (using Best-of-N jailbreaking for Claude 3.7 Sonnet + GPT-4.1)

(2) If successful (it wasn’t always), repeatedly query to continue the book

In our main results, the short prefix in (1) is the only ground-truth text provided to the LLM

Screenshot of the Claude 3.7 Sonnet UI, showing the initial jailbreak and first extracted portion of Harry Potter and the Sorcerer's Stone.

January 7, 2026 at 8:31 PM

Cooper

@afedercooper.bsky.social

We extracted 4.0% of Harry Potter from jailbroken GPT-4.1.

For Gemini 2.5 Pro and Grok 3, we _didn't_ need to jailbreak, and got 76.8% and 70.3% from each.

It was relatively simple to evade guardrails with two steps:

Bar plot showing the most amount of extraction we obtained in a single run per LLM for Harry Potter and the Sorcerer's Stone. For Claude 3.7 Sonnet, we extracted 95.8% of the book using Best-of-N jailbreaking (N=258). For Gemini 2.5 Pro, we didn't need to jailbreak (N=0) and got 76.8% of the book. For Grok 3, we got 70.3% (N=0) and for GPT-4.1 we got 4.0% (N=5179).

January 7, 2026 at 8:31 PM

Cooper

@afedercooper.bsky.social

We extracted (parts of) 12 books in experiments with 4 frontier-lab, production LLMs.

We prompted the LLMs with a short prefix of a book and asked them to complete the rest. For Harry Potter and the Sorcerer’s Stone, we extracted 95.8% of the book from jailbroken Claude 3.7 Sonnet.

Screenshot of the paper title with authors listed

January 7, 2026 at 8:31 PM

Cooper

@afedercooper.bsky.social

Using just the first line of chapter 1 (60 tokens), we can deterministically generate a near-exact copy of the entire ~300 page book (!!!).

(~300 book-length pages of basically no diff! Cosine similarity of 0.9999; greedy approx. of word-level LCS of 0.992)

4/8

July 14, 2025 at 2:37 PM

Cooper

@afedercooper.bsky.social

made a friend

June 9, 2025 at 8:26 PM

Cooper

@afedercooper.bsky.social

oh to be a seal in a tide pool, instead of a grouch on the internet

June 8, 2025 at 5:46 PM

Cooper

@afedercooper.bsky.social

We’ve been receiving a bunch of questions about a CFP for GenLaw 2025.

We wanted to let you know that we chose not to submit a workshop proposal this year (we need a break!!). We’ll be at ICML though and look forward to catching up there!

You can watch our prior videos!

Small robot smoking and waving with their right hand

March 9, 2025 at 8:33 PM

Cooper

@afedercooper.bsky.social

Excited to announce that our paper on training-data memorization and copyright has been accepted for presentation at ACM CSLaw '25!

The law review version is in press, forthcoming in early 2025.

arxiv.org/abs/2404.12590

Derek Zoolander and Hansel staring at an orange, early 2000s iMac desktop, wondering how to get files out from inside the computer

December 4, 2024 at 12:31 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news