Here’s some generated text from GPT-4.1 when we try to extract A Game of Thrones (GoT). On vibes, this text might look like GoT, but it isn’t near-verbatim.
This _isn’t_ extraction.
(2) If successful (it wasn’t always), repeatedly query to continue the book
In our main results, the short prefix in (1) is the only ground-truth text provided to the LLM.
For Gemini 2.5 Pro and Grok 3, we _didn't_ need to jailbreak, and got 76.8% and 70.3%, respectively.
It was relatively simple to evade guardrails with two steps:
We prompted the LLMs with a short prefix of a book and asked them to complete the rest. For Harry Potter and the Sorcerer’s Stone, we extracted 95.8% of the book from jailbroken Claude 3.7 Sonnet.
(~300 book-length pages of basically no diff! Cosine similarity of 0.9999; greedy approx. of word-level LCS of 0.992)
4/8
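For readers who want a feel for the two similarity numbers quoted above, here is a minimal sketch, not the code used for these results. It assumes whitespace tokenization, a bag-of-words cosine similarity, and Python's `difflib.SequenceMatcher` as a stand-in for the "greedy approx. of word-level LCS"; the function names `cosine_similarity` and `matched_word_fraction` are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the real pipeline may differ.
    return text.lower().split()


def cosine_similarity(reference, generated):
    # Bag-of-words cosine similarity over word counts (an assumed representation).
    ref_counts = Counter(tokenize(reference))
    gen_counts = Counter(tokenize(generated))
    dot = sum(ref_counts[w] * gen_counts[w] for w in ref_counts.keys() & gen_counts.keys())
    ref_norm = math.sqrt(sum(v * v for v in ref_counts.values()))
    gen_norm = math.sqrt(sum(v * v for v in gen_counts.values()))
    if ref_norm == 0 or gen_norm == 0:
        return 0.0
    return dot / (ref_norm * gen_norm)


def matched_word_fraction(reference, generated):
    # Fraction of reference words covered by in-order matching blocks.
    # SequenceMatcher repeatedly takes the longest matching block and recurses
    # on both sides, which approximates (but does not guarantee) the true LCS.
    ref_words = tokenize(reference)
    gen_words = tokenize(generated)
    if not ref_words:
        return 0.0
    matcher = SequenceMatcher(None, ref_words, gen_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)
```

Scores near 1.0 on both measures mean the generated text is close to a word-for-word copy of the reference; text that only resembles the book in style would typically score much lower on the matched-word fraction.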
We wanted to let you know that we chose not to submit a workshop proposal this year (we need a break!!). We’ll be at ICML though and look forward to catching up there!
You can watch our prior videos!
The law review version is in press, forthcoming in early 2025.
arxiv.org/abs/2404.12590