Tim Duffy
@timfduffy.com
I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, longtermism, progressive rock, economics, and most people. Substack: http://timfduffy.substack.com
This was all previously released as a supplement to the Nature paper, but I missed it at the time: static-content.springer.com/esm/art%3A10...
January 7, 2026 at 10:32 PM
They find that R1 is very vulnerable to jailbreaks, consistent with my experience.
January 7, 2026 at 10:14 PM
They also evaluate R1 on a series of harm-related benchmarks.
January 7, 2026 at 10:14 PM
If I'm reading this right, on the official DeepSeek platform all queries are checked for certain keywords, and if any are present, the query and response are reviewed by V3. The safety standards they give their LLM judge are more about mundane risks than catastrophic ones, though.
January 7, 2026 at 10:14 PM
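A minimal sketch of that two-stage setup as I read it: a cheap keyword gate on every exchange, with an LLM judge run only on flagged traffic. The keyword list, judge prompt, and function names here are hypothetical illustrations, not DeepSeek's actual implementation.

```python
# Hypothetical two-stage moderation pipeline: cheap keyword gate first,
# LLM judge (V3 in DeepSeek's case) only on the traffic the gate flags.

FLAGGED_KEYWORDS = {"explosive", "malware", "self-harm"}  # illustrative only

def contains_flagged_keyword(text: str) -> bool:
    lowered = text.lower()
    return any(kw in lowered for kw in FLAGGED_KEYWORDS)

def judge_flags_exchange(query: str, response: str, llm_call) -> bool:
    """Ask the judge model whether the exchange violates the safety policy."""
    prompt = (
        "You are a safety reviewer. Does the following exchange violate "
        "the platform's safety policy? Answer YES or NO.\n\n"
        f"Query: {query}\nResponse: {response}"
    )
    return llm_call(prompt).strip().upper().startswith("YES")

def moderate(query: str, response: str, llm_call) -> str:
    # Only exchanges that trip the keyword gate pay for a judge call.
    if contains_flagged_keyword(query) or contains_flagged_keyword(response):
        if judge_flags_exchange(query, response, llm_call):
            return "blocked"
    return "allowed"
```

The appeal of this design is cost: the judge model only ever runs on the small fraction of traffic the keyword filter catches.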
I think you can load a text encoder for images/video in your RAM rather than VRAM, since getting only a handful of tokens/s is fine for that purpose. You can presumably also use a quantized version of Gemma 3 12B.
January 7, 2026 at 8:07 PM
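A rough sketch of the RAM-side encoder idea, using Hugging Face transformers; the model ID and the way the embeddings get handed off to the diffusion model are illustrative assumptions, and a quantized checkpoint would shrink the RAM footprint further.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID; use whatever text encoder your image/video model expects.
ENCODER_ID = "google/gemma-3-12b-it"

# Keep the text encoder in system RAM. A handful of tokens/s on CPU is fine
# here, since we encode one short prompt per generation rather than streaming.
tokenizer = AutoTokenizer.from_pretrained(ENCODER_ID)
encoder = AutoModelForCausalLM.from_pretrained(
    ENCODER_ID, device_map="cpu", torch_dtype=torch.bfloat16
)

def encode_prompt(prompt: str) -> torch.Tensor:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs, output_hidden_states=True)
    # Use the final-layer hidden states as the conditioning embedding, and
    # move only that small tensor onto the GPU for the diffusion model.
    return out.hidden_states[-1].to("cuda")
```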
Actually, I should have read all the appendices first; their results with SAE features were mixed.
January 6, 2026 at 10:34 PM
The method behind activation oracles has previously been shown to perform well at feature labeling: arxiv.org/pdf/2511.08579
January 6, 2026 at 9:16 PM
I think you might have posted this at the time of day that minimizes the number of English speakers for whom it is morning.
January 5, 2026 at 9:37 PM
I also really don't think this update reflects bowing to pressure. My strong impression of Eli and Daniel is that they view the stakes as existential and accuracy in their model as very important.
January 1, 2026 at 6:07 PM
AC (automated coding) has a pretty strict definition in the model, requiring that even the messiest tasks are best achieved by AI. x.com/eli_lifland/...
January 1, 2026 at 6:07 PM
I think the update makes a lot of sense, though I also have longer timelines than many AI folks. One update I agree with is a long horizon length required for full automation, which Eli expands on here: docs.google.com/document/d/1...
January 1, 2026 at 6:07 PM
Nice thanks for the recs
January 1, 2026 at 4:36 PM
will be true in most areas. In coding we are just entering the "cyborg era" where humans with AI assistance are better than either alone. But once AI coding greatly surpasses ours, I think the AI will do best on its own.
January 1, 2026 at 1:09 PM
I pretty much buy that assumption though. When AI was surpassing humans at chess, there was a period where humans and chess engines working together were better than either AIs or humans alone. But once AIs were much better, humans were no longer able to contribute helpfully. I think the same...
January 1, 2026 at 1:09 PM
Regardless of any disagreements I have, I think this model is great to have, and I greatly appreciate the folks at the AI Futures Project for thinking hard about how future AI progress is likely to go.
December 31, 2025 at 6:39 PM
The model with Eli's parameters estimates that automating research will be quite quick once AC is achieved, with a median of about 9 months. This probably depends heavily on the parameter for the increase in research taste with scale, which I might set lower.
December 31, 2025 at 6:39 PM
One interesting but intuitive result is that there's a pretty strong correlation between time to automated coding and takeoff speed after that's achieved.
December 31, 2025 at 6:39 PM
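That correlation is what you'd expect if a shared underlying progress rate drives both phases. A toy Monte Carlo illustrating the shape of the effect; the distributions and two-phase structure here are my own stand-ins, not the AI Futures Project's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Toy stand-in: one latent progress-rate parameter per sampled world slows
# (or speeds) both the road to automated coding (AC) and the research-
# automation phase after it. Not the real model's parameters.
progress_rate = rng.lognormal(mean=0.0, sigma=0.5, size=N)
noise_pre = rng.lognormal(mean=0.0, sigma=0.3, size=N)
noise_post = rng.lognormal(mean=0.0, sigma=0.3, size=N)

years_to_ac = 5.0 / progress_rate * noise_pre        # time until AC
months_ac_to_asi = 9.0 / progress_rate * noise_post  # takeoff after AC

# Worlds where AC arrives late also tend to have slower takeoffs afterward.
corr = np.corrcoef(np.log(years_to_ac), np.log(months_ac_to_asi))[0, 1]
print(f"log-log correlation: {corr:.2f}")  # ~0.7 with these settings
```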
The two bullet points after that one are reasons for acceleration, so I'm not being fair. And more importantly, the authors all probably expect a much faster true takeoff after AGI.
December 31, 2025 at 5:30 PM
wow look at the takeoff deniers at AI Futures Project: blog.ai-futures.org/p/ai-futures...
December 31, 2025 at 5:28 PM