Tim Duffy
@timfduffy.com
I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, longtermism, progressive rock, economics, and most people. Substack: http://timfduffy.substack.com
This was all previously released as a supplement to the Nature paper, but I missed it at the time: static-content.springer.com/esm/art%3A10...
January 7, 2026 at 10:32 PM
They find that R1 is very vulnerable to jailbreaks, consistent with my experience.
January 7, 2026 at 10:14 PM
They also evaluate R1 on a series of harm-related benchmarks.
January 7, 2026 at 10:14 PM
If I'm reading this right, on the official DeepSeek platform all queries are checked for certain keywords, and if any are present, the query and response are reviewed by V3. The safety standards they give their LLM judge are more about mundane risks than catastrophic ones, though.
January 7, 2026 at 10:14 PM
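A minimal sketch of that two-stage setup as I read it: a cheap keyword gate on every exchange, with an LLM judge run only on flagged traffic. The keyword list, judge prompt, and function names here are hypothetical illustrations, not DeepSeek's actual implementation.

```python
# Hypothetical two-stage moderation pipeline: cheap keyword gate first,
# LLM judge (V3 in DeepSeek's case) only on the traffic the gate flags.

FLAGGED_KEYWORDS = {"explosive", "malware", "self-harm"}  # illustrative only

def contains_flagged_keyword(text: str) -> bool:
    lowered = text.lower()
    return any(kw in lowered for kw in FLAGGED_KEYWORDS)

def judge_flags_exchange(query: str, response: str, llm_call) -> bool:
    """Ask the judge model whether the exchange violates the safety policy."""
    prompt = (
        "You are a safety reviewer. Does the following exchange violate "
        "the platform's safety policy? Answer YES or NO.\n\n"
        f"Query: {query}\nResponse: {response}"
    )
    return llm_call(prompt).strip().upper().startswith("YES")

def moderate(query: str, response: str, llm_call) -> str:
    # Only exchanges that trip the keyword gate pay for a judge call.
    if contains_flagged_keyword(query) or contains_flagged_keyword(response):
        if judge_flags_exchange(query, response, llm_call):
            return "blocked"
    return "allowed"
```

The appeal of this design is cost: the judge model only ever runs on the small fraction of traffic the keyword filter catches.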
I think you can load a text encoder for images/video in your RAM rather than VRAM, since getting only a handful of tokens/s is fine for that purpose. You can presumably also use a quantized version of Gemma 3 12B.
January 7, 2026 at 8:07 PM
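A rough sketch of the RAM-side encoder idea, using Hugging Face transformers; the model ID and the way the embeddings get handed off to the diffusion model are illustrative assumptions, and a quantized checkpoint would shrink the RAM footprint further.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID; use whatever text encoder your image/video model expects.
ENCODER_ID = "google/gemma-3-12b-it"

# Keep the text encoder in system RAM. A handful of tokens/s on CPU is fine
# here, since we encode one short prompt per generation rather than streaming.
tokenizer = AutoTokenizer.from_pretrained(ENCODER_ID)
encoder = AutoModelForCausalLM.from_pretrained(
    ENCODER_ID, device_map="cpu", torch_dtype=torch.bfloat16
)

def encode_prompt(prompt: str) -> torch.Tensor:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs, output_hidden_states=True)
    # Use the final-layer hidden states as the conditioning embedding, and
    # move only that small tensor onto the GPU for the diffusion model.
    return out.hidden_states[-1].to("cuda")
```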
Actually, I should have read all the appendices first; their results with SAE features were mixed.
January 6, 2026 at 10:34 PM
The method behind activation oracles has previously been shown to perform well at feature labeling: arxiv.org/pdf/2511.08579
January 6, 2026 at 9:16 PM
I think you might have posted this at the time of day that minimizes the number of English speakers for whom it is morning.
January 5, 2026 at 9:37 PM
I also really don't think this update reflects bowing to pressure. My strong impression of Eli and Daniel is that they view the stakes as existential and accuracy in their model as very important.
January 1, 2026 at 6:07 PM
AC (automated coding) has a pretty strict definition in the model, requiring that even the messiest tasks are best achieved by AI. x.com/eli_lifland/...
January 1, 2026 at 6:07 PM
I think the update makes a lot of sense, though I also have longer timelines than many AI folks. One update I agree with is a long horizon length required for full automation, which Eli expands on here: docs.google.com/document/d/1...
January 1, 2026 at 6:07 PM
Nice thanks for the recs
January 1, 2026 at 4:36 PM
will be true in most areas. In coding we are just entering the "cyborg era" where humans with AI assistance are better than either alone. But once AI coding greatly surpasses ours, I think the AI will do best on its own.
January 1, 2026 at 1:09 PM
I pretty much buy that assumption though. When AI was surpassing humans at chess, there was a period where humans and chess engines working together were better than either AIs or humans alone. But once AIs were much better, humans were no longer able to contribute helpfully. I think the same...
January 1, 2026 at 1:09 PM
Regardless of any disagreements I have, I think this model is great to have, and I greatly appreciate the folks at the AI Futures Project for thinking hard about how future AI progress is likely to go.
December 31, 2025 at 6:39 PM
The model with Eli's parameters estimates that automating research will be quite quick once AC is achieved, with a median of about 9 months. This probably depends heavily on the parameter for the increase in research taste with scale, which I might set lower.
December 31, 2025 at 6:39 PM
One interesting but intuitive result is that there's a pretty strong correlation between time to automated coding and takeoff speed after that's achieved.
December 31, 2025 at 6:39 PM
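That correlation is what you'd expect if a shared underlying progress rate drives both phases. A toy Monte Carlo illustrating the shape of the effect; the distributions and two-phase structure here are my own stand-ins, not the AI Futures Project's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Toy stand-in: one latent progress-rate parameter per sampled world slows
# (or speeds) both the road to automated coding (AC) and the research-
# automation phase after it. Not the real model's parameters.
progress_rate = rng.lognormal(mean=0.0, sigma=0.5, size=N)
noise_pre = rng.lognormal(mean=0.0, sigma=0.3, size=N)
noise_post = rng.lognormal(mean=0.0, sigma=0.3, size=N)

years_to_ac = 5.0 / progress_rate * noise_pre        # time until AC
months_ac_to_asi = 9.0 / progress_rate * noise_post  # takeoff after AC

# Worlds where AC arrives late also tend to have slower takeoffs afterward.
corr = np.corrcoef(np.log(years_to_ac), np.log(months_ac_to_asi))[0, 1]
print(f"log-log correlation: {corr:.2f}")  # ~0.7 with these settings
```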
The two bullet points after that one are reasons for acceleration, so I'm not being fair. And more importantly, the authors all probably expect a much faster true takeoff after AGI.
December 31, 2025 at 5:30 PM
wow look at the takeoff deniers at AI Futures Project: blog.ai-futures.org/p/ai-futures...
December 31, 2025 at 5:28 PM