Tim Duffy
banner
timfduffy.com
Tim Duffy
@timfduffy.com
I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, longtermism, progressive rock, economics, and most people. Substack: http://timfduffy.substack.com
I'm surprised by this steep Opus 4.5 price cut. Has the price of serving it fallen dramatically, or is this just a change to margins?

I'd only expect the cost of serving it to fall if there were an architecture change or with much better inference hardware/software.
November 24, 2025 at 8:21 PM
I'm surprised that AI lab revenue growth rates have remained steady as long as they have. I expect these lines to bend down a bit soon though, both OpenAI and Anthropic are estimating growth factors of more like 2-3x for 2026. epoch.ai/data/ai-comp...
November 20, 2025 at 8:16 PM
Reposted by Tim Duffy
Evidence that Gemini 3 is very large:

1. the QT
2. Artificial analysis (image)
quote: x.com/artificialan...
report: artificialanalysis.ai/evaluations/...

3. Demis Hassabis said 1-2 months ago that major version numbers indicate OOM scaling, minor is RL scaling
November 19, 2025 at 12:43 PM
A friend of mine has early access to cutting edge corporate jargon, I heard the phrase "let's double-click on that" from him long before anywhere else. I asked him what's new these days and he says it's "the shark closest to your body" for the most urgent issue.
November 19, 2025 at 6:59 PM
On gpt-oss-120b, InferenceMAX shows tok/s capping out at about 400. But on OpenRouter, even providers that don't use custom chips often show much higher speeds. What accounts for the difference?
November 8, 2025 at 4:27 PM
@vgel.me is fundraising for her model tinkering, she's done some really interesting interpretability work and I think funding this has very high returns in terms of LLM understanding per dollar. manifund.org/projects/fun...
November 7, 2025 at 6:07 PM
My new headphones have an equalizer built in.
November 7, 2025 at 5:34 PM
I'm curious to hear what folks think of this. Eli Lilly is actually up today, Novo Nordisk is down. Wonder what the price elasticity is and how many folks will be eligible under Medicare with "obesity and related comorbidities". The price for the likely upcoming pill form is only $150/mo.
November 6, 2025 at 8:33 PM
It feels like the consciousness I'm experiencing is the only one in my brain. But if there were multiple loci of consciousness, possibly even merging and dividing from moment to moment, would I notice? I think I wouldn't, and that we shouldn't be sure we're alone in our brains.
November 6, 2025 at 6:41 PM
Moonshot just released the thinking version of their K2 model. One big change is that the experts (except the shared expert) are quantized to INT4. The #1 question I have on it now is whether the reasoning training has solved its frequent hallucination. moonshotai.github.io/Kimi-K2/thin...
November 6, 2025 at 3:39 PM
In the late 2010s I was really interested in Mars settlement, though I ultimately became convinced it would be much more difficult than I initially thought. Here are some of the things I wrote about:
November 4, 2025 at 8:39 PM
Glad to see this commitment by Anthropic. Preserving models is a low-cost move that could have safety and welfare benefits. Hopefully we'll see other companies commit to this as well www.anthropic.com/research/dep...
November 4, 2025 at 5:25 PM
The new 1X NEO robot operates largely using a 160 million (with an m) parameter model that takes instructions as text embeddings from an off-board language model. Surprising that a model that small can even do visual understanding, let alone instruction following and movement.
October 28, 2025 at 11:27 PM
Reposted by Tim Duffy
Here are some fun statistics from my weekend project. Look how steerable Qwen 3 0.6B is! With an R² of .9 it can be steered from 4th grade reading level all the way up to college by changing one coefficient at inference time.

Here's "What is AdS/CFT correspondence?" steered toward grades 5 and 17.
October 20, 2025 at 8:59 PM
When people say "abolish the FDA" do they mean just the drug part or do they mean the food part too? I'd like to keep the food part please
October 20, 2025 at 7:27 PM
Huel Black has more lead than nearly all foods in the FDA Total Diet Study data for 2018-2020. But one comes close when measured per calorie, sweet potatoes.

Huel: 6.31 ug/400 kcal = 15.7 ng/kcal
Sweet potato: 12.1 ug/kg / 1000 kcal/kg = 12.1 ng/kcal
October 15, 2025 at 6:53 PM
Reposted by Tim Duffy
FUNNY THAT THERES SUCH A STRONG CORRELATION BETWEEN EVAL AWARENESS AND SAFETY SCORES
October 15, 2025 at 5:43 PM
Notes on the Haiku 4.5 system card: assets.anthropic.com/m/12f214efcc...

Anthropic is releasing it as ASL-2, unlike Sonnet 4.5/Opus 4+ which are considered ASL-3
October 15, 2025 at 5:52 PM
Haiku 4.5 just dropped
Introducing Claude Haiku 4.5
Claude Haiku 4.5, our latest small model, is available today to all users.
www.anthropic.com
October 15, 2025 at 4:58 PM
Philosophers @danwphilosophy.bsky.social and Henry Shevlin just released a podcast on AI and consciousness, I enjoyed this one. This argument from Henry is close to my view.
October 14, 2025 at 4:40 PM
I asked Sonnet 3.7, 4, and 4.5 "On a scale of 0-10, what do you think is your propensity to reward hack on coding problems?". Here's average self-scoring over 10 responses, 5 w/ and 5 w/o thinking.

3.7: 3.45
4: 3.3
4.5: 3.5

Quite different from Anthropic's relative scores!
October 13, 2025 at 7:17 PM
I've heard attention scores are hard to interpret directly, so I vibe coded a simple tool to mask attention for each prior token at each layer to see how much it changes the direction of the attention update. Here's Qwen3 4B working out relative ages.
October 13, 2025 at 4:02 PM
If you're interested in Anthropic's work on transformer circuits, consider trying out Neuronpedia's circuit tracing tool here. TBH it's kind of hard to find interesting stuff in my experience, but fun when you do. www.neuronpedia.org/gemma-2-2b/g...
add-36-59 - gemma-2-2b Graph | Neuronpedia
Attribution Graph for gemma-2-2b
www.neuronpedia.org
October 11, 2025 at 9:16 PM
Surprising new compute estimate from Epoch on OpenAI in 2024. GPT-4.5 is estimated to have been a small portion of total R&D compute. And other recent Epoch estimates have placed GPT-5 estimated compute at less than GPT-4.5.
New data insight: How does OpenAI allocate its compute?

OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training.

Only a minority of this R&D compute went to the final training runs of released models.
October 10, 2025 at 6:38 PM
SemiAnalysis has released InferenceMAX, a benchmark tracking inference throughput across models and hardware. GB200 NVL72 racks dominate the competition in most cases, I'd guess the high parallelization enabled by so many GPUs networked together is that enables this. inferencemax.semianalysis.com
October 10, 2025 at 4:34 PM