Tim Duffy
@timfduffy.com
1.2K followers 500 following 3.5K posts
I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, longtermism, progressive rock, economics, and most people. Substack: http://timfduffy.substack.com
timfduffy.com
If you have a telephoto lens, I recommend taking pictures of birds. It's a fun challenge, and there are birds most places.
timfduffy.com
Surprising new compute estimate from Epoch on OpenAI in 2024. GPT-4.5's training run is estimated to have been a small portion of total R&D compute. And other recent Epoch estimates have placed GPT-5's training compute below that of GPT-4.5.
epochai.bsky.social
New data insight: How does OpenAI allocate its compute?

OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training.

Only a minority of this R&D compute went to the final training runs of released models.
timfduffy.com
For a batch size of 1 with minimal parallelism, there is probably only modest improvement to be had. But I think there is a lot of room to make various kinds of parallel inference faster, like in the example here.
timfduffy.com
SemiAnalysis comments on the reasons for the strong GB200 performance here, and half of them are things I've never heard of; time to do some reading. Also note the benefit of multi-token prediction for speed. x.com/SemiAnalysis...
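To unpack the multi-token prediction point, here is a toy sketch of my own (the generic greedy speculative-decoding idea, not anything from the linked post; `speculative_step` and `verify_fn` are hypothetical names): cheap draft heads propose several tokens ahead, the full model checks the whole draft in a single forward pass, and every draft token that matches is a serial decode step you didn't have to pay for.

```python
# Toy greedy speculative-decoding step (illustrative only).
# `draft_tokens`: k tokens proposed by cheap multi-token-prediction heads.
# `verify_fn`: runs the full model once over the drafted positions and
# returns its own greedy choice at each of them.
def speculative_step(draft_tokens, verify_fn):
    verified = verify_fn(draft_tokens)   # one batched forward pass
    accepted = []
    for proposed, correct in zip(draft_tokens, verified):
        if proposed == correct:
            accepted.append(proposed)    # draft matched; keep going for free
        else:
            accepted.append(correct)     # take the model's own token and stop
            break
    return accepted                      # always >= 1 token per full pass

# e.g. speculative_step([5, 7, 9], lambda drafts: [5, 7, 2]) -> [5, 7, 2]
```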
timfduffy.com
I wonder if you'll have to detach some quotes and hide some replies for that post. ;)
timfduffy.com
SemiAnalysis has released InferenceMAX, a benchmark tracking inference throughput across models and hardware. GB200 NVL72 racks dominate the competition in most cases; I'd guess the high degree of parallelization enabled by so many GPUs networked together is what enables this. inferencemax.semianalysis.com
timfduffy.com
But TBH the Fed not having the data to say otherwise is also plausible. My prior would have been slight exaggeration in the last few years of slower growth.
timfduffy.com
I think another option is that China is good at cooking the numbers, but their goal is to make growth look more stable than it is rather than to exaggerate it. It's hard to get away with overstating growth in the long term since small differences compound, so if you're focused on the long term, exaggerating is a bad idea.
timfduffy.com
Here's the report's conclusion:
timfduffy.com
Some Fed economists looked at Chinese GDP growth estimates and found that they weren't systematically biased www.federalreserve.gov/econres/note...
timfduffy.com
As my Spanish teacher used to say, "gracias a dios es viernes" ("thank God it's Friday"), which I misinterpreted almost all year as "gracias, adios" ("thanks, goodbye").
timfduffy.com
tl;dr: @kindgrace.bsky.social, you're absolutely right!
timfduffy.com
You can see the representations Grace is describing in action in this cross-layer transcoder graph. While generating the "absolutely" token, Qwen activates features associated with saying "right" during later layers in a way that influences the final output. www.neuronpedia.org/qwen3-4b/gra...
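For a hypothetical sketch of the kind of quantity that graph encodes (this is generic direct logit attribution, not Neuronpedia's actual API; all tensors here are random stand-ins): a transcoder feature's contribution to the " right" logit is just its activation times its decoder direction projected through the unembedding.

```python
# Direct-logit-attribution style check (illustrative shapes and values).
import numpy as np

d_model, vocab = 8, 10
rng = np.random.default_rng(0)
W_U = rng.normal(size=(d_model, vocab))        # unembedding matrix (stand-in)
feature_decoder = rng.normal(size=(d_model,))  # feature's write direction
right_token_id = 3                             # stand-in id for " right"

activation = 2.0                               # feature activation at this position
logit_contribution = activation * feature_decoder @ W_U[:, right_token_id]
print(logit_contribution)  # positive -> the feature nudges the output toward " right"
```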
timfduffy.com
across a bunch of rollouts, and my intuition is that it would be hard for synthetic data to involve that much token generation, but I'm not sure about that as I'm not really familiar with synthetic data creation.
timfduffy.com
generate for each token that makes it into the pretraining corpus is >5, then the data generation needs more compute. If that synthetic data is being used for multiple epochs, that factor is higher. But in RL, while you don't have the multiplier for backprop, you're generating tons of tokens...
timfduffy.com
Very curious about the math for synthetic data generation to cost more compute than training. For pretraining, processing a token maybe takes 4-5x what normal generation takes, since you need an extra 2x for the backward pass plus some other stuff. So with synthetic data, if the number of tokens you need to ...
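Putting rough numbers on this thread's comparison (my own back-of-envelope; the unit of cost is one forward pass over one token, and I'm assuming the generator is about the same size as the model being trained):

```python
# Synthetic-data generation vs. pretraining compute, back-of-envelope.
forward = 1.0
backward = 2.0 * forward           # backward pass ~2x the forward pass
other_overhead = 1.5               # "plus some other stuff" (illustrative)
train_cost_per_token = forward + backward + other_overhead   # ~4.5 forward passes

gen_cost_per_token = forward       # generating one synthetic token ~ one forward pass

# Generation costs more than training once you generate more than this many
# tokens per token that actually lands in the pretraining corpus:
breakeven = train_cost_per_token / gen_cost_per_token
print(breakeven)                   # ~4.5, consistent with the ">5" threshold above

epochs = 2                         # reusing the synthetic data raises the bar
print(breakeven * epochs)          # ~9 generated tokens per kept token
```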
timfduffy.com
Hmm, even if some entanglement across neurons can happen, I don't see how it could be widespread in the brain; I agree with the "hot, messy" objections on the Wiki page. And even if that were the case, I don't see how it supports any source outside your brain.
timfduffy.com
What would it mean for it to be a quantum phenomenon? Is it about having lots of stuff in superposition? Do human brains even do that?
timfduffy.com
Yeah, I think having the barrier to seeing it instantly but still having it retrievable through Skythread or similar is a really nice middle ground.
timfduffy.com
"In this book, I aim to convince you that the experts do not know, and you do not know, and society collectively does not and will not know, and all is fog."

Schwitzgebel's thesis is that AI consciousness is non-obvious, and clarity on the issue is not imminent. I've enjoyed what I've read so far.
eschwitz.bsky.social
New book in draft: AI and Consciousness [link in thread]
This book is a skeptical overview of the literature on AI and consciousness.
Anyone who emails me comments on the entire manuscript will be thanked in print and receive an appreciatively signed hard copy.
AI and Consciousness title page
timfduffy.com
> At each position P_n and layer L_n, the transformer block receives information from positions P_0..P_n-1 as processed by layer L_n-1.

Interesting point about how you get past-token info from one layer prior, given that K/V are calculated before the attention/MLP in a layer; that didn't occur to me.
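Here is a minimal toy sketch of that information flow (single head, no MLP, all shapes and weights illustrative): at layer l, the K/V that earlier positions expose are computed from the residual stream they had after layer l-1, so position n only ever sees other positions' previous-layer state.

```python
import numpy as np

d_model, n_layers, seq_len = 16, 4, 8
rng = np.random.default_rng(0)

def attn(q, K, V):
    # causal single-head attention for one query vector
    scores = q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

Wq = rng.normal(size=(n_layers, d_model, d_model)) * 0.1
Wk = rng.normal(size=(n_layers, d_model, d_model)) * 0.1
Wv = rng.normal(size=(n_layers, d_model, d_model)) * 0.1

x = rng.normal(size=(seq_len, d_model))  # residual stream after embedding
for l in range(n_layers):
    # K/V for every position are functions of the layer-(l-1) residual stream x,
    # because they are computed before this layer's attention/MLP updates it.
    K, V = x @ Wk[l], x @ Wv[l]
    out = np.stack([attn(x[n] @ Wq[l], K[: n + 1], V[: n + 1]) for n in range(seq_len)])
    x = x + out  # residual update; other positions only ever saw the pre-update state
```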
timfduffy.com
Update: It's excellent so far, I intend to write about it.
timfduffy.com
Speaking of Schwitz, I'm about to dive into his new book draft:
eschwitz.bsky.social
New book in draft: AI and Consciousness [link in thread]
This book is a skeptical overview of the literature on AI and consciousness.
Anyone who emails me comments on the entire manuscript will be thanked in print and receive an appreciatively signed hard copy.
AI and Consciousness title page