Epoch AI
@epochai.bsky.social
780 followers 20 following 790 posts
We are a research institute investigating the trajectory of AI for the benefit of society. epoch.ai
epochai.bsky.social
Insights and analysis provided by our outstanding data and benchmarking teams, including @robi_rahman, @justjoshinyou13, @luke__emberson, @benmcottier, @james_s48, @venkat_somaaala, @YafahEdelman, @everysum, @js_denain, and @tmkadamcz
epochai.bsky.social
Everything is available under the CC-BY license, so feel free to reuse our data, replicate our analysis, or conduct your own; just cite us. And if we’re missing a model, chip, or cluster, tell us and we’ll add it.

Dive in here: epoch.ai/data
Data on the Trajectory of AI
Our public datasets catalog over 3000 machine learning models. Explore data and graphs showing the growth and trajectory of AI from 1950 to today.
epoch.ai
epochai.bsky.social
Want one page of headline stats? Bookmark the Trends dashboard to see compute growth, hardware performance, development costs, and data bottlenecks: epoch.ai/trends
epochai.bsky.social
Capabilities constantly advancing: LLMs keep improving fast, but there are still frontiers to be conquered. For example, the hardest high-school math contest problems remain unsolved even by top systems.
epoch.ai/data-insigh...
LLMs have not yet solved the hardest problems on high school math contests
LLMs have come a long way on high school math contests but have yet to show that they can solve the hardest problems found on these contests.
epoch.ai
epochai.bsky.social
Rapid efficiency gains: frontier-level performance has historically become accessible to consumers on a single high-end gaming GPU (like the RTX 5090) within a year.
epoch.ai/data-insigh...
Frontier AI performance becomes accessible on consumer hardware within a year
Models that fit on consumer GPUs match the performance of frontier AI within a year or less.
epoch.ai
epochai.bsky.social
GPU Clusters: data on 800+ clusters & supercomputers, with info on owners, chips, cost, and power capacity. And read our paper, which explains coverage, limitations, and policy implications.
epoch.ai/data/gpu-cl...
Data on GPU clusters
Our database of over 500 GPU clusters and supercomputers tracks large hardware facilities, including those used for AI training and inference.
epoch.ai
epochai.bsky.social
Machine Learning Hardware: follow the trends in 160+ AI accelerators (GPUs, TPUs) with specs and long-run performance charts. Data + docs + download links.
epoch.ai/data/machin...
epochai.bsky.social
AI Benchmarking Hub: see results on GPQA Diamond, MATH Level 5, Mock AIME ’24–’25, FrontierMath, SWE-bench Verified + curated external runs. Download the data, inspect run logs, or use the Python client.
epoch.ai/benchmarks
epochai.bsky.social
AI Models: we’ve annotated 3,000+ models released since 1950, with training compute, parameter count, dataset size, hardware & more. Browse, filter, or download CSV (updated daily).
epoch.ai/data/ai-models
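For example, here is a minimal pandas sketch for working with the CSV export. The URL and column names below are illustrative assumptions, not the exact schema, so check the page for the actual download link and fields.

```python
import pandas as pd

# Hypothetical path for illustration; get the real CSV link from epoch.ai/data/ai-models
CSV_URL = "https://epoch.ai/data/ai-models.csv"

df = pd.read_csv(CSV_URL)

# Example filter: models trained with at least 1e25 FLOP, assuming the export
# has columns like "Model", "Training compute (FLOP)", and "Publication date".
frontier = df[df["Training compute (FLOP)"] >= 1e25]
print(frontier[["Model", "Publication date"]].sort_values("Publication date").head())
```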
epochai.bsky.social
A healthy conversation about AI should be grounded in facts. Epoch’s datasets can help you track and understand the trajectory of AI.
As a nonprofit, we make our work freely accessible for anyone to read, replicate, and build upon.
Our datasets:
epochai.bsky.social
RL scaling remains a key factor in near-term AI progress. Recent evidence about this compute frontier is sparse, but we're tracking it closely.
epochai.bsky.social
Overall, if the final RL stage used anywhere from 10% to 200% as much compute as pre-training, that yields a median estimate of ~5e25 FLOP for GPT-5’s overall training compute. And GPT-5 was likely trained on less than 1e26 FLOP.
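For concreteness, here is a rough sketch of that aggregation, using the ~3e25 FLOP median pre-training estimate derived elsewhere in this thread. The geometric midpoint for the RL share is an illustrative choice, not necessarily Epoch's exact methodology.

```python
# Combine a ~3e25 FLOP pre-training estimate with an RL stage assumed
# to be anywhere from 10% to 200% of pre-training compute.
import math

pretrain = 3e25              # median pre-training estimate from the thread, in FLOP

rl_low = 0.10 * pretrain     # RL at 10% of pre-training
rl_high = 2.00 * pretrain    # RL at 200% of pre-training

# Geometric midpoint for the RL share, since the range spans more than an order of magnitude.
rl_mid = math.sqrt(rl_low * rl_high)

print(f"total ~ {pretrain + rl_low:.1e} to {pretrain + rl_high:.1e}, midpoint ~ {pretrain + rl_mid:.1e}")
# total ~ 3.3e+25 to 9.0e+25, midpoint ~ 4.3e+25  -> roughly 5e25, and below 1e26
```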
epochai.bsky.social
Did OpenAI scale RL to match or exceed pre-training compute for GPT-5? It’s possible.

But reports suggest that training GPT-5 wasn’t straightforward, and OpenAI may have focused on different skills for GPT-5 than for o3, pointing to experimentation rather than simple scaling.
epochai.bsky.social
Next is reinforcement learning during post-training, which adds more uncertainty.

In early 2025, RL compute was small, maybe 1-10% of pre-training. But it is scaling up fast: OpenAI scaled RL compute by 10× from o1 to o3, and xAI did the same from Grok 3 to Grok 4.
epochai.bsky.social
GPT-5’s pre-training token count is unconfirmed, but Llama 4 and Qwen3 were trained on 30-40 trillion tokens.

OpenAI has invested heavily in pre-training, so GPT-5 was likely trained on at least 30T tokens, possibly several times more.

This gives a median estimate of ~3e25 FLOP for pre-training.
epochai.bsky.social
Training compute scales in proportion to a model’s active parameter count and the amount of training data.

Based on price, speed, and prevailing industry trends, GPT-5 is probably a “mid-sized” frontier model with ~100B active params, akin to Grok 2 (115B active), GPT-4o, and Claude Sonnet.
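These two inputs combine via the standard C ≈ 6ND approximation (training FLOP ≈ 6 × active parameters × training tokens). A minimal sketch with the thread's illustrative figures; the token counts beyond 30T are assumptions to show the range:

```python
# Standard dense-training approximation: C ~ 6 * N * D
# (training FLOP ~ 6 x active parameters x training tokens).
def training_flop(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

n_active = 100e9                         # ~100B active parameters (assumed, per the thread)
for tokens in (30e12, 50e12, 100e12):    # 30T tokens, plus higher illustrative guesses
    print(f"{tokens/1e12:.0f}T tokens -> {training_flop(n_active, tokens):.1e} FLOP")
# 30T tokens -> 1.8e+25 FLOP
# 50T tokens -> 3.0e+25 FLOP
# 100T tokens -> 6.0e+25 FLOP
```

Around 50T tokens at ~100B active parameters lands near the ~3e25 FLOP median pre-training estimate above.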
epochai.bsky.social
Our best guess: GPT-5 was trained on ~5e25 FLOP total, including both pre-training and reinforcement learning.

That would be more than twice as much as GPT-4 (~2e25 FLOP), but less than GPT-4.5 (>1e26 FLOP).

Here’s how it breaks down.
epochai.bsky.social
We recently wrote that GPT-5 is likely the first mainline GPT release to be trained on less compute than its predecessor.

How did we reach this conclusion, and what do we actually know about how GPT-5 was trained?
🧵
epochai.bsky.social
Nevertheless, it's hard to deny that AI models have become substantially more useful over the past 12 months. One indication of this is that revenues at frontier AI companies have more than tripled in the past year.
epochai.bsky.social
Of course, benchmarks don't capture real-world utility perfectly. Even a model scoring 100% on GPQA Diamond probably won't fully replace scientists, since models can be overfit to benchmarks, and benchmarks don't capture all aspects of real-world work.
epochai.bsky.social
Across benchmarks covering coding, math, scientific knowledge, common sense and visual reasoning, and more, state-of-the-art models have improved by 20 to 50 percentage points in the last year.
epochai.bsky.social
AI capabilities have been steadily improving across a wide range of skills, and show no sign of slowing down in the near term. 🧵