Epoch AI
@epochai.bsky.social
780 followers 20 following 790 posts
We are a research institute investigating the trajectory of AI for the benefit of society. epoch.ai
epochai.bsky.social
Insights and analysis provided by our outstanding data and benchmarking teams, including @robi_rahman, @justjoshinyou13, @luke__emberson, @benmcottier, @james_s48, @venkat_somaaala, @YafahEdelman, @everysum, @js_denain, and @tmkadamcz
epochai.bsky.social
Everything is available under the CC-BY license, so feel free to reuse our data, replicate our analysis, or conduct your own; just cite us. And if we’re missing a model, chip, or cluster, tell us and we’ll add it.

Dive in here: epoch.ai/data
Data on the Trajectory of AI
Our public datasets catalog over 3000 machine learning models. Explore data and graphs showing the growth and trajectory of AI from 1950 to today.
epoch.ai
epochai.bsky.social
Want one page of headline stats? Bookmark the Trends dashboard to see compute growth, hardware performance, development costs, and data bottlenecks: epoch.ai/trends
epochai.bsky.social
Capabilities constantly advancing: LLMs keep improving fast, but there are still frontiers to be conquered. For example, the hardest high-school math contest problems remain unsolved even by top systems.
epoch.ai/data-insigh...
LLMs have not yet solved the hardest problems on high school math contests
LLMs have come a long way on high school math contests but have yet to show that they can solve the hardest problems found on these contests.
epoch.ai
epochai.bsky.social
Rapid efficiency gains: frontier-level performance has historically become accessible to consumers on a single high-end gaming GPU (like the RTX 5090) within a year.
epoch.ai/data-insigh...
Frontier AI performance becomes accessible on consumer hardware within a year
Models that fit on consumer GPUs match the performance of frontier AI within a year or less.
epoch.ai
epochai.bsky.social
GPU Clusters: data on 800+ clusters & supercomputers, with info on owners, chips, cost, and power capacity. And read our paper, which explains coverage, limitations, and policy implications.
epoch.ai/data/gpu-cl...
Data on GPU clusters
Our database of over 500 GPU clusters and supercomputers tracks large hardware facilities, including those used for AI training and inference.
epoch.ai
epochai.bsky.social
Machine Learning Hardware: follow the trends in 160+ AI accelerators (GPUs, TPUs) with specs and long-run performance charts. Data + docs + download links.
epoch.ai/data/machin...
epochai.bsky.social
AI Benchmarking Hub: see results on GPQA Diamond, MATH Level 5, Mock AIME ’24–’25, FrontierMath, SWE-bench Verified + curated external runs. Download the data, inspect run logs, or use the Python client.
epoch.ai/benchmarks
epochai.bsky.social
AI Models: we’ve annotated 3,000+ models released since 1950, with training compute, parameter count, dataset size, hardware & more. Browse, filter, or download CSV (updated daily).
epoch.ai/data/ai-models
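For example, here is a minimal pandas sketch for working with the CSV export. The URL and column names below are illustrative assumptions, not the exact schema, so check the page for the actual download link and fields.

```python
import pandas as pd

# Hypothetical path for illustration; get the real CSV link from epoch.ai/data/ai-models
CSV_URL = "https://epoch.ai/data/ai-models.csv"

df = pd.read_csv(CSV_URL)

# Example filter: models trained with at least 1e25 FLOP, assuming the export
# has columns like "Model", "Training compute (FLOP)", and "Publication date".
frontier = df[df["Training compute (FLOP)"] >= 1e25]
print(frontier[["Model", "Publication date"]].sort_values("Publication date").head())
```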
epochai.bsky.social
A healthy conversation about AI should be grounded in facts. Epoch’s datasets can help you track and understand the trajectory of AI.
As a nonprofit, we make our work freely accessible for anyone to read, replicate, and build upon.
Our datasets:
epochai.bsky.social
RL scaling remains a key factor in near-term AI progress. Recent evidence about this compute frontier is sparse, but we're tracking it closely.
epochai.bsky.social
Overall, if the final RL stage used anywhere from 10% to 200% as much compute as pre-training, that yields a median estimate of ~5e25 FLOP for GPT-5’s overall training compute. And GPT-5 was likely trained on less than 1e26 FLOP.
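For concreteness, here is a rough sketch of that aggregation, using the ~3e25 FLOP median pre-training estimate derived elsewhere in this thread. The geometric midpoint for the RL share is an illustrative choice, not necessarily Epoch's exact methodology.

```python
# Combine a ~3e25 FLOP pre-training estimate with an RL stage assumed
# to be anywhere from 10% to 200% of pre-training compute.
import math

pretrain = 3e25              # median pre-training estimate from the thread, in FLOP

rl_low = 0.10 * pretrain     # RL at 10% of pre-training
rl_high = 2.00 * pretrain    # RL at 200% of pre-training

# Geometric midpoint for the RL share, since the range spans more than an order of magnitude.
rl_mid = math.sqrt(rl_low * rl_high)

print(f"total ~ {pretrain + rl_low:.1e} to {pretrain + rl_high:.1e}, midpoint ~ {pretrain + rl_mid:.1e}")
# total ~ 3.3e+25 to 9.0e+25, midpoint ~ 4.3e+25  -> roughly 5e25, and below 1e26
```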
epochai.bsky.social
Did OpenAI scale RL to match or exceed pre-training compute for GPT-5? It’s possible.

But reports suggest that training GPT-5 wasn’t straightforward, and OpenAI may have focused on different skills for GPT-5 than for o3, pointing to experimentation rather than simple scaling.
epochai.bsky.social
Next is reinforcement learning during post-training, which adds more uncertainty.

In early 2025, RL compute was small, maybe 1-10% of pre-training. But it is scaling up fast: OpenAI scaled RL compute by 10× from o1 to o3, and xAI did the same from Grok 3 to Grok 4.
epochai.bsky.social
GPT-5’s pre-training token count is unconfirmed, but Llama 4 and Qwen3 were trained on 30-40 trillion tokens.

OpenAI has invested heavily in pre-training, so GPT-5 was likely trained on at least 30T tokens, possibly several times more.

This gives a median estimate of ~3e25 FLOP for pre-training.
epochai.bsky.social
Training compute scales in proportion to a model’s active parameter count and the amount of training data.

Based on price, speed, and prevailing industry trends, GPT-5 is probably a “mid-sized” frontier model with ~100B active params, akin to Grok 2 (115B active), GPT-4o, and Claude Sonnet.
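These two inputs combine via the standard C ≈ 6ND approximation (training FLOP ≈ 6 × active parameters × training tokens). A minimal sketch with the thread's illustrative figures; the token counts beyond 30T are assumptions to show the range:

```python
# Standard dense-training approximation: C ~ 6 * N * D
# (training FLOP ~ 6 x active parameters x training tokens).
def training_flop(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

n_active = 100e9                         # ~100B active parameters (assumed, per the thread)
for tokens in (30e12, 50e12, 100e12):    # 30T tokens, plus higher illustrative guesses
    print(f"{tokens/1e12:.0f}T tokens -> {training_flop(n_active, tokens):.1e} FLOP")
# 30T tokens -> 1.8e+25 FLOP
# 50T tokens -> 3.0e+25 FLOP
# 100T tokens -> 6.0e+25 FLOP
```

Around 50T tokens at ~100B active parameters lands near the ~3e25 FLOP median pre-training estimate above.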
epochai.bsky.social
Our best guess: GPT-5 was trained on ~5e25 FLOP total, including both pre-training and reinforcement learning.

That would be more than twice as much as GPT-4 (~2e25 FLOP), but less than GPT-4.5 (>1e26 FLOP).

Here’s how it breaks down.
epochai.bsky.social
We recently wrote that GPT-5 is likely the first mainline GPT release to be trained on less compute than its predecessor.

How did we reach this conclusion, and what do we actually know about how GPT-5 was trained?
🧵
epochai.bsky.social
Nevertheless, it's hard to deny that AI models have become substantially more useful over the past 12 months. One indication of this is that revenues at frontier AI companies have more than tripled in the past year.
epochai.bsky.social
Of course, benchmarks don't capture real-world utility perfectly. Even a model scoring 100% on GPQA Diamond probably won't fully replace scientists, since models can be overfit to benchmarks, and benchmarks don't capture all aspects of real-world work.
epochai.bsky.social
Across benchmarks covering coding, math, scientific knowledge, common sense and visual reasoning, and more, state-of-the-art models have improved by 20 to 50 percentage points in the last year.
epochai.bsky.social
AI capabilities have been steadily improving across a wide range of skills, and show no sign of slowing down in the near term. 🧵