Epoch AI
@epochai.bsky.social
We are a research institute investigating the trajectory of AI for the benefit of society. epoch.ai
epochai.bsky.social
RL scaling remains a key factor in near-term AI progress. Recent evidence about this compute frontier is sparse, but we're tracking it closely.
epochai.bsky.social
Overall, if final RL training made up 10% to 200% of pre-training compute, that yields a median estimate of 5e25 FLOP for GPT-5’s overall training compute. And GPT-5 was likely trained on less than 1e26 FLOP.
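For concreteness, here is a minimal sketch in Python of how these figures could combine. The geometric-midpoint aggregation and the 3e25 pre-training median are assumptions drawn from this thread, not Epoch’s actual model:

```python
import math

# Inputs taken from this thread (point estimates, not Epoch's full model)
pretrain_flop = 3e25             # median pre-training compute estimate
rl_fraction_range = (0.1, 2.0)   # RL compute at 10%-200% of pre-training

# Summarize the RL range by its geometric midpoint (an assumption;
# the actual aggregation over the uncertainty isn't public)
rl_fraction_mid = math.sqrt(rl_fraction_range[0] * rl_fraction_range[1])

total_flop = pretrain_flop * (1 + rl_fraction_mid)
print(f"RL fraction midpoint: {rl_fraction_mid:.2f}")    # ~0.45
print(f"Total training compute: {total_flop:.1e} FLOP")  # ~4.3e25, i.e. ~5e25

# Even at the top of the range, 3e25 * (1 + 2.0) = 9e25 stays below 1e26,
# consistent with GPT-5 likely being under 1e26 FLOP
```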
epochai.bsky.social
Did OpenAI scale RL to match or exceed pre-training compute for GPT-5? It’s possible.

But reports suggest that training GPT-5 wasn’t straightforward, and OpenAI may have focused on different skills for GPT-5 than for o3, suggesting more experimentation vs simple scaling.
epochai.bsky.social
Next is reinforcement learning during post-training, which adds more uncertainty.

In early 2025, RL compute was small, perhaps 1-10% of pre-training compute. But this is scaling up fast: OpenAI scaled RL compute by 10× from o1 to o3, and xAI did the same from Grok 3 to Grok 4.
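As a rough illustration (hypothetical numbers, assuming pre-training compute stays fixed), one more 10× scale-up pushes RL from a small fraction of pre-training to a comparable scale:

```python
# Hypothetical: RL compute as a fraction of pre-training compute
early_2025_rl_fraction = (0.01, 0.10)  # 1-10% of pre-training
scale_up = 10                          # e.g. o1 -> o3, Grok 3 -> Grok 4

next_gen = tuple(f * scale_up for f in early_2025_rl_fraction)
print(next_gen)  # (0.1, 1.0): RL compute approaches pre-training scale
```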
epochai.bsky.social
GPT-5’s pre-training token count is unconfirmed, but Llama 4 and Qwen3 were each trained on 30-40 trillion tokens.

OpenAI has invested heavily in pre-training, so GPT-5 was likely trained on at least 30T tokens, possibly several times more.

This gives a median of ~3e25 FLOP pretrain.
epochai.bsky.social
Training compute scales in proportion to a model’s active parameter count and its training data.

Based on price, speed, and prevailing industry trends, GPT-5 is probably a “mid-sized” frontier model with ~100B active params, akin to Grok 2 (115B active), GPT-4o, and Claude Sonnet.
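A back-of-the-envelope version of this scaling rule uses the standard approximation of ~6 FLOP per active parameter per training token. The 50T token figure below is an assumed point within the “at least 30T, possibly several times more” range mentioned above:

```python
# Standard approximation: FLOP ≈ 6 * active_parameters * training_tokens
active_params = 100e9  # ~100B active parameters (this thread's estimate)
tokens = 50e12         # 50T tokens: an assumption within the "30T+" range

pretrain_flop = 6 * active_params * tokens
print(f"{pretrain_flop:.0e} FLOP")  # 3e+25, matching the ~3e25 median
```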
epochai.bsky.social
Our best guess: GPT-5 was trained on ~5e25 FLOP total, including both pre-training and reinforcement learning.

That would be more than twice as much as GPT-4 (~2e25 FLOP), but less than GPT-4.5 (>1e26 FLOP).

Here’s how it breaks down.
epochai.bsky.social
We recently wrote that GPT-5 is likely the first mainline GPT release to be trained on less compute than its predecessor.

How did we reach this conclusion, and what do we actually know about how GPT-5 was trained?
🧵
epochai.bsky.social
Nevertheless, it's hard to deny that AI models have become substantially more useful over the past 12 months. One indication of this is that revenues at frontier AI companies have more than tripled in the past year.
epochai.bsky.social
Of course, benchmarks don’t capture real-world utility perfectly. Even a model scoring 100% on GPQA Diamond probably wouldn’t fully replace scientists: models can overfit to benchmarks, and benchmarks don’t capture every aspect of real-world work.
epochai.bsky.social
Across benchmarks covering coding, math, scientific knowledge, common sense and visual reasoning, and more, state-of-the-art models have improved by 20 to 50 percentage points in the last year.
epochai.bsky.social
AI capabilities have been steadily improving across a wide range of skills, and show no sign of slowing down in the near term. 🧵
epochai.bsky.social
This work was commissioned by Google. Epoch maintained editorial control over the output. We offer timely and in-depth evaluation as a service to model developers; DM us for details!
epochai.bsky.social
We noticed Deep Think making several bibliographic errors, referencing works that either did not exist or did not contain the claimed results. Anecdotally, this was the model’s main weakness compared to other leading models.
epochai.bsky.social
Deep Think approaches geometry problems differently from other LLMs: rather than casting everything in coordinate systems, it works with higher-level concepts. This is how humans prefer to solve geometry problems as well.
epochai.bsky.social
This version of Deep Think got a bronze medal-equivalent score on the 2025 IMO. We challenged it with two problems from the 2024 IMO that are a bit harder than the hardest problem it solved on the 2025 IMO. It failed to solve either problem even when given ten attempts.
epochai.bsky.social
Professional mathematicians characterized Deep Think as a broadly helpful research assistant.
epochai.bsky.social
Good performance on FrontierMath requires deep background knowledge and precise execution of computations. Deep Think has made progress but hasn’t yet mastered these skills, still scoring lower on the harder tiers of the benchmark.
epochai.bsky.social
Note that this is the publicly available version of Deep Think, not the version that achieved a gold medal-equivalent score on the IMO. Google has described the publicly available Deep Think model as a “variation” of the IMO gold model.
epochai.bsky.social
We evaluated Gemini 2.5 Deep Think on FrontierMath. There is no API, so we ran it manually. The results: a new record!

We also conducted a more holistic evaluation of its math capabilities. 🧵
epochai.bsky.social
USC mathematician Greta Panova wrote a math problem so difficult that today’s most advanced AI models don’t know where to begin.

She thinks that when AI finally can, it will have crossed a threshold in general human-level reasoning.

Link to video in comments!