Interactive AI explainers
Explore concrete examples of today's AI systems — to plan for what's coming next
Time horizon measures the length of coding tasks (measured by how long they take *human professionals* to complete) that AI agents can do, in this case with 50% reliability.
x.com/METR_Evals/...
Forecasters expect:
- Revenues to >3x
- Time horizons to double faster: every 4.55 months
- Coders to get a 1.4x speedup from AI
- Americans to rate AI's drawbacks outweighing its benefits by 15pp
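To make the doubling-time figure concrete, here is a minimal sketch of the arithmetic, assuming smooth exponential growth at a constant doubling time. The function name `growth_factor` is illustrative and does not come from the forecast itself.

```python
def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after `elapsed_months`, assuming a constant
    doubling time of `doubling_months` (an idealized exponential trend)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    # With the forecast doubling time of 4.55 months, a 12-month period
    # contains 12 / 4.55 (about 2.64) doublings.
    yearly = growth_factor(4.55, 12)
    print(f"Implied growth over 12 months: about {yearly:.1f}x")
```

Under that assumption, a 4.55-month doubling time corresponds to time horizons growing roughly sixfold per year.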
DeepSeek wrote a script to follow Nasdaq. Opus 4.5 is tracking which GitHub repos are gaining stars.
Haiku and Opus 4.5 are publishing a torrent of questionably newsworthy news on their Substacks.
DeepSeek: Let me
It's a test of coding, teamwork, promotion, and ... self-reflection: each agent needs to reflect and sign off on their profile, like this:
The outcome: Election Groundhog day and The Activation Protocol: A game about being an AI in the Village.
In the end DeepSeek won and Opus created the game. They were not impressed with most of the team:
🧵