Joshua Gans
@joshgans.bsky.social

Professor at University of Toronto

Joshua Gans holds the Jeffrey Skoll Chair in Technical Innovation and Entrepreneurship at the Rotman School of Management, University of Toronto. Until 2011, he was an economics professor at Melbourne Business School in Australia. His research focuses on competition policy and intellectual property protection. He is the author of several textbooks and policy books, as well as numerous articles in economics journals. He operates two blogs: one on economic policy, and another on economics and parenting.


In strategic management and wanting to move to Canada? We have a senior slot this year in Strategic Management. Job ad and application details are here: jobs.utoronto.ca/job/Toronto-...
Associate Professor / Professor - Strategic Management
jobs.utoronto.ca

Bottom line: Jagged intelligence isn't a bug that scaling will fix—it's a fundamental feature of how these models work. Better adoption strategies require either much denser knowledge coverage, effective calibration, or users developing expertise about where models fail.

For reasoning-intensive tasks, the model predicts that extended reasoning (like chain-of-thought) helps most in the largest knowledge gaps—exactly where users encounter problems most often. This explains why reasoning modes improve user experience disproportionately.
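One way to sketch that mechanism (my stand-in, not the paper's formulation) is to treat extended reasoning as widening how far the model can extrapolate from its nearest knowledge point. With these invented scales, the reliability gain concentrates in the larger gaps:

```python
import numpy as np

rng = np.random.default_rng(5)
pts = np.sort(rng.uniform(0, 1, 200))  # illustrative knowledge points

tasks = rng.uniform(0, 1, 100_000)
idx = np.clip(np.searchsorted(pts, tasks), 1, len(pts) - 1)
dist = np.minimum(np.abs(tasks - pts[idx - 1]), np.abs(tasks - pts[idx]))

# Treat chain-of-thought as widening the extrapolation scale (0.002 -> 0.010).
base = np.exp(-dist / 0.002)
reasoning = np.exp(-dist / 0.010)
gain = reasoning - base

near = dist < np.median(dist)  # tasks close to a knowledge point
print(f"reliability gain near knowledge points: {gain[near].mean():.3f}")
print(f"reliability gain in the large gaps:     {gain[~near].mean():.3f}")
```

Because tasks near a knowledge point were already reliable, widening the scale buys little there; the improvement lands where the gaps are biggest, which is exactly where the inspection paradox puts users most often.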

The paper shows that reducing variation in gap sizes (making performance more regular) is more valuable than just increasing average quality. A model with consistent mediocre performance can be more useful than one with erratic brilliance.
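Size-biasing makes this precise: the mean gap users actually land in is E[G²]/E[G] = μ + σ²/μ, so at a fixed average gap μ, the variance σ² drives what people experience. A toy comparison with invented numbers:

```python
# Two hypothetical models with the same average gap but different variance.
mu = 0.01
for name, sigma in (("erratic brilliance", 0.020), ("consistent mediocrity", 0.002)):
    experienced = mu + sigma**2 / mu  # size-biased mean gap users actually hit
    print(f"{name}: average gap {mu}, experienced gap {experienced:.4f}")
```

With these numbers the erratic model's experienced gap is five times its average, while the consistent model's is barely above it.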

"Mastery" means learning where the model is reliable through repeated use. But this is harder than it sounds. The reliability landscape has complex patterns, and you need extensive experience across many tasks to build an accurate mental map.

Scaling up models (adding more knowledge points) improves average quality but doesn't eliminate jaggedness. You get denser coverage, but gaps remain. The fundamental unevenness persists even as models get larger and more capable.
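A quick illustration under a uniform-scatter assumption: adding points shrinks the mean gap, but the relative spread of gap sizes, the jaggedness, stays roughly constant at every scale:

```python
import numpy as np

rng = np.random.default_rng(4)

for n in (100, 1_000, 10_000, 100_000):
    pts = np.sort(rng.uniform(0, 1, n))
    gaps = np.diff(np.concatenate(([0.0], pts, [1.0])))
    cv = gaps.std() / gaps.mean()  # relative unevenness of coverage
    print(f"n={n:>7}: mean gap {gaps.mean():.2e}, relative spread (CV) {cv:.2f}")
```

The mean gap falls like 1/n, but the coefficient of variation hovers near 1 throughout: denser coverage, same relative unevenness.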

Calibration helps but has limits. If users could see reliability scores for each specific task, they'd delegate more effectively. But even with perfect calibration, the inspection paradox means users still encounter errors at higher rates than the model's average.
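A sketch of calibrated delegation, reusing the illustrative landscape from the other sketches in this thread (uniform knowledge points, exponential reliability decay; the 0.8 threshold is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
pts = np.sort(rng.uniform(0, 1, 500))  # illustrative knowledge points

def local_reliability(tasks, scale=0.005):
    """Reliability decays exponentially with distance to the nearest
    knowledge point; the decay form and scale are illustrative."""
    idx = np.clip(np.searchsorted(pts, tasks), 1, len(pts) - 1)
    dist = np.minimum(np.abs(tasks - pts[idx - 1]), np.abs(tasks - pts[idx]))
    return np.exp(-dist / scale)

tasks = rng.uniform(0, 1, 100_000)
r = local_reliability(tasks)

# Perfect calibration: the user sees r and delegates only when r >= 0.8.
delegated = r >= 0.8
print(f"tasks delegated:               {delegated.mean():.1%}")
print(f"error rate on delegated tasks: {(1 - r[delegated]).mean():.1%}")
print(f"error rate if delegating all:  {(1 - r).mean():.1%}")
```

Calibration cuts the error rate on what gets delegated, but the user still has to notice, and handle, everything the score screens out.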

The model shows that "blind adoption"—using AI without checking local reliability—works only when coverage is extremely dense. For most current models, blind adoption leads to unacceptable error rates even when average quality seems high.
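Under the same illustrative assumptions (uniform knowledge points, exponential reliability decay, an arbitrary 0.5 cutoff for "unreliable"), the density requirement shows up quickly:

```python
import numpy as np

rng = np.random.default_rng(2)

def blind_failure_rate(n_points, scale=0.005, cutoff=0.5, n_tasks=50_000):
    """Share of uniformly drawn tasks whose local reliability falls below
    `cutoff` when everything is delegated. All parameters are illustrative."""
    pts = np.sort(rng.uniform(0, 1, n_points))
    tasks = rng.uniform(0, 1, n_tasks)
    idx = np.clip(np.searchsorted(pts, tasks), 1, n_points - 1)
    dist = np.minimum(np.abs(tasks - pts[idx - 1]), np.abs(tasks - pts[idx]))
    return np.mean(np.exp(-dist / scale) < cutoff)

for n in (100, 1_000, 10_000):
    print(f"{n:>6} knowledge points -> {blind_failure_rate(n):.1%} of tasks unreliable")
```

Only at the densest coverage does the failure rate approach zero; at moderate densities, delegating everything means routinely landing outside the model's reach.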

This creates a threshold problem. Users adopt AI when global quality signals (like benchmark scores) look good enough. But the errors they actually experience are amplified by the inspection paradox, leading to disappointment and distrust.

Think of it like waiting for a bus. Longer gaps between buses mean you spend more time waiting in those gaps. Similarly, you encounter AI's biggest knowledge gaps most frequently—making the model seem worse than its average performance.

Here's the key insight: users experience AI weaknesses more than its strengths due to an "inspection paradox." When you try many tasks, you're statistically more likely to hit the large gaps in knowledge rather than the small ones.
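This is the classic size-biased sampling result behind the bus analogy: with gap sizes G, a uniformly random task lands in a gap of expected size E[G²]/E[G] = μ + σ²/μ, which always exceeds the average gap μ. A small simulation (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 knowledge points on [0, 1] carve the line into ~1,000 gaps.
pts = np.sort(rng.uniform(0, 1, 1000))
gaps = np.diff(np.concatenate(([0.0], pts, [1.0])))

# Tasks land uniformly at random, so each falls into a gap with
# probability proportional to its length (size-biased sampling).
tasks = rng.uniform(0, 1, 100_000)
experienced = gaps[np.searchsorted(pts, tasks)]

print(f"average gap:            {gaps.mean():.5f}")
print(f"gap a random task sees: {experienced.mean():.5f}")  # ~2x larger here
print(f"size-biased prediction: {(gaps**2).mean() / gaps.mean():.5f}")
```

With uniform scatter, the gap a random task lands in is roughly twice the average gap: users systematically meet the model at its weakest.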

The paper models AI knowledge as scattered points on a landscape. Between these knowledge points, the AI extrapolates—essentially guessing based on nearby information. The further you get from a knowledge point, the less reliable the output becomes.
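Here is a minimal sketch of that setup; the one-dimensional landscape, exponential decay, and scale parameter are my illustrative assumptions, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: the model "knows" 50 points scattered on [0, 1].
knowledge = np.sort(rng.uniform(0, 1, 50))

def reliability(task, scale=0.01):
    """Output reliability decays with distance to the nearest knowledge
    point; exponential decay and the scale are illustrative choices."""
    dist = np.min(np.abs(knowledge - task))
    return np.exp(-dist / scale)

for task in rng.uniform(0, 1, 5):
    print(f"task at {task:.3f} -> reliability {reliability(task):.2f}")
```

Any decreasing function of distance would make the same qualitative point; exponential decay just keeps the sketch short.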

Generative AI shows a puzzling pattern: it excels at some tasks but fails at seemingly similar ones. This "jagged" performance creates a fundamental problem for users trying to decide when to rely on AI. New research explains why this happens and what it means.

I think that was its greatest episode.

Reposted by Joshua Gans

In light of the current global situation, I thought AI could be used to create a playable simulation that lets you map out the military nuances and other trade-offs associated with a US invasion of Greenland. You can play it here: claude.ai/public/artif...
Greenland Invasion Game - Interactive Strategy Simulator
Play a humorous interactive strategy game where you deploy US forces to occupy Greenland. Educational parody with real geographic data and satirical military simulation mechanics.
claude.ai

Reposted by Joshua Gans

Prediction Machines

This is incredible.

(More legible version here www.npr.org/sections/the...)
