Ethan Mollick
@emollick.bsky.social
Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence.
Book: https://a.co/d/bC2kSj1
Substack: https://www.oneusefulthing.org/
Web: https://mgmt.wharton.upenn.edu/profile/emollick
It is getting harder and harder to test AIs as they get "smarter" at a wide variety of tasks. The average task in GDPval took an hour for experts to assess, and even those tasks did not push current AIs to their limits.
November 25, 2025 at 1:59 AM
Me: Claude Opus 4.5, I need a strategy game based on the work of Weber

Claude: Here's one based on David Weber's space operas

Me: Not that Weber

C: Here's a game based on sociologist Max Weber

Me: Not that one

C: The operas of Carl Maria von Weber?

Me: No

C: Here is one using Weber grills!
November 24, 2025 at 8:29 PM
I had early access to Opus 4.5 & it is a very impressive model that seems to be right at the frontier

Big gains in ability to do practical work (like make a PowerPoint from an Excel) and the best results ever (& in one shot) in my Lem poetry test, plus good results in Claude Code
November 24, 2025 at 6:59 PM
I think my “otters on a plane using WiFi” benchmark may be saturated now that nano banana pro can do this.
November 21, 2025 at 2:56 PM
Ruining great art with the nano banana pro command “Make this much more cheerful with as few changes as possible”
November 21, 2025 at 1:19 PM
Tell all the truth but tell it slant—
Success in Circuit lies
Too bright for our infirm Delight
The Truth's superb surprise

This paper finds poetry is a universal single-shot jailbreak for LLMs. Systems built to stop prosaic attacks fail when the request is phrased in verse: arxiv.org/abs/2511.15304
November 20, 2025 at 9:47 PM
Nano banana Pro: “I need a flowchart for how to toast bread, make it as wacky and over-the-top and complicated as possible.”

Not absolutely perfect, but I can’t believe how coherent the through-line is, how clear the text is, and that parts of it are actually funny?
November 20, 2025 at 7:19 PM
"Hey, Gemini 3, So I need DOOM, but more root vegetables, also no guns or demons or mars. And more of a focus on different flooring styles. but otherwise EXACTLY the same as DOOM."

Gemini: "Here is F.L.O.O.R. (First-person Lino Observation & Ornamental Review)."

Pretty good!
November 19, 2025 at 9:08 PM
As a fan of weird but revealing benchmarks, I enjoyed this historian’s attempts to have different frontier AIs build “a full featured RPG game where you play as Henry James wandering as a flâneur at the 1889 Universal Exposition in Paris.” HenryBench? open.substack.com/pub/resobscu...
How well can Gemini 3 make a Henry James simulator?
Finally, a benchmark for LLMs with real-world value
open.substack.com
November 19, 2025 at 4:13 AM
Fun little Gemini 3 experiment where I asked it "build me a time machine simulator, make it very very good" and then "make it better" a few times. I like that it added calls to Gemini within the application, including adding speech & nano banana images. Play it: gemini.google.com/share/02e4e8...
November 18, 2025 at 10:28 PM
I had access to Gemini 3. It is a very good, very fast model. It also demonstrates the change from chatbot to agent. www.oneusefulthing.org/p/three-year...
Three Years from GPT-3 to Gemini 3
From chatbots to agents
www.oneusefulthing.org
November 18, 2025 at 6:57 PM
Interesting changes from Grok 4 to Grok 4.1. Decreases in harmful responses but also increases in sycophancy and deception.

It isn’t clear how to interpret the sycophancy score, but the MASK deception score is quite high compared to other big models.

Sycophancy leads to higher LMArena scores…
November 18, 2025 at 2:55 AM
We are now seeing the first long-anticipated use of AI for semi-autonomous cyberattacks.

"This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement" www.anthropic.com/news/disrupt...
November 13, 2025 at 7:12 PM
Some pretty eye-opening data on the effect of AI coding.

When Cursor added agentic coding in 2024, adopters produced 39% more code merges, with no sign of a decrease in quality (revert rates were the same, bugs dropped) and no sign that the scope of the work shrank. papers.ssrn.com/sol3/papers....
November 13, 2025 at 5:18 AM
As AIs get smarter & more useful, our benchmarks become less useful. Measuring general knowledge or coding ability gives us only a glimpse into what an AI model can do.

Anyone who wants to use AI seriously for real work will need to assess it themselves. www.oneusefulthing.org/p/giving-you...
Giving your AI a Job Interview
As AI advice becomes more important, we are going to need to get better at assessing it
www.oneusefulthing.org
November 12, 2025 at 2:55 AM
I keep warning that so many of our systems are still built around the assumption that quality writing and analysis are costly and therefore meaningful signals.

Our systems are very much not ready for the revelation that this is no longer true, as this planning objection AI shows
November 9, 2025 at 11:39 PM
This is a cool paper showing that first-gen college students aren't aware of a lot of the unwritten rules that lead to success (the value of internships, student clubs, letters from professors).

But giving them access to an LLM for guidance significantly closes the gap. mgcuna.github.io/website/JMP_...
November 9, 2025 at 2:55 PM
Sora: "that infamous dramatic Oscar winning scene where the lead keeps getting hit by the boom mic but nobody notices"
November 5, 2025 at 4:32 AM
I have been writing for years about the fact that we are not ready for the destruction of costly signalling mechanisms. Writing used to be a way of measuring effort, ability and diligence. We still have no easy substitute

Now this paper confirms that cover letters have lost their value as a predictor
November 5, 2025 at 1:48 AM
The big article on data centers in the New Yorker is pretty good, which I wasn’t expecting given the reaction on X. Lots of talk of the good and bad of AI, and it covers both bubble & non-bubble arguments.

It also featured the best version of “I spoke to a local farmer about a data center”
November 3, 2025 at 6:23 AM
I don’t think people are tracking how quickly this is happening, for better or worse.
November 2, 2025 at 11:59 PM
Biggest gap between a brilliant passage written about a work of art and what you might expect the art to look like based on the passage?

From Walter Benjamin (the painting in the reply)
November 1, 2025 at 7:22 PM
The challenge of learning with AI is very similar to the learning issue discovered with internet search

When we are given answers we think we learn, but we don’t. Learning is work. However, things like the “learning modes” from the AI providers help, as does using AI for tutoring rather than for answers
October 31, 2025 at 1:55 PM
Sora: “Tiktok style high energy video explainer about the spinning columns of penguins in the sky. The pillar has always been there.” Now do it as a conspiracy theorist. Now a conspiracy debunker. Now a travel influencer

We live in a strange time (not the penguin pillar. That has always been there)
October 31, 2025 at 1:40 AM
In discussions of AI and jobs, we put too much emphasis on the technology and not enough on the corporate leaders who are actually making decisions about what they want to do with AI & its implications.

It is a time when CEO vision matters a lot, and you can see a contrast between Amazon and Walmart
October 30, 2025 at 2:01 PM