Lightnews — Scholar-powered news

William Gunn

@metasynthesis.net

An HBR study of AI at work finds people who adopt it are feeling more stressed. I feel like they have the causality backwards: it's the stress-prone people who are first to adopt it, but if you're not a good driver, buying a Ferrari isn't going to make you better. hbr.org/2026/02/ai-d...

February 10, 2026 at 5:17 PM

William Gunn

@metasynthesis.net

The monks walking for peace is such a sweet story. Feels like something from the 70s. www.reuters.com/world/us/wal...

Walk for peace: Buddhist monks arrive in Washington after 2,300-mile journey

Draped in burnt-orange robes, two dozen Buddhist monks are due to finish a 2,300-mile "Walk for Peace" in Washington, D.C., on Tuesday, a self-described spiritual journey across nine states that has b...

www.reuters.com

February 10, 2026 at 5:07 PM

William Gunn

@metasynthesis.net

Jevon's Paradox strikes again

TechCrunch @techcrunch.com · 1d

Because employees could do more, work began bleeding into lunch breaks and late evenings. The employees' to-do lists expanded to fill every hour that AI freed up, and then kept going.

The first signs of burnout are coming from the people who embrace AI the most | TechCrunch

Because employees could do more, work began bleeding into lunch breaks and late evenings. The employees' to-do lists expanded to fill every hour that AI freed up, and then kept going.

techcrunch.com

February 10, 2026 at 4:00 PM

Reposted by William Gunn

Center for Adaptive Rationality (MPIB, Berlin)

@arc-mpib.bsky.social

📣 Applications for the 23rd Summer Institute on Bounded Rationality are now open!

✨Join us in Berlin @arc-mpib.bsky.social June 08–16, 2026, to explore the topic of “Decision Making in the Age of AI”.

✏️ More details + application form (deadline: March 16): www.mpib-berlin.mpg.de/research/res...

February 10, 2026 at 12:42 PM

William Gunn

@metasynthesis.net

Clearly this means you should round up your 8 best friends and go to the park

Eiko Fried @eikofried.bsky.social · 1d

Anyway how's your day going

February 10, 2026 at 4:43 AM

William Gunn

@metasynthesis.net

Hey Frontiers, if you don't want your editors corrupted by bribery, I have a brilliant idea. Pay them more than the bribe. www.researchinformation.info/news/frontie...

Frontiers warns over paid approaches to editors - Research Information

Publisher says financial compensation has been offered in connection with editorial or peer-review activities

www.researchinformation.info

February 9, 2026 at 3:50 PM

William Gunn

@metasynthesis.net

All recommendation systems do this: they show you the best matches first, but then they run out. It's fun to try to notice when that happens. You can try it on any platform.

grand theft eigenvalue 🔆 @akhilrao.bsky.social · 4d

concept: a feed that's For You for N minutes, after which it shows you increasingly Not For You posts to get you to log off

February 7, 2026 at 7:08 PM

Reposted by William Gunn

Epoch AI

@epochai.bsky.social

How can we anticipate when AI will be able to do our jobs?

You could try building benchmarks as a leading indicator of automation, but they fail to capture the complexity of real-world tasks.

So Epoch AI researcher Anson Ho argues for an alternative: try automating tasks in your own job. 🧵

February 7, 2026 at 6:03 PM

William Gunn

@metasynthesis.net

New polling out of Australia shows AI is a major concern among workers, with 69% wanting better regulation. unionsnsw.org.au/media-releas...

February 6, 2026 at 9:39 PM

William Gunn

@metasynthesis.net

TakeOverBench.com Mapping the progress to humanity losing control. From @xrobservatory.bsky.social & @pauseai.bsky.social

February 6, 2026 at 5:58 PM

William Gunn

@metasynthesis.net

Anthropic found that 4.6 surpasses their tests for level 4 (the highest level) autonomy. Instead of taking that seriously, they decided to chuck the eval and just ask people if they should release it instead. 2nd image is an artists depiction of how that went. www-cdn.anthropic.com/0dd865075ad3...

February 6, 2026 at 5:56 PM

Reposted by William Gunn

Tim Kellogg

@timkellogg.me

Opus 4.6 is here!

biggest wins on agentic search, HLE & ARC AGI 2

claude.com/blog/opus-4-...

A large comparison table showing benchmark performance across five model families, with columns labeled at the top: “Opus 4.6,” “Opus 4.5,” “Sonnet 4.5,” “Gemini 3 Pro,” and “GPT-5.2 (all models).” The Opus 4.6 column is visually highlighted with a light shaded background and rounded border.

Rows list tasks and benchmarks on the left, with percentages or scores across models:

“Agentic terminal coding (Terminal-Bench 2.0)”:
Opus 4.6: 65.4%
Opus 4.5: 59.8%
Sonnet 4.5: 51.0%
Gemini 3 Pro: 56.2% (54.2% self-reported)
GPT-5.2: 64.7% (64% self-reported, Codex CLI)

“Agentic coding (SWE-bench Verified)”:
Opus 4.6: 80.8%
Opus 4.5: 80.9%
Sonnet 4.5: 77.2%
Gemini 3 Pro: 76.2%
GPT-5.2: 80.0%

“Agentic computer use (OSWorld)”:
Opus 4.6: 72.7%
Opus 4.5: 66.3%
Sonnet 4.5: 61.4%
Gemini 3 Pro: —
GPT-5.2: —

“Agentic tool use (t2-bench)”:
Retail: Opus 4.6 91.9%, Opus 4.5 88.9%, Sonnet 4.5 86.2%, Gemini 3 Pro 85.3%, GPT-5.2 82.0%
Telecom: Opus 4.6 99.3%, Opus 4.5 98.2%, Sonnet 4.5 98.0%, Gemini 3 Pro 98.0%, GPT-5.2 98.7%

“Scaled tool use (MCP Atlas)”:
Opus 4.6: 59.5%
Opus 4.5: 62.3%
Sonnet 4.5: 43.8%
Gemini 3 Pro: 54.1%
GPT-5.2: 60.6%

“Agentic search (BrowseComp)”:
Opus 4.6: 84.0%
Opus 4.5: 67.8%
Sonnet 4.5: 43.9%
Gemini 3 Pro: 59.2% (Deep Research)
GPT-5.2: 77.9% (Pro)

“Multidisciplinary reasoning (Humanity’s Last Exam)”:
Without tools: Opus 4.6 40.0%, Opus 4.5 30.8%, Sonnet 4.5 17.7%, Gemini 3 Pro 37.5%, GPT-5.2 36.6%
With tools: Opus 4.6 53.1%, Opus 4.5 43.4%, Sonnet 4.5 33.6%, Gemini 3 Pro 45.8%, GPT-5.2 50.0%

“Agentic financial analysis (Finance Agent)”:
Opus 4.6: 60.7%
Opus 4.5: 55.9%
Sonnet 4.5: 54.2%
Gemini 3 Pro: 44.1%
GPT-5.2: 56.6% (5.1)

“Office tasks (GDPVal-AA Elo)”:
Opus 4.6: 1606
Opus 4.5: 1416
Sonnet 4.5: 1277
Gemini 3 Pro: 1195
GPT-5.2: 1462

“Novel problem-solving (ARC AGI 2)”:
Opus 4.6: 68.8%
Opus 4.5: 37.6%
Sonnet 4.5: 13.6%
Gemini 3 Pro: 45.1% (Deep Thinking)
GPT-5.2: 54.2% (Pro)

“Graduate-level reasoning (GPQA Diamond)”:
Opus 4.6: 91.3%
Opus 4.5: 87.0%
S…

February 5, 2026 at 6:03 PM

William Gunn

@metasynthesis.net

There haven't been enough feuds among the big AI labs. This ought to fix it.

Cameron @cameron.stream · 6d

Okay well the OpenAI CMO is hopping on now and it is obvious that Anthropic hurt them and they were deeply unprepared

February 5, 2026 at 9:54 PM

Reposted by William Gunn

Grace

@gracekind.net

Tracker for changes to Claude’s constitution: https://claude-soul.watch

February 4, 2026 at 11:09 PM

William Gunn

@metasynthesis.net

Still going exponential. How much time do we have left?

METR @metr.org · 6d

We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

February 5, 2026 at 9:48 PM

Reposted by William Gunn

FAR.AI

@far.ai

AI governance increasingly relies on broken benchmarks. @ankareuel.bsky.social found many can't distinguish signal from noise, lack documentation, and have poor validity. GPQA claims 448 multiple choice questions measure graduate reasoning. It doesn't really. 👇

February 5, 2026 at 4:32 PM

William Gunn

@metasynthesis.net

Some proper oratory here!

Omg. WTF is Happening? @lalahaenzy.com · 6d

Sir Ian McKellen performing a monologue from Shakespeare’s Sir Thomas More on the Stephen Colbert show. Never have I heard this monologue performed with such a keen sense of prescience. Nor have I ever been in this exact historical moment.TY Sir Ian, for reaching us once again.
#Pinks #ProudBlue

February 5, 2026 at 5:00 PM

Reposted by William Gunn

LawZero - LoiZéro

@law-zero.bsky.social

Learn more in our blog post, "The Scientist AI: Safe by Design, by Not Desiring":
lawzero.org/en/unlisted/...
(4/4)

LawZero | The Scientist AI: Safe by Design, by Not Desiring

Scientific theories aspire to describe what is, as opposed to prescribe what ought to be. At LawZero, we take this idea as a design principle for safe artificial intelligence: that understanding—even ...

lawzero.org

February 5, 2026 at 3:15 PM

Reposted by William Gunn

LawZero - LoiZéro

@law-zero.bsky.social

At LawZero, we're rethinking the building blocks of frontier AI to create an intelligent machine that is both highly capable and safe-by-design. We’re excited to share our first blog post outlining some of the objectives and core components of our Scientist AI project. 🧵
(1/4)

February 5, 2026 at 3:15 PM

Reposted by William Gunn

Cat Hicks

@grimalkina.bsky.social

So many developers have sent me that Anthropic skills/mastery case study that I realized I should ungate what I *already wrote* about this: beginning principles to design workflows that work *with* your mind, not against it, & protect your problem-solving

www.fightforthehuman.com/cognitive-he...

Cognitive Helmets for the AI Bicycle: Part 1

I hear people name these three fears: will developers lose their problem-solving skills, learning opportunities, and critical thinking? One science-backed area can help: better metacognitive strategie...

www.fightforthehuman.com

February 4, 2026 at 6:13 PM

William Gunn

@metasynthesis.net

If you care about the truth, you need more than just confirming data to prove a theory. You also need data that rules out other possible hypotheses.
If you just care about clicks, not so much, but be careful about exchanging long-term credibility for short term gains.

Sol Messing @solmg.bsky.social · 7d

We tracked public TikToks containing keywords like "ICE," "Alex Pretti," "Renee Good," "Trump," and "Epstein" over time. There was a big drop - but it hit everything, political and non-political alike. "Recipe" and "Oscar" posts fell off too.

February 4, 2026 at 10:45 PM

Reposted by William Gunn

Mark Hahnel

@hahnel.org

The institutions that embrace machine-first FAIR will find themselves having more impact for their research and researchers.

More reuse. More trust. More interoperability.

Value, not volume.

www.digital-science.com/blog/2026/02...

Value over Volume: The Next Ten Years of Open Data - Digital Science

Mark Hahnel shares the open data wins of the past decade, the challenges, and the future of data sharing.

www.digital-science.com

February 4, 2026 at 9:00 AM

Reposted by William Gunn

David Rand

@dgrand.bsky.social

🚨New WP "@Grok is this true?"
We analyze 1.6M factcheck requests on X (grok & Perplexity)
📌Usage is polarized, Grok users more likely to be Reps
📌BUT Rep posts rated as false more often—even by Grok
📌Bot agreement with factchecks is OK but not great; APIs match fact-checkers
osf.io/preprints/ps...

February 3, 2026 at 9:55 PM

Reposted by William Gunn

Yoshua Bengio

@yoshuabengio.bsky.social

With all the noise around AI, I hope this Report provides policymakers, researchers, and the public with the reliable evidence they need to make more informed choices. We also have an “Extended Summary for Policymakers”:
internationalaisafetyreport.org/publication/...

(18/19)

2026 Report: Extended Summary for Policymakers

The Extended Summary for Policymakers of the 2026 International AI Safety Report. The second International AI Safety Report, published in February 2026, is the next iteration of the comprehensive revi...

internationalaisafetyreport.org

February 3, 2026 at 1:16 PM

Reposted by William Gunn

FAR.AI

@far.ai

AI Safety Researchers in London 🇬🇧: Attend the London Alignment Workshop, March 2–3! Top ML researchers from industry, academia & government will discuss AI alignment, including model evaluations, interpretability, and robustness. 👇

February 4, 2026 at 9:30 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news