Lightnews — Scholar-powered news

Reposted by Ian Lee

Gary Marcus

@garymarcus.bsky.social

Horseshit. Every business in the world has discovered in the last several months that GenAI is not in fact smart enough replace most of their employees.

Whatever you are reading from these (often gamed, sometimes contaminated) benchmarks does not reflect real-world real-world reality.

May 8, 2025 at 1:48 AM

Reposted by Ian Lee

Geoffrey A. Fowler

@geoffreyfowler.bsky.social

New data throws some cold water on AI accuracy, via @nitasha.bsky.social:
A test by Vals AI of models from OpenAI, Anthropic, Meta, Google, etc. found that all scored LESS THAN 50% accuracy on average for simple tasks required of entry-level financial analysts.
www.washingtonpost.com/politics/202...

Analysis | AI tools mostly fumble basic financial tasks, study finds

The Washington Post’s essential guide to tech policy news.

www.washingtonpost.com

April 23, 2025 at 8:01 PM

Reposted by Ian Lee

Bob E Hayes

@bobehayes.bsky.social

Grok 3 required training two new massive data centers operating full time for months, and 15x the compute of Grok 2 — yet all these kinds of errors feel awfully familiar." ~ @garymarcus.bsky.social

garymarcus.substack....

#AI
3/3

Grok 3 Beta in Shambles

Maximal Truth still seems far away

garymarcus.substack.com

February 21, 2025 at 2:30 AM

Ian Lee

@ianhhlee.bsky.social

Because some of us are entering into 2025 way too wound up

January 16, 2025 at 5:21 AM

Ian Lee

@ianhhlee.bsky.social

The gauntlet has been thrown down.
#generativeai

Gary Marcus @garymarcus.bsky.social · Dec 30

An AI bet! Proceeds to charity, on where AI will be at the end of 2027, with @milesbrundage.bsky.social, formerly of OpenAI.

open.substack.com/pub/garymarc...

Where will AI be at the end of 2027? A bet

We, Gary Marcus, author, scientist, and noted AI skeptic, and Miles Brundage, an independent AI policy researcher who recently left OpenAI and is bullish on AI progress, have agreed to the following b...

open.substack.com

December 31, 2024 at 6:05 AM

Ian Lee

@ianhhlee.bsky.social

Awesome stuff from Gwen! Happy reading!

Gwen Shapira @gwenshap.bsky.social · Dec 26

A bit late, but finally published my collection of good papers, blogs, videos (and one book). So we can all enjoy reading over the holidays and start 2025 inspired.

open.substack.com/pub/hackings...

What every SaaS developer should read on vacation - 2024 Edition

The annual round-up of long-ish papers and videos. Start the new year with new knowledge and inspiration.

open.substack.com

December 28, 2024 at 7:01 AM

Ian Lee

@ianhhlee.bsky.social

Been looking forward to this! I particularly like her apposition of o3's performance with strategies utilized by previous ARC-AGI winners, which left her "a bit disappointed: each of these methods went against the assumptions that I described above that, for me at least, made ARC so attractive"

Melanie Mitchell @melaniemitchell.bsky.social · Dec 23

Some of my thoughts on OpenAI's o3 and the ARC-AGI benchmark

aiguide.substack.com/p/did-openai...

Did OpenAI Just Solve Abstract Reasoning?

OpenAI’s o3 model aces the "Abstraction and Reasoning Corpus" — but what does it mean?

aiguide.substack.com

December 25, 2024 at 4:48 AM

Ian Lee

@ianhhlee.bsky.social

Looking forward to Melanie Mitchell's take.
Nathan Lambert's is sufficiently well-balanced and worth the read, in particular his statement "and add humility, here’s an example from the ARC prize that o3 did not solve. It’s very easy. We clearly have a ways to go, but you should be excited ... "

Melanie Mitchell @melaniemitchell.bsky.social · Dec 21

Excellent post about the recent OpenAI o3 results on ARC (& other benchmarks). I don't know how @natolambert.bsky.social manages to write these so quickly! I highly recommend his newsletter.

www.interconnects.ai/p/openais-o3...

I am (more slowly) writing my own take on all this, coming soon.

o3: The grand finale of AI in 2024

A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.

www.interconnects.ai

December 21, 2024 at 11:10 PM

Reposted by Ian Lee

Gary Marcus

@garymarcus.bsky.social

How many times do we have to see this same movie, where an AI beats some benchmark and influencers gleefully shout “It’s So Over” without even trying out the AI and then on careful inspection the AI turns out to not be robust or reliable?

Thousands?

(It’s already been hundreds.)

December 21, 2024 at 12:59 AM

Reposted by Ian Lee

Timothy Gowers

@wtgowers.bsky.social

The first open source program to compete in an International Mathematical Olympiad and get a gold medal (with problems and solutions in normal human format and solutions generated within the same time limit as that for human contestants) will get $5m.

December 12, 2024 at 11:46 PM

Ian Lee

@ianhhlee.bsky.social

Definitely a balanced view in contrast to all that flip-flopping

Arvind Narayanan @randomwalker.bsky.social · Dec 19

New AI Snake Oil essay: Last month the AI industry's narrative suddenly flipped — model scaling is dead, but "inference scaling" is taking over. This has left people outside AI confused. What changed? Is AI capability progress slowing? We look at the evidence. 🧵 www.aisnakeoil.com/p/is-ai-prog...

Declaring the death of model scaling is premature.

Regardless of whether model scaling will continue, industry leaders’ flip flopping on this issue shows the folly of trusting their forecasts. They are not significantly better informed than the rest of us, and their narratives are heavily influenced by their vested interests.

Inference scaling is real, and there is a lot of low-hanging fruit, which could lead to rapid capability increases in the short term. But in general, capability improvements from inference scaling will likely be both unpredictable and unevenly distributed among domains.

The connection between capability improvements and AI’s social or economic impacts is extremely weak. The bottlenecks for impact are the pace of product development and the rate of adoption, not AI capabilities.

December 19, 2024 at 2:06 PM

Reposted by Ian Lee

Csaba Szepesvari

@skiandsolve.bsky.social

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

December 19, 2024 at 12:55 AM

Ian Lee

@ianhhlee.bsky.social

bsky.app/profile/jdlo...

JD Long @jdlong.cerebralmastication.com · Dec 14

I interviewed an early career candidate whose resume showed fine tuning Open AI GPT models. I asked her to tell me about it and she said, "honestly it was maddening. It constantly felt like we were on the edge of getting something useful then it would be a mess in a new way." I knew she was legit.

December 14, 2024 at 8:05 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news