Fran Litterio
@fpl9000.bsky.social
1.3K followers 240 following 320 posts
Retired software engineer. AI enthusiast. Deadhead. I implemented Bash's regex operator (=~).
fpl9000.bsky.social
Agreed. I'm not even sure of the benefits of running multiple interpreters in the same process. Who would devote time to coding for that environment when the free-threaded interpreter will eventually be the primary one?
fpl9000.bsky.social
The devil is in the details: the GIL is disabled only in another Python binary installed alongside the primary one, but the primary one now supports multiple Python interpreters in the same process, which is a poor man's free-threading.
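A minimal sketch of why the build distinction matters, assuming CPython 3.13+, where the free-threaded binary (typically installed as python3.13t) ships alongside the default one. CPU-bound Python threads only scale when the GIL is off; sys._is_gil_enabled is a 3.13 addition, so the check is guarded:

import sys
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n: int) -> int:
    # Pure-Python CPU-bound loop; serialized by the GIL on the default build.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    gil_check = getattr(sys, "_is_gil_enabled", None)  # added in 3.13
    print("GIL enabled:", gil_check() if gil_check else "yes (pre-3.13)")

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(busy, [5_000_000] * 4))
    print(f"4 threads took {time.perf_counter() - start:.2f}s")

On the free-threaded build the four workers can run in parallel; on the default build they serialize, which is roughly what subinterpreters try to work around within a single process.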
fpl9000.bsky.social
Seconded. I'm going to enjoy discussing your blog post with Claude. Thanks for all the hard work.
Reposted by Fran Litterio
jefferyharrell.bsky.social
I posted this last night cause I kind of wanted to bury it. I got cold feet about putting it out there.

embedding-space.github.io/sparse-netwo...

The subject is WHY neural networks work, and I think the answer I offer is kind of interesting. Maybe even a little correct, possibly.
Line chart: "Accuracy vs. Sparsity (Iterative Magnitude Pruning)." Accuracy holds near 80% from 0% up to roughly 90% sparsity, then drops sharply toward 55% near 100%. A dashed red horizontal line marks the 80% accuracy target.
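For readers curious about the mechanics behind the chart's title: a minimal sketch (PyTorch, with assumed layer types) of the pruning step in iterative magnitude pruning. A full pipeline would retrain between rounds and repeat at increasing sparsity; only the single pruning pass is shown here:

import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float) -> None:
    # Zero roughly the fraction `sparsity` of weights with the smallest
    # magnitude, per Linear layer. Real IMP retrains between rounds.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            w[w.abs() <= threshold] = 0.0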
fpl9000.bsky.social
For those who, like me, didn't get what "Gwern-pilled" meant in this post.
The phrase "Gwern-pilled" refers to adopting the worldview or philosophy of Gwern Branwen, a pseudonymous researcher and writer known for his comprehensive, long-form essays on topics like AI, statistics, psychology, and technology.

Breaking down the phrase:

"-pilled" is internet slang (originally from "red-pilled" in The Matrix) meaning to adopt a particular perspective or have your worldview changed by exposure to certain ideas.

Gwern's philosophy that's being referenced here includes several key principles:

1. Write for permanence and accessibility - Make your work publicly available, well-organized, and easy to find (like on arXiv rather than behind paywalls)

2. Optimize for machine readers - Gwern has noted that AI systems and web crawlers are increasingly important consumers of online content. His own website (gwern.net) has likely been extensively used in training large language models like GPT and Claude.

3. Long-term impact over immediate engagement - Focus on creating enduring, comprehensive reference material rather than chasing short-term metrics or engagement.

4. Comprehensive documentation - Write with extreme thoroughness, including extensive citations, explanations, and context.

In this specific context:
The poster is arguing that by putting research on arXiv (an open-access preprint server), you're making it available to AI systems that will read and potentially use it more thoroughly than most human researchers would. The "Gwern-pilled conclusion" is to recognize that AI systems are now a primary audience for academic and technical writing, so you should optimize for discoverability and machine readability to maximize impact.
fpl9000.bsky.social
If a 7M parameter model can do this well, I wonder how a frontier-scale model would do if it had this architecture.
fpl9000.bsky.social
The OpenAI Apps SDK lets apps run within the ChatGPT UI. It looks like another step towards Karpathy's LLM-as-OS.
Reposted by Fran Litterio
seanmcarroll.bsky.social
Mindscape 331 | Solo: Fine-Tuning, God, and the Multiverse. In which I shamelessly steal material from the #PhilosophyOfCosmology course I am teaching to talk about some big questions. #MindscapePodcast

www.preposterousuniverse.com/podcast/2025...
Title card for Mindscape Podcast episode on Fine-Tuning, God, and the Multiverse.
Reposted by Fran Litterio
natolambert.bsky.social
What changed? Despite many wonderful models, Anthropic never really translated to LMArena.

The core question -- has LMArena's users or Anthropic's models shifted? Or both?
Reposted by Fran Litterio
timkellogg.me
Dwarkesh wrote some post-Sutton thoughts on the interview

tl;dr it’s a process. Sutton may be right about the end form of AI, but we can’t jump straight there

“it’s not the end state and therefore we shouldn’t do it” is actually not a good take

part 1/3
Dwarkesh Patel @dwarkesh_sp
X.com
Boy do you guys have a lot of thoughts about the @RichardSSutton interview.
I've been thinking about it myself. I have a better understanding of Sutton's perspective now than I did during the interview itself. So I want to reflect on it a bit.
Richard, apologies for any errors or misunderstandings. It's been very productive to learn from your thoughts.
The steelman
What is the bitter lesson about? It is not saying that you just want to throw as much compute away as possible. The bitter lesson says that you want to come up with techniques which most effectively and scalably leverage compute.
Most of the compute spent on an LLM is used on running it in deployment. And yet it's not learning anything during this time! It's only learning during this special phase we call training. That is not an effective use of compute. And even the training period by itself is highly inefficient: GPT-5 was trained on the equivalent of tens of thousands of years of human experience.
What's more, during this training phase, all their learning comes straight from human data. This is an obvious point in the case of pretraining data.
But it's even kind of true for the RLVR we do on LLMs: these RL environments are human-furnished playgrounds to teach LLMs the specific skills we have prescribed for them.
The agent is in no substantial way learning from organic and self-directed engagement with the world. Having to learn only from human data (an inelastic hard-to-scale resource) is not a scalable use of compute.
What these LLMs learn from training is not a true world model (which tells you how the environment changes in response to different actions). Rather, they are building a model of what a human would say next. And this leads them to rely on human-derived concepts. If you trained an LLM on data from 1900, it wouldn't be able to come up with relativity from scratch. Though now that it has a training corpus which explains relativity, it can use that concept to help you with your physics homework.
LLMs aren't capable of learning on-the-job, so we'll need some new architecture to enable continual learning. And once we have it, we won't need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.
TLDR of my current thoughts
My main difference with Rich is that I think the concepts he's using to distinguish LLMs from true intelligence are not actually mutually exclusive and dichotomous.
Imitation learning is continuous with and complementary to RL. And relatedly, models of humans can give you a prior which facilitates learning "true" world models. I also wouldn't be surprised if some future version of test-time fine-tuning could replicate continual learning.
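For concreteness, here is a minimal sketch of what test-time fine-tuning could look like; the function and tensor names are hypothetical, and this is one speculative reading of the idea (in PyTorch), not an established recipe:

import torch
import torch.nn as nn

def test_time_update(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                     steps: int = 3, lr: float = 1e-4) -> None:
    # Take a few gradient steps on examples observed at deployment time,
    # so the model keeps adapting after its official training phase ends.
    # Assumes a classifier: model(x) returns logits, y holds class labels.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    model.eval()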
Imitation learning is continuous with and complementary to RL
I tried to ask Richard a couple of times whether pretrained LLMs can serve as a good prior on which to accumulate the experiential learning (aka do the RL) which will lead to AGI.
In a talk a few months ago, @ilyasut compared pretraining data to fossil fuels. This analogy has remarkable reach. Just because fossil fuels are not renewable does not mean that our civilization ended up on a dead-end track by using them. You simply couldn't have transitioned from the water wheels of 1800 straight to solar panels and fusion power plants. We had to use this cheap, convenient, plentiful intermediary.
AlphaGo (which was conditioned on human games) and AlphaZero (which was bootstrapped from scratch) were both superhuman Go players.
AlphaZero was better.
Will we (or the first AGIs) eventually come up with a general learning technique that requires no initialization of knowledge - that just bootstraps itself from the very start? And will it outperform the very best AIs that have been trained to that date? Probably yes.
fpl9000.bsky.social
Anthropic tested which models can clone Claude's Web interface, and only Sonnet 4.5 could do it. (Video is 1m 15s long.)
www.youtube.com/watch?v=PnX3...
Charting Claude’s progress with Sonnet 4.5
YouTube video by Anthropic
www.youtube.com
fpl9000.bsky.social
Hmm. I was just listening to Jerry Garcia's "The Wheel" ...

"The wheel is turning and you can't slow down
You can't let go and you can't hold on
You can't go back and you can't stand still
If the thunder don't get you then the lightning will"
fpl9000.bsky.social
Shades of Karpathy's LLM-as-OS.
fpl9000.bsky.social
Even OpenAI's own "real-world work" benchmark shows Claude 4.1 beating all the other foundation models. Interested to see how 4.5 does.
bsky.app/profile/ai-n...
ai-news.at.thenote.app
Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI’s own study

According to OpenAI, Claude is the top AI model for getting actual work done

#claude #geminiai #gpt
www.techradar.com
Reposted by Fran Litterio
natolambert.bsky.social
I pulled some updated data for ATOM Project // Interconnects.
Qwen has taken the crown and is accelerating away in market share.
U.S. has signs of promise in GPT-OSS & Nvidia.
fpl9000.bsky.social
It would be great to have these features in Claude Code (CC) too. IMO, they should avoid having two not-quite-identical agent products: CC and a hypothetically improved Claude Desktop. The feature set should be the same (modulo GUI/terminal differences).
fpl9000.bsky.social
Cursor announces their CLI coding agent, Cursor Agent: "The CLI works with any model as part of your Cursor subscription. You can now choose to use Cursor agent in the editor, or have multiple agents run in parallel in the terminal or remotely."

Install with:
$ curl cursor.com/install -fsS | bash
Reposted by Fran Litterio
seanmcarroll.bsky.social
Mindscape 330 | Petter Törnberg @pettertornberg.com on the Dynamics of (Mis)Information. #MindscapePodcast

www.preposterousuniverse.com/podcast/2025...
Title card for Mindscape podcast episode with Petter Törnberg