Lightnews — Scholar-powered news

Reposted by Tim Kellogg

Simon Willison

@simonwillison.net

At the risk of starting the flame war to end all flame wars...

Modern LLMs (GPT-5.1, Claude 4.5, Gemini 3) produce excellent code and can be a significant productivity boost to software engineers who take the time to learn how to effectively apply them - especially if used with coding agent tools

November 27, 2025 at 7:55 PM

Tim Kellogg

@timkellogg.me

i hereby am requesting that @ai2.bsky.social make a 1T base model and have PrimeIntellect posttrain it

November 27, 2025 at 7:41 PM

Tim Kellogg

@timkellogg.me

they take GLM-4.5-Air (small model) and post train it to out-perform the already near-SOTA 3x-larger GLM-4.5

but the model is an advertisement for their infrastructure (that you should use!), and so peek at that too! You should be able to replicate this for your domain


⸻

Description

A retro, comic-style poster with bold headline text at the top:

“STOP JUST FINE-TUNING. START REASONING.”

Below it, smaller text reads:

“INTELLECT-3: We didn’t just give you the weights, we gave you the whole darn factory.”

The main illustration is split visually into two contrasting scenes:

Left Side
• A tall, gray, heavily guarded facility labeled “FRONTIER LABS – TOP SECRET AI.”
• Barbed wire, security cameras, and a huge, locked metal gate make the building look closed and inaccessible.
• A small person in a suit is sneaking out through a tiny mouse-hole in the wall, nervously holding a stack of cash and glancing back over their shoulder.

Right Side
• A bright, sunny, welcoming outdoor scene.
• A friendly, cartoonish robot with a glowing lightbulb above its head is holding a blueprint and smiling.
• A banner overhead reads:
“INTELLECT-3 — OPEN FACTORY – Y’ALL COME IN!”
• A sign hanging from the robot says:
“12B ACTIVE PARAMS. BEATS GIANTS. SERIOUSLY.”
• Next to the robot is a pile of labeled components:
• Tools
• Code
• Web Browsers
• PRIME-RL
• Environments Hub
• PRIME Sandboxes

Bottom Text

A caption spans the width of the image:

“Punches way above its weight on Math, Code, & Science.
Runs on consumer hardware. Open-source. No secret sauce. Just sauce.”

Below that are three stylized buttons:
• [Download Model]
• [Fork Repo]
• [Join Hub]

⸻


November 27, 2025 at 4:28 PM

Tim Kellogg

@timkellogg.me

DeepSeek-Math-V2: self-verification

Fascinating paper that explores how to RL but focused on process over outcome

It’s sort of similar to a GAN, but with a loop each for the generator & the verifier, as well as an outer loop

github.com/deepseek-ai/...


⸻

This illustration shows a closed-loop AI training and verification system, centered around a glowing cube labeled Unified Self-Verification Model. Multiple subsystems connect to it with curved arrows, creating a multi-stage pipeline.

Top-left

A small panel labeled “COLD START” shows silhouettes of human experts handing documents to a robot. A label reads “EXPERT DATA”. The robot is marked “INITIAL VERIFIER.”

Left side

A machine labeled “GENERATOR” emits data toward the central model via blue arrows. Nearby is a panel titled “AUTO-LABELING VIA SCALED COMPUTE” showing branching lines of generated labels flowing into the loop.

Top-center

A blocky structure marked “META-VERIFIER (STATIC)” sends a bright golden beam into the central model.

Right side

A cube-like “VERIFIER” module receives outputs from the central model and displays mixed “pass/fail” icons and red flags. It feeds its evaluations back into the central loop.

Bottom

A golden arrow flows into a container titled “GOLDEN DATASET,” which represents validated high-quality data feeding back into earlier steps of the pipeline.

Overall

Blue arrows represent generation and verification flows; golden arrows represent validated or high-confidence data circulating back into the system. Circuit-pattern artwork forms the background.

⸻


November 27, 2025 at 2:19 PM

Reposted by Tim Kellogg

Scott Riley

@scott.is

thank u for choosing linkedin. here are a bunch of comments made by people you do not care about on posts from people you somehow care even less about. also your inbox is a personalised advertising billboard. also your notifications are randomised every morning based on what you will hate the most.

November 27, 2025 at 12:37 AM

Tim Kellogg

@timkellogg.me

in the past, leftward-leaning folk were tied together by one uniting force: progress

that was Obama’s schtick. Social, tech, economic, any kind of progress will do

now it feels like the left and right are fighting over which kind of *regress* is better

seems like someone will probably win

November 27, 2025 at 12:16 AM

Tim Kellogg

@timkellogg.me

inb4 the left goes anti-solar along the path to being anti-AI

Epoch AI @epochai.bsky.social · 1d

Whatever happens at the model level, one thing is clear: hyperscalers are building enough new infrastructure to put city-scale amounts of power and compute behind AI.

November 26, 2025 at 11:49 PM

Tim Kellogg

@timkellogg.me

dropping truths into the gc

real programmers rewrite dependencies

as an old engineer ex-colleague used to say, "dependencies are like children, you have high hopes for them but one day they'll always disappoint you"

November 26, 2025 at 11:03 PM

Tim Kellogg

@timkellogg.me

for the 12 days of openai, i’m really hoping they release both GPT-5o and o4 + o5-mini. That would really round things out

November 26, 2025 at 4:48 PM

Tim Kellogg

@timkellogg.me

towards the end, Ilya has a part where he makes the case that the genome (thus evolution) doesn't dictate intelligence

i have a hunch that that's why he's taking a lot of crap from some parts of the tech bro crowd that's started leaning into eugenics. Might have nothing to do with his AI views

November 26, 2025 at 12:56 PM

Tim Kellogg

@timkellogg.me

this is insane. how small can we go?

the even cooler part is this is all independent research

Alexander Doria @dorialexander.bsky.social · 1d

The threshold for consistent English/query understanding is now 3M parameters.

November 26, 2025 at 11:58 AM

Tim Kellogg

@timkellogg.me

new summary of the Ilya podcast just landed

VagabondVisions @vagabondvisions.studio · 1d

If he was in a band, he would be saying that this next album is going to suck but it’s “getting back to our roots” kind of suck.

November 26, 2025 at 11:53 AM

Tim Kellogg

@timkellogg.me

thinking more about Ilya’s strategy..

that’s why he’s on a podcast, to shape minds. he can’t just release a shitty model and be called a saint. he needs to control the narrative and provide context for what he’s done

if this doesn’t land, he’s likely screwed (ngl i don’t think it landed)

Tim Kellogg @timkellogg.me · 2d

yeah, his idea is really like releasing a newborn baby and saying, “that’s it, work is done”

but it’s not done, it’s still got to learn

in our current approaches, it’s hard to conceive of that, because we’re bombarded by hype and marketing. i can’t imagine releasing an incapable model..

November 26, 2025 at 2:52 AM

Tim Kellogg

@timkellogg.me

Summary — He's got a divergent view of AGI

We're all pursuing a single behemoth that is *already* smarter than all humans when it's launched

He's pursuing an entity that is *capable of* being smarter

i.e. he's all in on continual learning

Tim Kellogg @timkellogg.me · 2d

Ilya!!!!
www.dwarkesh.com/p/ilya-sutsk...

Ilya Sutskever – We're moving from the age of scaling to the age of research

“These models somehow just generalize dramatically worse than people. It's a very fundamental thing.”

www.dwarkesh.com

November 26, 2025 at 2:02 AM

Tim Kellogg

@timkellogg.me

alright, Opus 4.5 got me through a gnarly debugging session (not one shot) that Gemini 3 couldn't figure out

Opus 4.5 basically does not do doom loops, period. It's legit, I'm impressed.

November 25, 2025 at 10:41 PM

Tim Kellogg

@timkellogg.me

oh shit, NVIDIA’s in trouble

NVIDIA
NVIDIA Newsroom @nvidianewsroom
X.com
We're delighted by Google's success — they've made great advances in AI and we continue to supply to Google.

November 25, 2025 at 9:45 PM

Tim Kellogg

@timkellogg.me

we've reached AGI

Great question - this is actually where I need to be honest about the confusion I've been creating.

November 25, 2025 at 9:33 PM

Tim Kellogg

@timkellogg.me

Ilya!!!!
www.dwarkesh.com/p/ilya-sutsk...

Ilya Sutskever – We're moving from the age of scaling to the age of research

“These models somehow just generalize dramatically worse than people. It's a very fundamental thing.”

www.dwarkesh.com

November 25, 2025 at 6:07 PM

Tim Kellogg

@timkellogg.me

too many people seem to be convinced that LLM vendors set prices on a cost plus basis

no, the advantage of closed weights is you can explore prices completely detached from cost. You’re free to set prices based purely on what people will pay, the value they get from it

November 25, 2025 at 3:29 PM

Tim Kellogg

@timkellogg.me

codex just taught me about jina.ai reader

an API you can easily use via curl that takes a URL and converts it to LLM-friendly text. Free to use, afaict

github.com/jina-ai/reader

GitHub - jina-ai/reader: Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ - jina-ai/reader

github.com

November 25, 2025 at 2:20 PM
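
A minimal sketch of the "simple prefix" idea from the post above, in Python rather than curl. The only detail taken from the source is the prefix https://r.jina.ai/; the function names `reader_url` and `fetch_as_text`, the User-Agent string, and the timeout are hypothetical choices for illustration, and the response format and any rate limits are not covered here.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Prefix quoted in the post's link card.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Prepend the reader prefix to an absolute http(s) URL."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return READER_PREFIX + target


def fetch_as_text(target: str, timeout: float = 30.0) -> str:
    """Fetch the reader's rendering of `target`. Performs a network call."""
    # The User-Agent value is an arbitrary placeholder, not a requirement.
    req = Request(reader_url(target), headers={"User-Agent": "reader-example"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `reader_url("https://example.com/a")` only builds the string; nothing is fetched until `fetch_as_text` is called, which keeps the URL handling easy to check offline.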

Tim Kellogg

@timkellogg.me

calling out bs on my own posts because microsoft can’t be trusted to not commit chart crimes

Tim Kellogg @timkellogg.me · 2d

community note: using cost on the y axis makes it appear like cheaper models are more capable on pass@3

November 25, 2025 at 2:10 PM

Tim Kellogg

@timkellogg.me

one side conversation i had multiple times at AIE was that maybe monorepos are good

if all of your dependencies are sitting on disk, the agent doesn’t need to rely on documentation

even w/o monorepos, it’s a good idea to clone tricky dependencies locally

November 25, 2025 at 2:04 PM

Tim Kellogg

@timkellogg.me

Fara 7B: A cheap & capable open weights computer use agent (CuA)

they got within a few points of o3’s performance using only 4k training data points (yes, synthetic)

www.microsoft.com/en-us/resear...

A scatter-line chart titled “Accuracy (pass@k) vs. Cost Trade-off on WebVoyager.” It compares different agent models by plotting accuracy (%) on the vertical axis and average cost per task on the horizontal axis. Each model’s curve is labeled and color-coded, with numbered markers indicating different evaluation settings or runs.

Left cluster (low cost near $0.00):
• Fara-7B (purple): Three points, ranging from ~72% up to ~92%, all at effectively zero cost.
• UI-TARS-1.5-7B (orange): Three points rising from ~66% to ~86%, also near zero cost.
• GLM-4.1V-9B-Thinking (blue): One point around ~67% accuracy, zero cost.

Mid-cost cluster ($0.50–$1.00 range):
• SoM Agent (GPT-4o) (red): Three points, climbing from ~70% to ~85%.
• SoM Agent (GPT-5) (teal): Three points, 95–97% accuracy, cost around $0.60–$1.00.
• SoM Agent (o3) (gray): Two points, around 90–96% accuracy at ~$0.60–$1.00.

High-cost line ($1.00–$2.50+):
• OpenAI computer-use-preview (brown): Three points rising from ~80% to ~89% as cost increases from ~$1.10 to ~$2.50.

Legend notes:
• Model families are color-coded.
• Shapes indicate model type:
• Circles = Computer Use models
• Squares = SoM Agent w/ Ax Tree

Overall trend:

Fara-7B and UI-TARS offer strong low-cost performance, SoM (GPT-5) delivers the highest accuracy at mid-range cost, and OpenAI’s computer-use-preview scales with price but doesn’t reach SoM (GPT-5)’s peak accuracy.

November 25, 2025 at 1:54 PM
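
For readers unfamiliar with the pass@k metric on the chart's accuracy axis, here is a rough sketch of one common way it is estimated: the combinatorial estimator widely used in code-generation evals. Whether the Fara-7B chart computes pass@k this way is an assumption; the function name `pass_at_k` and the sample counts below are illustrative only and are not taken from the chart.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn without replacement
    from n recorded attempts of which c succeeded, is a success."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k failures; it is 0 when
    # fewer than k failures exist, which makes the result exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers: 3 attempts on a task, 1 of them successful.
single_try = pass_at_k(n=3, c=1, k=1)
three_tries = pass_at_k(n=3, c=1, k=3)
```

The per-task values are then averaged over the benchmark, so allowing more attempts can only hold the score steady or raise it.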

Tim Kellogg

@timkellogg.me

Exa 2.1: both fast and accurate search (that’s not Google)

available both as an MCP server & web UI

exa.ai/blog/exa-api...

November 25, 2025 at 1:42 PM

Tim Kellogg

@timkellogg.me

this is the value of these new scaled models

GPT-5-Pro could probably do it too, but you’d pay like $30 for one shot

Gemini 3 & Opus 4.5 can still run fast & cheap bc they’re extremely sparse MoE, but solve very tricky problems

we truly need scale along both axes

Clemens Vasters 🇪🇺🇩🇪 @clemens.vasters.com · 2d

Opus 4.5 solved a very tricky, complex problem in one session for me (VS Code Agent mode) that Sonnet 4.5 had been giving up on all day yesterday (I'm quite relentless).

November 25, 2025 at 12:48 PM
