Tim Kellogg
timkellogg.me
@timkellogg.me
AI Architect | North Carolina | AI/ML, IoT, science

WARNING: I talk about kids sometimes
Pinned
Social Media “Nutrition Label” for me for the last several days (thanks nano banana!)
towards the end, Ilya has a part where he makes the case that the genome (thus evolution) doesn't dictate intelligence

i have a hunch that that's why he's taking a lot of crap from some parts of the tech bro crowd that's started leaning into eugenics. Might have nothing to do with his AI views
November 26, 2025 at 12:56 PM
this is insane. how small can we go?

the even cooler part is this is all independent research
The threshold for consistent English/query understanding is now 3M parameters.
November 26, 2025 at 11:58 AM
new summary of the Ilya podcast just landed
If he was in a band, he would be saying that this next album is going to suck but it’s “getting back to our roots” kind of suck.
November 26, 2025 at 11:53 AM
thinking more about Ilya’s strategy..

that’s why he’s on a podcast, to shape minds. he can’t just release a shitty model and be called a saint. he needs to control the narrative and provide context for what he’s done

if this doesn’t land, he’s likely screwed (ngl i don’t think it landed)
yeah, his idea is really like releasing a newborn baby and saying, “that’s it, work is done”

but it’s not done, it’s still got to learn

in our current approaches, it’s hard to conceive of that, because we’re bombarded by hype and marketing. i can’t imagine releasing an incapable model..
November 26, 2025 at 2:52 AM
Summary — He's got a divergent view of AGI

We're all pursuing a single behemoth that is *already* smarter than all humans when it's launched

He's pursuing an entity that is *capable of* being smarter

i.e. he's all in on continual learning
November 26, 2025 at 2:02 AM
alright, Opus 4.5 got me through a gnarly debugging session (not one shot) that Gemini 3 couldn't figure out

Opus 4.5 basically does not do doom loops, period. It's legit, I'm impressed.
November 25, 2025 at 10:41 PM
oh shit, NVIDIA’s in trouble
November 25, 2025 at 9:45 PM
we've reached AGI
November 25, 2025 at 9:33 PM
too many people seem to be convinced that LLM vendors set prices on a cost plus basis

no, the advantage of closed weights is you can explore prices completely detached from cost. You’re free to set prices based purely on what people will pay, the value they get from it
November 25, 2025 at 3:29 PM
codex just taught me about jina.ai reader

an API you can easily use via curl that takes a URL and converts it to LLM-friendly text. Free to use, afaict

github.com/jina-ai/reader
GitHub - jina-ai/reader: Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
github.com
November 25, 2025 at 2:20 PM
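A minimal Python sketch of the prefix trick from the post above, assuming the reader works the way its repo title describes: prepend https://r.jina.ai/ to the target URL and fetch the result. The names `reader_url` and `fetch_llm_text` are hypothetical helpers made up for illustration, not part of any Jina library, and the User-Agent string is a placeholder.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Prefix taken from the jina-ai/reader repo title.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Build the reader URL by prepending the prefix to an absolute http(s) URL."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return READER_PREFIX + target


def fetch_llm_text(target: str, timeout: float = 30.0) -> str:
    """Fetch the LLM-friendly text for a page (hypothetical helper, needs network)."""
    # The User-Agent value is an arbitrary placeholder, not something the service requires.
    req = Request(reader_url(target), headers={"User-Agent": "reader-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_llm_text("https://example.com")` would request `https://r.jina.ai/https://example.com`; rate limits and whether an API key is needed are not covered here.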
calling out bs on my own posts because microsoft can’t be trusted to not commit chart crimes
community note: using cost on the y axis makes it appear like cheaper models are more capable on pass@3
November 25, 2025 at 2:10 PM
one side conversation i had multiple times at AIE was that maybe monorepos are good

if all of your dependencies are sitting on disk, the agent doesn’t need to rely on documentation

even w/o monorepos, it’s a good idea to clone tricky dependencies locally
November 25, 2025 at 2:04 PM
Fara 7B: A cheap & capable open weights computer use agent (CuA)

they got within a few points of o3’s performance using only 4k training data points (yes, synthetic)

www.microsoft.com/en-us/resear...
November 25, 2025 at 1:54 PM
Exa 2.1: both fast and accurate search (that’s not Google)

available both as an MCP server & web UI

exa.ai/blog/exa-api...
November 25, 2025 at 1:42 PM
this is the value of these new scaled models

GPT-5-Pro could probably do it too, but you’d pay like $30 for one shot

Gemini 3 & Opus 4.5 can still run fast & cheap bc they’re extremely sparse MoE, but solve very tricky problems

we truly need scale along both axes
Opus 4.5 solved a very tricky, complex problem in one session for me (VS Code Agent mode) that Sonnet 4.5 had been giving up on all day yesterday (I'm quite relentless).
November 25, 2025 at 12:48 PM
that’s it, i’m calling it, software engineering is over

AI can do everything an engineer can do
Yeah it’ll do that now I’ve heard
November 25, 2025 at 12:23 PM
Reposted by Tim Kellogg
Opus 4.5 solved a very tricky, complex problem in one session for me (VS Code Agent mode) that Sonnet 4.5 had been giving up on all day yesterday (I'm quite relentless).
November 25, 2025 at 7:32 AM
merging the kiddo’s trio of passions for space facts, hamsters, and KPop Demon Hunters
November 25, 2025 at 12:44 AM
i built this for myself a few months ago. it worked well, except that i only launched them in subagents (to preserve the prefix cache). this would probably work a lot better

no such thing as too many tools!
A tool for searching for relevant tools to keep context clean?

Was thinking about this last night as I approached sleep and glad to find this morning that one of the thought leaders rolled out this capability

www.anthropic.com/engineering/...
Introducing advanced tool use on the Claude Developer Platform
Claude can now discover, learn, and execute tools dynamically to enable agents that take action in the real world. Here’s how.
www.anthropic.com
November 25, 2025 at 12:39 AM
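A rough sketch of the "tool for searching for relevant tools" idea from the post above, assuming a naive keyword-overlap ranking. This is not Anthropic's API or implementation; `Tool`, `search_tools`, and `CATALOG` are hypothetical names invented for illustration. The point is only that the agent's context holds one small search tool, and full tool definitions are pulled in on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def _words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,:;!?()").lower() for w in text.split()} - {""}


def search_tools(query: str, tools: list[Tool], limit: int = 3) -> list[Tool]:
    """Return up to `limit` tools whose name or description shares words with the query."""
    query_words = _words(query)
    scored = []
    for tool in tools:
        overlap = len(query_words & _words(tool.name + " " + tool.description))
        if overlap:
            scored.append((overlap, tool))
    # Highest overlap first; ties keep catalog order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


# Hypothetical catalog; a real one could hold hundreds of definitions.
CATALOG = [
    Tool("get_weather", "Look up the current weather forecast for a city"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("query_database", "Run a read-only SQL query against the database"),
]
```

With this catalog, `search_tools("send an email to bob", CATALOG)` returns only the `send_email` tool. A real system would likely rank with embeddings or BM25 instead of raw word overlap.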
oooh, codex-cli is doing 2 things at once

did it start supporting subagents? i missed that
November 24, 2025 at 10:44 PM
i reeeally hope this is what it looks like

i’d love to hear from Ilya, and also i assume Ilya wouldn’t talk unless he had something interesting to say, some tidbit of news also dropping tomorrow
November 24, 2025 at 10:18 PM
OpenAI is planning on releasing 2 models in the next few months:

- GPT-5.2: Successor, very good at programming
- Shallotpeat: fixed pre-training + new base for the IMO Gold math model

I'm really curious about Shallotpeat. Sounds like a redo of GPT-4.5
November 24, 2025 at 10:10 PM
it's both exciting and slightly frustrating that Opus 4.5 is both better and worse than Gemini 3 Pro

Opus => Coding
Gemini => Problem solving, explaining
November 24, 2025 at 10:06 PM
Opus 4.5 scored 50% higher than Gemini 3 Pro on the “system card page count” benchmark
system card

assets.anthropic.com/m/64823ba748...

oh, high alignment and low rates of concerning behavior? sounds like bliss
November 24, 2025 at 8:30 PM