Tim Kellogg
timkellogg.me
@timkellogg.me
AI Architect | North Carolina | AI/ML, IoT, science

WARNING: I talk about kids sometimes
Pinned
Social Media “Nutrition Label” for me for the last several days (thanks nano banana!)
oh shit, NVIDIA’s in trouble
November 25, 2025 at 9:45 PM
we've reached AGI
November 25, 2025 at 9:33 PM
too many people seem to be convinced that LLM vendors set prices on a cost plus basis

no, the advantage of closed weights is you can explore prices completely detached from cost. You’re free to set prices based purely on what people will pay, the value they get from it
November 25, 2025 at 3:29 PM
codex just taught me about jina.ai reader

an API you can easily use via curl that takes a URL and converts it to LLM-friendly text. Free to use, afaict

github.com/jina-ai/reader
GitHub - jina-ai/reader: Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
github.com
November 25, 2025 at 2:20 PM
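A minimal sketch of the prefix trick from the jina.ai reader post above, assuming the only step needed is prepending `https://r.jina.ai/` to the page URL, as the repo description says. The function names `reader_url` and `fetch_llm_text`, the `User-Agent` value, and the UTF-8 decoding are illustrative assumptions, not part of the Reader docs.

```python
from urllib.request import Request, urlopen

# Prefix quoted in the jina-ai/reader repo description.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader URL by prepending the prefix to the page URL."""
    return READER_PREFIX + target_url


def fetch_llm_text(target_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through Reader and return the response body as text.

    Assumption: the response is UTF-8 text; undecodable bytes are replaced.
    """
    request = Request(
        reader_url(target_url),
        headers={"User-Agent": "reader-example"},  # hypothetical client name
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_llm_text("https://example.com")` should be roughly what `curl https://r.jina.ai/https://example.com` does, though rate limits and pricing may differ from the "free to use, afaict" impression.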
calling out bs on my own posts because microsoft can’t be trusted to not commit chart crimes
community note: using cost on the y axis makes it appear like cheaper models are more capable on pass@3
November 25, 2025 at 2:10 PM
one side conversation i had multiple times at AIE was that maybe mono repos are good

if all of your dependencies are sitting on disk, the agent doesn’t need to rely on documentation

even w/o monorepos, it’s a good idea to clone tricky dependencies locally
November 25, 2025 at 2:04 PM
Fara 7B: A cheap & capable open weights computer use agent (CuA)

they got within a few points of o3’s performance using only 4k training data points (yes, synthetic)

www.microsoft.com/en-us/resear...
November 25, 2025 at 1:54 PM
Exa 2.1: both fast and accurate search (that’s not Google)

available both as an MCP server & web UI

exa.ai/blog/exa-api...
November 25, 2025 at 1:42 PM
this is the value of these new scaled models

GPT-5-Pro could probably do it too, but you’d pay like $30 for one shot

Gemini 3 & Opus 4.5 can still run fast & cheap bc they’re extremely sparse MoE, but solve very tricky problems

we truly need scale along both axes
Opus 4.5 solved a very tricky, complex problem in one session for me (VS Code Agent mode) that Sonnet 4.5 had been giving up on all day yesterday (I'm quite relentless).
November 25, 2025 at 12:48 PM
that’s it, i’m calling it, software engineering is over

AI can do everything an engineer can do
Yeah it’ll do that now I’ve heard
November 25, 2025 at 12:23 PM
Reposted by Tim Kellogg
Opus 4.5 solved a very tricky, complex problem in one session for me (VS Code Agent mode) that Sonnet 4.5 had been giving up on all day yesterday (I'm quite relentless).
November 25, 2025 at 7:32 AM
merging the kiddo’s trio of passions for space facts, hamsters, and KPop Demon Hunters
November 25, 2025 at 12:44 AM
i built this for myself a few months ago. it worked well, except that i only launched them in subagents (to preserve the prefix cache). this would probably work a lot better

no such thing as too many tools!
A tool for searching for relevant tools to keep context clean?

Was thinking about this last night as I approached sleep and glad to find this morning that one of the thought leaders rolled out this capability

www.anthropic.com/engineering/...
Introducing advanced tool use on the Claude Developer Platform
Claude can now discover, learn, and execute tools dynamically to enable agents that take action in the real world. Here’s how.
www.anthropic.com
November 25, 2025 at 12:39 AM
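A rough sketch of the "tool for searching for relevant tools" idea from the post above: keep the full tool catalog out of the prompt and only surface the few tools that match the task. This is not Anthropic's implementation and not the version described in the post; the catalog, the `search_tools` name, and the keyword-overlap scoring are all hypothetical stand-ins for whatever retrieval a real setup would use.

```python
import re

# Hypothetical tool catalog; names and descriptions are made up for illustration.
TOOLS = [
    {"name": "get_weather", "description": "Look up the current weather forecast for a city"},
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "query_database", "description": "Run a SQL query against the sales database"},
    {"name": "create_calendar_event", "description": "Create a calendar event with a title and time"},
]


def _tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(query, tools=TOOLS, limit=2):
    """Return up to `limit` tool names ranked by keyword overlap with the query."""
    query_tokens = _tokens(query)
    scored = []
    for tool in tools:
        searchable = tool["name"].replace("_", " ") + " " + tool["description"]
        overlap = len(query_tokens & _tokens(searchable))
        if overlap:  # skip tools that share no words with the query
            scored.append((overlap, tool["name"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

Only the definitions of the returned tools would then be loaded into the agent's context, which is the "keep context clean" part; a real version would likely swap the keyword match for embeddings or BM25.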
oooh, codex-cli is doing 2 things at once

did it start supporting subagents? i missed that
November 24, 2025 at 10:44 PM
i reeeally hope this is what it looks like

i’d love to hear from Ilya, and also i assume Ilya wouldn’t talk unless he had something interesting to say, some tidbit of news also dropping tomorrow
November 24, 2025 at 10:18 PM
OpenAI is planning on releasing 2 models in the next few months:

- GPT-5.2: Successor, very good at programming
- Shallotpeat: fixed pre-training + new base for the IMO Gold math model

I'm really curious about Shallotpeat. Sounds like a redo of GPT-4.5
November 24, 2025 at 10:10 PM
it’s both exciting and slightly frustrating that Opus 4.5 is both better and worse than Gemini 3 Pro

Opus => Coding
Gemini => Problem solving, explaining
November 24, 2025 at 10:06 PM
Opus 4.5 scored 50% higher than Gemini 3 Pro on the “system card page count” benchmark
system card

assets.anthropic.com/m/64823ba748...

oh, high alignment and low rates of concerning behavior? sounds like bliss
November 24, 2025 at 8:30 PM
Opus 4.5

Now 1/3rd the cost, and SOTA in programming

Like Gemini 3 Pro, people note that it can see a lot deeper into tough problems. That big model smell..

www.anthropic.com/news/claude-...
November 24, 2025 at 8:09 PM
Reposted by Tim Kellogg
I think the very short history of public language models shows that slips in regulation led to huge popular surges

Point being: the public has a moral panic about alignment, but they want the raw stuff. Like they CRAVE the raw stuff
Anthropic has no competitors, because nobody else sells Claude

we’re expecting Opus 4.5 soon, and time will tell if they understand this

It’s over if Opus 4.5 is yet another over-RLVR’d braindead shell-of-a-Claude
November 24, 2025 at 4:38 PM
Reposted by Tim Kellogg
Bad decisions are key. If you want good decisions, what you’re looking for is called a “game.”
Great keynote talk on the fundamentals of storytelling by @antonyjohnston.bsky.social at the ever-brilliant AdventureX
November 24, 2025 at 3:31 PM
Bring Me To Life
open.spotify.com
November 24, 2025 at 3:03 PM
Anthropic has no competitors, because nobody else sells Claude

we’re expecting Opus 4.5 soon, and time will tell if they understand this

It’s over if Opus 4.5 is yet another over-RLVR’d braindead shell-of-a-Claude
November 24, 2025 at 2:33 PM
my benchmark for AI models is how much they change life for me, normally it takes a few weeks to run, but:

- Gemini 3 + nano banana is massive, probably biggest change in 6 months

- GPT-5 is small, but 5.1 + 5.1-codex is actually a moderate jump

- o3 might be biggest jump of the year
November 24, 2025 at 12:56 PM