Ólafur Páll Geirsson
@geirsson.com
Building Agents at Sourcegraph. Posts about coding, AI, and family (3 kids). @olafurpg elsewhere. Based in Oslo, Norway. https://geirsson.com
geirsson.com
The best mental model for buying strollers is to think of renting them by the month. The resale value holds pretty high so the final cost isn’t so bad even for the $2k premium twin strollers.
geirsson.com
The usual answer is “no” whenever people ask whether LSP can be used in a novel way.

The protocol’s strength is also its weakness: it’s very much optimized around a single human user interacting with an IDE.
geirsson.com
Meanwhile we’ll get ads in ChatGPT.
geirsson.com
Anthropic is the king of function calling and deserves their incredible revenue growth. They’ve paved the way for AI agents, not OpenAI or Google. It’s only a matter of time before Google gets the memo and Gemini starts taking function calling more seriously.
geirsson.com
Three kids done with chickenpox this month.
geirsson.com
On a second iteration, it seems like it's the web tool that's causing trouble. Disabling the web tool makes Claude 4 reach the right syntactic solution, although not with the optimal token edits. Goes to show that you need to be careful with what tools you're exposing. Less is more.
geirsson.com
Sonnet 3.7 is the only model I've seen that delivers the perfect solution: it replaces the tokens for `.` and `apply` and nothing else. All other models I've tested use the worse tree-replacement APIs.
geirsson.com
Surprisingly, Sonnet and Opus 4 both fail on one of my go-to codegen tests for new models:

> Implement a Scalafix rule that converts foo.apply(...) to foo(...) and explain why it's semantic or syntactic

They both think it needs to be semantic (i.e., have access to types and symbols).
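For reference, here's a sketch of the syntactic answer using the Scalafix v1 API (the rule name and details are my own illustration, not from the post). Because `foo.apply(...)` is pure sugar for `foo(...)` regardless of the receiver's type, a `SyntacticRule` over the parse tree suffices, and the minimal patch removes only the `.` and `apply` tokens:

```scala
import scalafix.v1._
import scala.meta._

// Hypothetical sketch: rewrite `foo.apply(...)` to `foo(...)` syntactically.
// No types or symbols are needed, so SyntacticRule (not SemanticRule) is enough.
class ApplyToCall extends SyntacticRule("ApplyToCall") {
  override def fix(implicit doc: SyntacticDocument): Patch =
    doc.tree.collect {
      case Term.Apply(sel @ Term.Select(qual, Term.Name("apply")), _) =>
        // The qualifier's tokens are a prefix of the selection's tokens, so
        // dropping them leaves exactly the `.` and `apply` tokens to delete.
        Patch.removeTokens(sel.tokens.drop(qual.tokens.length))
    }.asPatch
}
```

This compiles against scalafix-core and needs to be packaged as a Scalafix rule to run, so it isn't standalone; the key point is that the patch is a pure token deletion, which is the behavior the post describes as the perfect solution.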
geirsson.com
Amp Tab is coming along nicely; it's not too far from being able to replace Cursor Tab as my daily driver.
geirsson.com
Last note: even with code that can be unit tested, I still think most of the tests that AI generates are crap. And the AI-generated commit messages also miss the point. I'm seeing lots of PRs now where people add AI-generated tests that aren't even testing anything meaningful.
geirsson.com
The Dwarkesh episode still gave a fresh perspective on how these models work, and I have probably underestimated how powerful they will become. If you're still judging AI capabilities by today's products and today's models then you are probably also underestimating how weird things are going to get.
geirsson.com
I am knee deep in the AI hype, and I don't think software engineering will ever be the same again. I love working on ampcode.com and I see daily anecdotes of how AI coding is turning software development upside-down for our users.
Amp
Everything will change.
ampcode.com
geirsson.com
Even components that can be unit tested or e2e tested via behavioral assertions have lots of implicit constraints wrt. latency or how features interact with each other in long-running user sessions that are impractical to test in an automated fashion.
geirsson.com
The fallacy is thinking that all software engineers do is deliver code that can be tested in isolation, and AI is very good at doing that now. The problem is that tests only cover maybe 0-50% of real-world constraints.
geirsson.com
I keep shaking my head hearing AI folks claim software engineering will be automated this year. After listening to this conversation, I at least better understand what they mean by it. These AI researchers are super smart, but they're also sort of clueless about what "software engineering" is.
geirsson.com
The Dwarkesh episode on Claude 4 is the most in-depth, balanced, and (almost) non-hype conversation I have heard on why AI researchers believe AGI is around the corner open.spotify.com/episode/3H46...
How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken
Dwarkesh Podcast · Episode
open.spotify.com
geirsson.com
Memory reminds me of the Facebook feed circa 2016. It was clearly beneficial for the company, it sure boosted engagement, but I deleted my Facebook account and was better off for it.
geirsson.com
Memory in AI chatbots is overrated: it turns the LLM into a sycophant by tying every response to random pieces of information extracted from past conversations.

I’m sure memory is great for engagement/likability, but it’s turned me off ChatGPT personally.
geirsson.com
Since starting work on Amp (ampcode.com):

- No meetings
- No code review, just push to main
- Take responsibility for your changes
- Rarely need to create a branch off main
- Auto-release every few hours
- Prioritize user bug reports whenever possible
geirsson.com
At the risk of being pedantic, when many people say “one shot” they actually mean zero shot pass@1.

Technically, one shot means including one example output in the prompt, and most prompts don’t do that.

Not blaming, I even catch myself saying one shot meaning pass@1.
geirsson.com
Contrary to popular belief, the models are surprisingly bad at writing CSS.
geirsson.com
There’s a different trajectory for the people who are excited about AI because it enables them to build more expertise, or skip building expertise.

Concrete example, I love AI because it helps me learn CSS faster, not because AI writes all my CSS so I don’t have to learn it.
geirsson.com
git worktrees are overrated: they're a performance optimization that only makes sense when working in a repo that's super slow to clone.

For normal repos, just clone twice and enjoy benefits like being able to check out the main branch in both clones at the same time.
geirsson.com
Sprinkling Copilot dependencies across the VS Code codebase is a great technique to make it more annoying to keep a fork up-to-date. Well played, Microsoft.