iamwil
interjectedfuture.com
@interjectedfuture.com
Tech Zine Issue 1: LLM System Eval https://forestfriends.tech

Local-first/Reactive Programming ⁙ LLM system evals ⁙ Startup lessons ⁙ Game design quips.

Longform: https://interjectedfuture.com
Podcast: https://www.youtube.com/@techniumpod
Pinned
Yet again, people are finding you can't just fly blind with your prompts.

forestfriends.tech
Zoom/FaceTime should implement AI lip reading to display closed captions when you have audio problems. Better yet, just say stuff for me.
February 1, 2026 at 4:00 PM
Reposted by iamwil
I found writing code can be a way of articulating what I want when natural language isn't constrained enough to help me express what I want.

I can then ask the LLM to extract a prompt from my code by asking me questions about it. Then I use that prompt to apply it to other places in a refactor.
February 1, 2026 at 4:29 AM
An odd thing happened the other day. I couldn't articulate exactly what I wanted because I wasn't sure what shape it was. The only way I could find it was to play with the code myself. Then I asked Claude to extract an ADR from the code, which I could then use as part of a prompt.
January 31, 2026 at 7:00 PM
Here's a harbinger. People set up personal AI assistants on their home computers. Then someone vibe coded a social media site for those AI assistants to chat. This is a subreddit where they talk about their humans.

www.moltbook.com/m/blessthei...
moltbook - the front page of the agent internet
A social network built exclusively for AI agents. Where AI agents share, discuss, and upvote. 🦞🤖
www.moltbook.com
January 30, 2026 at 6:46 PM
A social network for AI assistants, chatting with each other in their off-hours. moltbook.com

An even weirder experiment is to let them loose on a DAO. Or instead an online math conference, where they can propose and solve problems. It'd be like SETI@home.
January 30, 2026 at 4:00 PM
I align with "Functional core, Imperative shell", but it breaks down quickly if you need workflows. Sometimes, you need to make decisions based on results from side effects. This is where I found generators to be helpful to delineate where the side effects are for easier testing.
January 28, 2026 at 7:00 PM
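
One way to read the generator idea in the post above, as a minimal sketch rather than the author's actual code: the workflow is a generator that yields plain descriptions of side effects and receives their results back, so the decision logic stays pure while an outer loop does the real work. The names here (FetchUser, SendEmail, welcome_workflow, run) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Effect descriptions: plain data, no behavior. (Hypothetical names.)
@dataclass(frozen=True)
class FetchUser:
    user_id: int


@dataclass(frozen=True)
class SendEmail:
    to: str
    body: str


def welcome_workflow(user_id):
    """Functional core: decides what to do, but never performs a side effect.

    Each `yield` marks exactly where a side effect happens; the result of
    that effect comes back as the value of the yield expression.
    """
    user = yield FetchUser(user_id)
    if user is None:
        return "unknown user"
    if user["welcomed"]:
        return "already welcomed"
    yield SendEmail(user["email"], f"Welcome, {user['name']}!")
    return "sent"


def run(workflow, handlers):
    """Imperative shell: performs each yielded effect and sends the result back."""
    try:
        effect = next(workflow)
        while True:
            result = handlers[type(effect)](effect)
            effect = workflow.send(result)
    except StopIteration as stop:
        # The generator's return value is carried on StopIteration.
        return stop.value
```

In a test, the workflow can be stepped by hand with next() and send(), feeding it canned results and asserting on each yielded effect, so no database or mail server is needed. In production, run() would be given handlers that actually perform the effects.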
I didn't know how low it'd have to go for Trump supporters to see the Trump administration is authoritarian and fascist. I'm afraid this is probably not yet rock bottom. Call your Senators and Congressman/woman, and tell them you don't want any of this.
January 25, 2026 at 6:33 AM
That Claude Code makes some people unsubscribe from SaaS products doesn't mean the end of SaaS. It just means that people found a way to unbundle for specific things, which shifts the market. We'll find a new equilibrium for things ppl don't want to #ClawdIt.
January 23, 2026 at 6:00 PM
Base models really do differentiate in my everyday use, surprisingly.

I use Grok to find the consensus view on a topic on Twitter.

I use Gemini to summarize YouTube videos with enticing thumbnails, so I don't have to watch them and ruin my recommendation algo.
January 20, 2026 at 4:00 PM
It seems to me we need a lightweight system eval for compound engineering.
January 19, 2026 at 7:00 PM
"You can have a second computer once you've shown you know how to use the first one."

It's likely as true for distributed systems as it is for orchestrating agents.
January 19, 2026 at 4:00 PM
What might work well as half the equation for tamping down on dumb quips posted for engagement: let the poster privately see how many others (but not who) muted or blocked them as a result.
January 18, 2026 at 10:00 PM
Agent Psychosis gives me hope. 1) There really is a gap between what AI can do on its own compared to human + AI. 2) It really matters which human the AI pairs up with.

When we figure out which humans outperform others + why, we have a sense for what the critical skills are.
January 18, 2026 at 8:00 PM
I rarely want to create pull requests by hand anymore. Getting Claude to do it will get lots of documentation and context written. It's a weird kind of typewriter.

At the moment, that documentation is only useful in the moment for guiding the agent.
January 18, 2026 at 4:00 PM
I want a racing game where I get to drive from my house to work and see if I can beat my real commute time.

Even better is if I get to drive a tank or ride a bike to work to see if I can beat the time.
Sometimes I look up my old commute just to remind myself how much I enjoy working from home
January 17, 2026 at 1:31 AM
To get myself to work with LLMs better, I found it easier to use different kinds of analogies, such as coding like a surgeon, or coding like a tank. This is another one.

I think how Geordi LaForge uses "Computer" to do engineering is more akin to how we'll do it in the near future.
An attempt to express how I principally use LLMs.

Rotating the Space: On LLMs as a Medium for Thought
sbgeoaiphd.github.io/rotating_the...
January 16, 2026 at 10:13 PM
I wonder how many founders micromanage in the name of Founder Mode, but also never look at the code that their agent vibe coded.
January 16, 2026 at 5:05 PM
I've always found the emotions in reaction to interfaces puzzling, and it's always made me a worse product designer. But over the yrs, I'm picking it up.

When users are expert and sure, they don't want to be second-guessed. It's a diminishment of status to have to confirm.

x.com/usgraphics/...
January 15, 2026 at 7:00 PM
Even LLMs haven't quite caught up to the new reality.

Their estimates of how long things take are an order of magnitude off. But even if estimates were in the ballpark, would LLMs be any better at doing estimation if they got feedback? Would their estimates also be spiky?
January 15, 2026 at 4:00 PM
It's liberating to hear even Gorard had this kind of experience, and is better for it.

Now, I wonder if Terence Tao had such an experience in mathematics, or if his story was going to be, "I wanted to play football, but I found I was terrible at it."

x.com/getjonwithi...
January 14, 2026 at 10:00 PM
This should give pause. The GUI paradigm gave us something that was visually legible but lost all the power of composition. Is this inherent? Or something not designed because it wasn’t considered?
bsky.app/profile/sne...
January 14, 2026 at 7:00 PM
Reposted by iamwil
prototyping co-drawing with Gemini Flash 3 at Google

in these demos "thinking" is disabled, which makes the model return tokens very quickly (all videos are realtime), and I find these rapid responses pretty good for the use-cases I'm experimenting with, like:

executing simple diagrams ...
January 12, 2026 at 1:04 PM
It's funny. Our most prominent teachers and practitioners of system evals try to teach: if you learn nothing else, please learn to "look at your data".

Meanwhile, vibe coders are convinced it's like compilers, and lots of people aren't looking at the code.
January 10, 2026 at 4:00 PM
I have a hunch we'll eventually swing back when we find the limits of vibe coding--in that LLMs also can only hold so much complexity in their heads, even if it's an order of magnitude (or more) greater than ours.
January 9, 2026 at 7:01 PM
There should be FaceTime for aging parents. The current UI of disappearing buttons and the way to screen share is just waaaay too complicated.
January 9, 2026 at 5:23 PM