Hrishi
@olickel.com
Previously CTO, Greywing (YC W21). Building something new at the moment.

Writes at https://olickel.com
I think we've just scratched the surface on what's possible.

This might be the start of us actually being able to talk to models with images, conveying a lot more than what's been possible before.
December 1, 2025 at 6:22 AM
Opus 4.5 still amazes me. In a single release, Anthropic moved from models that could sort-of understand pictures to something that actually knows what it's looking at, and (from my testing) the best model for visual understanding by far.
December 1, 2025 at 6:22 AM
What's also amazing is that these can now be collaborated on and version controlled. You probably don't need this for a comic, but it's useful to have for other kinds of design.
December 1, 2025 at 6:22 AM
4. Process - more specific breakdown of the actual task (in this case that's outlining each specific strip)
5. Ideas - in this case that would be the characters themselves
6. Guidelines - for us that's style guidelines
December 1, 2025 at 6:22 AM
Obviously this is all new, but currently my design specs are structured as:
1. Background - writing, reasoning, definitions, etc.
2. Primary Task - what's the overarching objective?
3. Audience - who is this for? What is the intended outcome?

then comes more specific parts:
December 1, 2025 at 6:22 AM
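The spec structure from the two posts above could be sketched as a template for the planning model to fill in. A minimal sketch; the section names follow the posts, and the contents (and the `DESIGN_SPEC_TEMPLATE` name) are my illustrative assumptions:

```python
# Hypothetical template mirroring the six-part spec structure described
# above; every value here is a placeholder for the planning model to fill.
DESIGN_SPEC_TEMPLATE = {
    # background sections
    "background": "writing, reasoning, definitions, etc.",
    "primary_task": "the overarching objective",
    "audience": "who this is for, and the intended outcome",
    # more specific parts
    "process": "breakdown of the actual task (e.g. outlining each strip)",
    "ideas": "domain material (e.g. the characters themselves)",
    "guidelines": "style guidelines",
}
```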
Before rushing to hook up Opus and Nano in an endless loop that burns tokens, it's worth playing the ferry-agent in this loop manually.

Go to Opus with the results, ask for updated specs (or addendums), and go back to Nano with the specs.
December 1, 2025 at 6:22 AM
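The manual ferry loop above can be sketched as code. This is a sketch only: `call_opus`, `call_nano`, and `render` are hypothetical stand-ins for however you actually reach each model (API client, chat UI copy-paste), not real library calls.

```python
# Sketch of the manual "ferry-agent" loop: carry specs to the executor,
# carry rendered results back to the reviewer, repeat. All callables are
# hypothetical stand-ins supplied by you.
def ferry_loop(spec, render, call_opus, call_nano, rounds=3):
    """Run the generate -> render -> critique -> edit-specs loop by hand."""
    result = None
    for _ in range(rounds):
        result = call_nano(spec)           # generate from the current spec
        image = render(result)             # render the output for review
        critique = call_opus(spec, image)  # ask for updated specs/addendums
        if critique is None:               # reviewer is satisfied: stop
            break
        spec = critique                    # ferry the new spec back
    return result
```

The point of keeping yourself in this loop is that each iteration is cheap to inspect before you automate it away.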
The same thing applies to frontend design. The loop of
Generate from specs ↠ Render ↠ Critique ↠ Edit specs ↠ Regenerate works extremely well with Opus. For fun, you can also throw in Nano to generate out-there-undesignable-but-cool frontends to remix from.
December 1, 2025 at 6:22 AM
Text as the intermediate makes the designs so much more editable. The same specs produce the same results, and changing something - at least for me - has a predictable effect.
December 1, 2025 at 6:22 AM
It's also amazing at writing specs. The same plan->spec->build->review->spec workflow we've been using for code works *perfectly* for design, with Opus as the planning model and Banana as the executor.

Sorry - next strip coming up!
December 1, 2025 at 6:22 AM
This entire strip (and others like it) was made from an Opus + Banana collaboration.

Turns out Opus is now miles ahead of even Gemini at visual understanding. This is a model that can pick out and critique emotional impact, while noticing elements 10 pixels out of place.
December 1, 2025 at 6:22 AM
Opus 4.5 + Nanobanana make for crazy design partners.

Opus is amazing at visual review and at writing very, VERY detailed specs, and Banana is good at following them.

Let me tell you two stories.
December 1, 2025 at 6:22 AM
My 12 go-to model + harness combos, one per job, as of Dec 2025 (IMO):

1) Gemini 2.5 Pro in AI Studio for Deep Writing
2) Opus 4.5 in Cursor for the best greenfield frontend work money can buy - if you're rich
November 29, 2025 at 2:21 AM
someone had to build this specific dialog

I wonder how it happened
November 19, 2025 at 9:12 PM
I’m so sorry I have to leave this in the documentation or every coding agent in this codebase will keep trying to fix the feature
November 19, 2025 at 4:21 AM
When you're tired on a flight and trying to manually review a trigger module you made two months ago
November 18, 2025 at 4:21 AM
Trying to finish typing `git add` while the agent's editing a file just so I can preserve pristine diffs from the last change
November 14, 2025 at 11:17 PM
Don't think there's a way I could like this article more
September 19, 2025 at 4:20 PM
KIMI is the real deal. Unless it's really Sonnet in a trench coat, this is the best agentic open-source model I've tested - BY A MILE.

Here's a slice of a 4 HOUR run (timelapsed to ~1 second per minute) with not much more than 'keep going' from me every 90 minutes or so.

moonshotai.github.io/Kimi-K2/
July 13, 2025 at 6:09 PM
It seems $3 and $15 (per million tokens) might be the new Pareto frontier for intelligence (excepting the o-series). Feels like the hedge fund 2 and 20.
June 8, 2025 at 1:57 AM
Dan's article on progressive JSON has a lot of carryover to LLMs.

The key problems for modern LLM application design that often get overlooked (I think) are:
• Streaming outputs and partial parsing
• Context organization and management (I don't mean summarising at 90%)
June 1, 2025 at 4:15 PM
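One concrete version of "streaming outputs and partial parsing" is yielding each record as soon as it completes, regardless of where the network chunk boundaries fall. A minimal sketch, assuming the model emits newline-delimited JSON; the `stream_ndjson` helper is my own illustration, not a real API:

```python
import json

def stream_ndjson(chunks):
    """Yield parsed objects as each newline-delimited JSON record
    completes, instead of waiting for the full response to arrive."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # A chunk may end mid-record; only parse up to the last newline.
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                yield json.loads(line)

# Chunk boundaries land mid-record, records still parse cleanly.
chunks = ['{"step": 1}\n{"st', 'ep": 2}\n']
print(list(stream_ndjson(chunks)))  # → [{'step': 1}, {'step': 2}]
```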
GRPO clips updates based on token probability: lower-prob tokens can move less than higher-prob tokens. This means that even with random rewards (especially so), models push further into what was already in-distribution. For -MATH, this is code - it thinks better in code, and therefore gets better overall.
May 29, 2025 at 6:27 AM
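The mechanism in the post above can be made concrete with a little arithmetic. PPO/GRPO-style objectives clip the ratio p_new/p_old to [1 - eps, 1 + eps], so for a positively-rewarded token the largest unclipped absolute probability increase is eps * p_old. A minimal sketch (the `max_clipped_gain` name is mine):

```python
# Why ratio clipping binds low-probability tokens more tightly:
# the ratio p_new / p_old is clipped to [1 - eps, 1 + eps], so the
# largest absolute probability gain before clipping is eps * p_old.
# A token at p_old = 0.9 can gain up to 0.18; one at p_old = 0.01
# can gain only 0.002 -- in-distribution tokens get to move more.
def max_clipped_gain(p_old, eps=0.2):
    """Largest absolute probability increase before the ratio clips."""
    return eps * p_old

for p in (0.9, 0.1, 0.01):
    print(p, max_clipped_gain(p))
```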
Honestly it's relevant to almost all work - most agentic flows have 10-20 transitions (sometimes more) per loop.

Most flows today treat NL as reasoning, code as execution, and structured data as an extraction method. There might be problems with this approach.
May 29, 2025 at 6:27 AM
Testing this locally surprised me too. Something is definitely happening here - it's also apparent when testing Opus vs Sonnet 4. Models reason very, VERY differently when using code vs natural language, displaying very different aptitudes while working through the same problem.
May 29, 2025 at 6:27 AM
How does an LLM writing out this program (WITHOUT a code interpreter running the output) make things more accurate?

Verified on Qwen 3 - a30b (below)

Lots of interesting takeaways from the Random Rewards paper. NOT that RL is dead, but honestly far more interesting than that!
May 29, 2025 at 6:27 AM
Now for the schemas: I agree with this assessment. Opus is the best for describing data - it has a way of being methodical that the other models (or tools) don't really have. They all managed to load the data properly, which is still a big leap.
May 24, 2025 at 6:46 PM