Simon P. Couch
@simonpcouch.com
he/him - writing statistical software at Posit, PBC (née RStudio)🥑

simonpcouch.com, @simonpcouch elsewhere
0 to 60 most days😭😂
December 27, 2025 at 5:14 PM
we're so back
December 13, 2025 at 9:41 PM
actually laughed out loud😂 not what I meant!

for source code: github.com/skaltman/mod...

my feeling is "we'll benchmark whatever the default on the API is, don't make me swim through all of your reasoning settings" but we don't strictly do that atm
model-eval/eval/02_eval.R at main · skaltman/model-eval
December 12, 2025 at 10:57 PM
They are cheaper and don't do nearly as well on the eval as the other selected models! The -Pro version thinks for a while and does much better. :)
December 12, 2025 at 8:52 PM
Thanks so much for that comment, that was the wrong model. Updated in the post!
December 12, 2025 at 1:00 PM
Oh, that's awesome. I'll check it out.
December 10, 2025 at 10:53 PM
I haven't tried. :)

Crunching the ~2,000-token prompt (and thus time to first token) seems to take much longer than streaming tokens once generation begins! I've been surprised at the tok/s I've been seeing, even without speculative decoding.
December 10, 2025 at 5:53 PM
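
A rough way to picture the point in the post above about time to first token versus streaming speed: total latency splits into a prompt-processing part and a token-streaming part. This is only a sketch. The function name and all of the rates below are hypothetical, chosen for illustration, and are not measurements from the post.

```python
def latency_breakdown(prompt_tokens, output_tokens, prefill_tok_per_s, decode_tok_per_s):
    """Split generation time into time to first token and streaming time.

    Assumes a simple two-phase model: the whole prompt is processed before
    the first output token, then tokens stream at a constant rate.
    """
    ttft = prompt_tokens / prefill_tok_per_s        # prompt processing
    streaming = output_tokens / decode_tok_per_s    # token-by-token output
    return {"ttft_s": ttft, "streaming_s": streaming, "total_s": ttft + streaming}


# Hypothetical numbers: a 2,000-token prompt, 200 output tokens,
# 100 tok/s prompt processing, 40 tok/s streaming.
result = latency_breakdown(2000, 200, prefill_tok_per_s=100.0, decode_tok_per_s=40.0)
print(result)
```

Under these made-up rates, most of the wait falls before the first token appears, even though the streaming rate itself looks reasonable.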
Thanks for the shoutout!
December 10, 2025 at 5:04 PM
Yes! Maybe I ought to have been clearer that it can be changed; the issue I unknowingly ran into was the default value it gets if I don't set anything myself.
December 10, 2025 at 4:39 PM