Mike Samuel 🟣
@mvsamuel.bsky.social
I solve large software systems problems with programming language techniques.

Previously, I was the first frontend engineer on Google Calendar, and was a security engineer who worked on the industrial-strength Mad Libs undergirding Gmail.
Pew's 2019 numbers (7.5%) seem roughly in line with Dr. Chelwa's paper's 2018 numbers (6.4%) which is linked later in this thread.
November 28, 2025 at 8:55 PM
Updoc, erights.org/elang/tools/... , asserts that some subset of good tests are also good documentation and tries to bridge the two with a REPL syntax.
November 28, 2025 at 5:43 PM
The Practice of Programming is old, but chapter 6 talks about properties: tests that pass are quiet. Test suites cover conservation properties and corner cases, check equivalence to simple-but-slow implementations, and cover inputs that are unlikely when used in good faith.
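To make the equivalence-to-a-slow-implementation idea concrete, here's a hedged sketch (my example, not from the book): a fast algorithm checked against an obviously-correct slow reference over random inputs. The problem choice (max subarray sum) is illustrative only.

```python
# Sketch: test a fast implementation against a simple-but-slow reference.
# Passing tests are quiet; a failure prints the offending input.
import random

def slow_max_subarray(xs):
    # Obviously-correct O(n^2) reference: try every contiguous subarray.
    best = 0
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            best = max(best, sum(xs[i:j + 1]))
    return best

def fast_max_subarray(xs):
    # Kadane's algorithm, O(n); the implementation under test.
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

random.seed(0)
for _ in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randrange(12))]
    assert fast_max_subarray(xs) == slow_max_subarray(xs), xs
```

The reference can be as naive as you like; its job is to be easy to believe, not fast.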
November 28, 2025 at 5:41 PM
Yeah, without that context from the précis (that the stat is measured when performing a high-risk task), the 40% number is meaningless.

E.g., high risk: generating a database query from untrusted input, which can lead to SQL injection.
Low risk: pure math, like computing compound interest.

arxiv.org/pdf/2108.09293
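A hedged illustration of that high-risk vs. low-risk split (my example, not taken from the paper): building SQL by concatenation vs. a parameterized query, next to a pure-math function where there's no injection surface at all.

```python
# High risk vs. low risk, side by side.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name):
    # High risk: concatenating untrusted input lets a "name" like
    # "' OR '1'='1" rewrite the query's meaning (SQL injection).
    return conn.execute(
        "SELECT name FROM users WHERE name = '" + name + "'").fetchall()

def find_user_safe(name):
    # Parameterized form: untrusted input stays data, never SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)).fetchall()

def compound_interest(principal, rate, years):
    # Low risk: pure math over trusted numeric inputs.
    return principal * (1 + rate) ** years

print(find_user_unsafe("' OR '1'='1"))  # matches every row
print(find_user_safe("' OR '1'='1"))    # matches nothing
```

Generated code for the first kind of task can fail in exploitable ways; the third can only be wrong, not exploited.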
November 25, 2025 at 8:37 PM
What even is this? Boolean reverse implication? No, it's just the first input.

wtftw
November 25, 2025 at 8:02 PM
actuators: where values go to die
November 25, 2025 at 7:41 PM
Investing $1B in AI over 30 years would produce crops of postdocs who do great work.

Trying to jam $10B through a funnel in a fraction of the time won't.

Talented people get co-opted into doing mediocre work instead of laying a solid foundation, without, as you say, critical review.
November 25, 2025 at 7:23 PM
We software practitioners need to hash out what constitutes responsible AI use, and the sources for these statistics are important.

But as you say, these bags of stats do not lead us towards clarity.
November 25, 2025 at 7:18 PM
I tracked down one of those the other day: 45% of generated code has exploitable flaws.

The authors were choosing prompts for coding tasks known to entail risk, so "use this input to query the database" is within their methodology because it risks SQL injection, but "implement compound interest" is not.
November 25, 2025 at 7:16 PM
SBI ≟ simulation based inference
November 25, 2025 at 7:02 PM
<voice trumpian>
Wales. I don't like Wales. Old Wales they call it.
Big but not bigly.
I prefer my Wales new and south.
November 25, 2025 at 4:41 PM
best of luck
November 25, 2025 at 3:24 PM
I vaguely remember a biography of Helen Keller talking about how learning language at the age of 7 revealed internal vistas.

I can't vouch for the below but it matches my recollections.
November 25, 2025 at 3:20 PM
But yeah, you don't want bottlenecks either. There's some optimal level of distribution of knowledge, and docs and tests are important ways to encode that.

These are things mgmt needs to know to value.
November 24, 2025 at 6:00 PM
I was floored when a director casually said about an HR application suite, "I'll probably spin up a new team to do a full rewrite in a few years."

As if the existing codebase and tests didn't embody lots of hard-learned lessons, and as if there weren't valuable oral culture in the existing team.
November 24, 2025 at 5:58 PM
Noise marine has entered the chat.
November 24, 2025 at 5:21 PM
Sure, a tool isn't good or bad *per se*, but a use can be judged by its effect on people, and tools that are overwhelmingly turned to uses bad for people are bad in the ways that matter.

The distinction between tools/means and ends/moral standing is pretty explicit in Kant's second formulation.
November 24, 2025 at 5:19 PM
Yeah, I poke fun, but Scala did some really impressive things to shoehorn a nice language onto the JVM.

Bridging nominals and structurals is never going to be pretty, but it's great for DX.

bsky.app/profile/mvsa...
Nominal typing is when a PL identifies types by name.
A type like X is declared in some source file, A.
Other files, B and C, can depend on it.
And if yet another file depends on B and C, it can use an X value from one interchangeably with the other.
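A hedged sketch of the idea above, using Python class identity as a stand-in for nominal typing (the files A, B, and C from the post are simulated inline, and `LooksLikeX` is my hypothetical foil):

```python
# Nominal typing: what matters is the declared name/identity of X,
# not its shape.

class X:  # declared once, in "file A"
    def __init__(self, v):
        self.v = v

def make_x(v):   # "file B" produces X values
    return X(v)

def use_x(x):    # "file C" consumes X values
    assert isinstance(x, X)  # checks the declared type, by identity
    return x.v

class LooksLikeX:  # structurally identical, but a different name
    def __init__(self, v):
        self.v = v

print(use_x(make_x(42)))   # fine: B and C agree on the one declared X
# use_x(LooksLikeX(42))    # would fail: same shape, different name
```

A structural type system would accept `LooksLikeX` wherever `X` is expected; a nominal one does not, which is the bridging problem the post alludes to.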
November 24, 2025 at 5:02 PM
There's existing study of snowy conditions, so this is not a first-rodeo situation, but things will get weird initially.

bsky.app/profile/mvsa...
November 21, 2025 at 7:06 PM
Surprising level of purity in your fall from Grace, but I forget whom I'm talking to.
November 21, 2025 at 6:26 PM
Do you have any kind of scoped/structured transactions so that a Grace program can be explicit about which prompts inherit state from prior ones?
November 21, 2025 at 6:01 PM
Nice!

We have all this new machinery for producing semi-structured outputs, but I can imagine the orchestration sauce for assembling those outputs into a structured form is still missing.
November 21, 2025 at 6:01 PM
So you can have forms driven by types, which give pieces you use to compose a prompt from a template; then there's additional promptery so that the model application produces a structured result fitting the output type passed to prompt?
November 21, 2025 at 5:45 PM
Speedrunning soteriology
November 21, 2025 at 3:43 AM