Vincent Carchidi
@vcarchidi.bsky.social
Defense analyst. Tech policy. Have a double life in CogSci/Philosophy of Mind. (It's confusing. Just go with it.) https://philpeople.org/profiles/vincent-carchidi All opinions entirely my own.
vcarchidi.bsky.social
Oh I agree. Wasn't criticizing her. More so that her (skeptical) comments on LLMs make her kind of a pariah among people more bullish on them.
vcarchidi.bsky.social
In any event, props to this guy for simultaneously being a big believer in super capable LLMs *and* quoting Emily Bender.
vcarchidi.bsky.social
I came across the thread last night, and coming back to it now, I think it's much too pessimistic on several counts.

E.g. Musk controlling Grok as a news source on X is very bad, but not as bad as it sounds when interwoven with everything else discussed by Carlini.
vcarchidi.bsky.social
Lots to think about. I might just point out re: the token tax proposed by Amodei, that major firms asking to be regulated is...not incompatible with regulatory capture...
vcarchidi.bsky.social
Interesting thread...
mariaa.bsky.social
Keynote at #COLM2025: Nicholas Carlini from Anthropic

"Are language models worth it?"

Explains that the prior decade of his work on adversarial images, while it taught us a lot, isn't very applied; it's unlikely anyone is actually altering images of cats in scary ways.
vcarchidi.bsky.social
Yeah, I had no idea about some of the Soviet interpretations of people like Simon and Newell that @mraginsky.bsky.social mentions...
vcarchidi.bsky.social
I think you'd agree that latter sense is less seamless than the former, but it can substitute for a less efficient process?
vcarchidi.bsky.social
Take your use of Excel. That's software designed to function in very specific ways and, if used properly, it does! You don't need to question it, in that sense.

Then there's the other sense of effectively jury-rigging it for use in an environment not built around Excel.
vcarchidi.bsky.social
Both need to be aligned for workflow purposes, and I think they are very much misaligned in the process you're describing.
vcarchidi.bsky.social
So I think there are two main senses in which it works:

- It operates as the programmer intended

- It operates as the end user intends
vcarchidi.bsky.social
those I think should be kept *somewhat* separate from whether the tech works as intended or not, just to get our bearings on what needs to be fixed and what doesn't need to be fixed.

These are really the problems that take years to figure out.
vcarchidi.bsky.social
Yeah I have family who work in healthcare, and documentation is maybe the most recurring thing I hear about. The doctors (so I'm told) don't have enough time to do it themselves, but they're too short-staffed, so they have to do it anyway.

But the broader issues about requirements and so forth...
vcarchidi.bsky.social
Interesting example...things get a little tangled here. Like would you say this specific outcome is a tech problem or just a workflow problem?

The data that's recorded is being recorded accurately - even if not an accurate diagnosis, etc - so the pipes work in that sense?
vcarchidi.bsky.social
Gonna be reading this one today for sure
vcarchidi.bsky.social
So in clinical software, that of course doesn't have to be maximum precision or reliability for each application. But for anything that, say, delivers an output that will guide the clinician to prescribing a med, my sense is it just has to be up there. (lots of variation, but you get my point)
vcarchidi.bsky.social
Yeah I think, getting to what someone else said, I probably trot out the five nines line too often, but it does capture a point which I think has been lost: the most impact on people's lives comes from the systems capable of operating (mostly) seamlessly in sensitive environments. AI or not.
vcarchidi.bsky.social
I'll have to get past my "too much to read" crisis before I get to them tho
vcarchidi.bsky.social
Speaking of which, this preprint was just released, and there are some sections here which look very very promising.

nap.nationalacademies.org/resource/279...
vcarchidi.bsky.social
But I could be missing something tbf from the compsci side.
vcarchidi.bsky.social
Maybe I should be using a different term, but my point is getting at the indeterminacy and/or inaccuracy of a system in production, which - for those sensitive domains - really does need to be as minimal as possible.

I'm most familiar with defense apps, where it's often taken for granted.
vcarchidi.bsky.social
And I think agents today basically can't have the impact it seems like they should be having in sensitive applications in finance, clinical domains, defense, etc. because they don't reach that level yet.

I'm confident this will be overcome (maybe hybrid models), and I also think it's a must.
vcarchidi.bsky.social
The traditional standard AFAICT for sensitive or otherwise critical applications of tech - where its use is relied upon uncritically, often with time constraints - is to basically ensure that they meet *very* high accuracy/reliability, five nines so to speak (99.999%).
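For a rough sense of what five nines implies, here is a minimal Python sketch (not from the thread). It assumes two possible readings of the figure, uptime over a non-leap year and per-decision accuracy, and the function names are hypothetical, chosen only for illustration.

```python
# Illustrative arithmetic only: two readings of "five nines" (99.999%).
# Assumption: a 365-day year; function names are hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return (1 - availability) * SECONDS_PER_YEAR / 60


def expected_errors(accuracy: float, decisions: int) -> float:
    """Expected number of wrong outputs across a number of decisions."""
    return (1 - accuracy) * decisions


if __name__ == "__main__":
    five_nines = 0.99999
    # Roughly 5.26 minutes of downtime per year.
    print(f"{allowed_downtime_minutes(five_nines):.2f} minutes/year")
    # Roughly 10 errors per million decisions.
    print(f"{expected_errors(five_nines, 1_000_000):.1f} errors per million")
```

Read as availability, five nines leaves only a few minutes of downtime per year; read as accuracy, it leaves about ten errors per million outputs. Which reading applies depends on the application.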
vcarchidi.bsky.social
holy shit that was actually a text
carlquintanilla.bsky.social
“.. On Sept. 20, Trump meant to send a private message to Attorney General Pam Bondi urging her to prosecute” Comey.

“.. Trump believed he had sent Bondi the message directly .. and was surprised to learn it was public, the officials said.” 🤡

@wsj.com
www.wsj.com/politics/pol...
vcarchidi.bsky.social
Good time, given the discussions, to re-up this piece from last year arguing that it's during AI downturns that the most impactful work in the field has been done, which later underpins the AI booms (often switching from other, less tarnished names back to "AI").

cacm.acm.org/opinion/betw...
Between the Booms: AI in Winter – Communications of the ACM