Matt Wallace
m9e.bsky.social
CTO building #AI
Interesting. I think OpenAI may do something similar. Last night I was watching Codex try to run a `ps` and it was failing. Ultimately, I'm skeptical of these shim approaches; you want them to have tons of horsepower. Running the tool in the sandbox seems better.
October 21, 2025 at 1:36 PM
So not surprised. When the “Is ChatGPT’s behavior changing over time?” paper came out, I published a critique, partly because, while the paper wasn’t technically wrong, it was written with ambiguous conclusions that the press misconstrued. The world loves to see AI fail right now. 🤷🏼‍♂️
June 15, 2025 at 8:23 AM
This conversation with ChatGPT goes to show: garbage in, garbage out chatgpt.com/share/6841e2...

It knows a lot, but it won’t second-guess you unless you ask/demand it
June 5, 2025 at 6:30 PM
This fits really well with the “putting our national security conversation on Telegram is a non-story” line
May 30, 2025 at 10:46 AM
This was using 5 draft (speculative) tokens, which is the example in the docs, but for llama.cpp I've found 5-8 to be the sweet spot.
May 6, 2025 at 1:37 PM
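The 5-8 sweet spot mentioned above has a back-of-envelope explanation: gains from longer drafts saturate. A minimal sketch, assuming each draft token is accepted independently with probability `alpha` (my simplifying assumption, not llama.cpp's actual acceptance logic):

```python
def expected_tokens_per_round(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass:
    accepted draft tokens plus the one token the target model always
    contributes (a correction or a bonus token).
    E[T] = (1 - alpha**(draft_len + 1)) / (1 - alpha)."""
    return (1 - alpha ** (draft_len + 1)) / (1 - alpha)

if __name__ == "__main__":
    alpha = 0.8  # hypothetical per-token acceptance rate
    for n in (3, 5, 8, 16):
        # Expected tokens grow with draft length, but marginal
        # gains shrink while drafting cost grows linearly.
        print(n, round(expected_tokens_per_round(alpha, n), 2))
```

With the gains flattening past 5-8 drafted tokens while the draft model's cost per round keeps growing linearly, mid-length drafts tend to win.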
I thought we’d agreed on paperclips
April 13, 2025 at 12:35 AM
Yes
April 12, 2025 at 3:54 AM
Sorry, what? This seems very specific and I’ve been wondering how to make sense of different reports on quality; lmsys model is literally labeled Maverick though. Are you saying there was a different unreleased version of Maverick?
April 6, 2025 at 10:13 PM
I haven’t used it enough to say, but it’s interesting to think that going from 70% success to 85% success is “twice as good”: the failure rate halves, from 30% to 15%. Asymptotic improvements are still significant.
March 29, 2025 at 6:44 AM
Well you’re not going to make a lot of friends with the lawyers but the rest of us are pretty happy.
March 23, 2025 at 12:31 AM
SemiAnalysis iirc thinks DeepSeek actually had ~$1.3B in hardware, so not a box of scraps :) But I agree there’s no need or gain long term from an artificial moat. A surgical ban on the DeepSeek API (no government use, requiring contractors/employees to disclose use) would be fine imo
March 14, 2025 at 2:31 PM
💯 to be fair, china has a deserved (I think) reputation for engineering economic success with state policies. But in this case the answer is to optimize more domestically. AI is not some consumable like a car; it is creating a flywheel. We can’t afford to be crippled.
March 14, 2025 at 2:28 PM
Even banning API access here is super sketchy, and if this is meant to say to ban the weights... it's not just a horrible take, it's just anti-competitive lobbying. Not quite as dumb as Hawley's bill - which arguably banned the US from using China's published algos/math - but terrible still.
March 14, 2025 at 2:11 PM
Side note: when I was chiming in on GitHub and, I think, actually triggered gg to start merging this back in, I remember I was running 32B Q8 with a 7B Q4_K_L draft, but I think I still had mine set to --draft 5. I will say 7B >>>> 3B so far; I may have to play around with some even smaller drafts
March 11, 2025 at 1:10 PM
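The setup described above can be sketched as a llama.cpp speculative run. Only the `--draft 5` flag and the 32B-target/7B-draft pairing come from the post; the binary name follows llama.cpp's speculative example, the model paths are placeholders, and flag names vary across llama.cpp versions:

```shell
# Hypothetical llama.cpp speculative decoding invocation:
# 32B Q8 target verified against a 7B Q4_K_L draft, 5 draft tokens per round.
# Model file names are placeholders, not from the post.
./llama-speculative \
  -m  models/target-32b-q8_0.gguf \
  -md models/draft-7b-q4_k_l.gguf \
  --draft 5 \
  -p "Write a haiku about drafts."
```

The draft model proposes up to 5 tokens per round, which the larger target model then verifies in a single batched pass.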