Matt Wallace
m9e.bsky.social
CTO building #AI
Interesting. I think OpenAI may do something similar. Last night I was watching Codex try to run a `ps` and it was failing. Ultimately, I'm skeptical of these shim approaches; you want them to have tons of horsepower. Running the tool in the sandbox seems better.
October 21, 2025 at 1:36 PM
So not surprised. When the “Is ChatGPT’s behavior changing over time?” paper came out, I published a critique, partly because, while the paper wasn’t technically wrong, it was written with ambiguous conclusions that the press misconstrued. The world loves to see AI fail right now. 🤷🏼‍♂️
June 15, 2025 at 8:23 AM
This conversation with ChatGPT goes to show: garbage in, garbage out chatgpt.com/share/6841e2...

It knows a lot, but it won’t second-guess you unless you ask/demand it
June 5, 2025 at 6:30 PM
This fits really well with the “putting our national security conversation on Telegram is a non-story” line
May 30, 2025 at 10:46 AM
This was using 5 draft (speculative) tokens, which is the example in the docs, but for llama.cpp I've found 5-8 to be the sweet spot.
May 6, 2025 at 1:37 PM
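The 5-8 sweet spot mentioned above has a back-of-envelope explanation: gains from longer drafts saturate. A minimal sketch, assuming each draft token is accepted independently with probability `alpha` (my simplifying assumption, not llama.cpp's actual acceptance logic):

```python
def expected_tokens_per_round(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass:
    accepted draft tokens plus the one token the target model always
    contributes (a correction or a bonus token).
    E[T] = (1 - alpha**(draft_len + 1)) / (1 - alpha)."""
    return (1 - alpha ** (draft_len + 1)) / (1 - alpha)

if __name__ == "__main__":
    alpha = 0.8  # hypothetical per-token acceptance rate
    for n in (3, 5, 8, 16):
        # Expected tokens grow with draft length, but marginal
        # gains shrink while drafting cost grows linearly.
        print(n, round(expected_tokens_per_round(alpha, n), 2))
```

With the gains flattening past 5-8 drafted tokens while the draft model's cost per round keeps growing linearly, mid-length drafts tend to win.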
I thought we’d agreed on paperclips
April 13, 2025 at 12:35 AM
Yes
April 12, 2025 at 3:54 AM
Sorry, what? This seems very specific and I’ve been wondering how to make sense of different reports on quality; lmsys model is literally labeled Maverick though. Are you saying there was a different unreleased version of Maverick?
April 6, 2025 at 10:13 PM
I haven’t used it enough to say, but it’s interesting to think that going from 70% success to 85% success is “twice as good”: the failure rate halves, from 30% to 15%. Asymptotic improvements are still significant.
March 29, 2025 at 6:44 AM
Well you’re not going to make a lot of friends with the lawyers but the rest of us are pretty happy.
March 23, 2025 at 12:31 AM
SemiAnalysis iirc thinks DeepSeek actually had ~$1.3B in hardware, so not a box of scraps :) But I agree there’s no need or gain long term from an artificial moat. A surgical ban on the DeepSeek API (no government use, requiring contractors/employees to disclose use) would be fine imo
March 14, 2025 at 2:31 PM
💯 to be fair, china has a deserved (I think) reputation for engineering economic success with state policies. But in this case the answer is to optimize more domestically. AI is not some consumable like a car; it is creating a flywheel. We can’t afford to be crippled.
March 14, 2025 at 2:28 PM
Even banning API access here is super sketchy, and if this is meant to say to ban the weights... it's not just a horrible take, it's just anti-competitive lobbying. Not quite as dumb as Hawley's bill - which arguably banned the US from using China's published algos/math - but terrible still.
March 14, 2025 at 2:11 PM
Side note: when I was chiming in on GitHub and, I think, actually triggered gg to start merging this back in, I remember I was running 32B Q8 with a 7B Q4_K_L draft, but I think I still had mine set to --draft 5. I will say 7B >>>> 3B so far; I may have to play around with some even smaller drafts
March 11, 2025 at 1:10 PM
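The setup described above can be sketched as a llama.cpp speculative run. Only the `--draft 5` flag and the 32B-target/7B-draft pairing come from the post; the binary name follows llama.cpp's speculative example, the model paths are placeholders, and flag names vary across llama.cpp versions:

```shell
# Hypothetical llama.cpp speculative decoding invocation:
# 32B Q8 target verified against a 7B Q4_K_L draft, 5 draft tokens per round.
# Model file names are placeholders, not from the post.
./llama-speculative \
  -m  models/target-32b-q8_0.gguf \
  -md models/draft-7b-q4_k_l.gguf \
  --draft 5 \
  -p "Write a haiku about drafts."
```

The draft model proposes up to 5 tokens per round, which the larger target model then verifies in a single batched pass.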