Matthew Carrigan
@carrigmat.bsky.social
Engineer @huggingface. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working
He/him
Though I'd add one addendum to that thread: It seems like some EPYC CPUs don't get the full socket bandwidth (possibly based on CCD count?), so going with the absolute cheapest ones might not be the best idea. If anyone knows the true memory bandwidths for those chips, I really want to know!
November 7, 2025 at 6:27 PM
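If you want to at least sanity-check a box you already own, here's a minimal sketch (my addition, not from the thread): a single-process STREAM-style triad in NumPy. It will badly undercount a multi-channel EPYC socket, so treat it as a lower bound; real numbers need a multi-threaded, NUMA-pinned benchmark like STREAM itself.

```python
# Rough single-process bandwidth estimate using a STREAM-style triad
# (a = b + 3*c). NumPy runs this mostly single-threaded, so treat the
# result as a lower bound on the socket's true memory bandwidth.
import time
import numpy as np

N = 200_000_000  # ~1.6 GB per float64 array, far too big for any cache
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)

start = time.perf_counter()
np.multiply(c, 3.0, out=a)  # a = 3.0 * c  (read c, write a)
a += b                      # a = a + b    (read a, read b, write a)
elapsed = time.perf_counter() - start

bytes_moved = 5 * N * 8  # ~5 full array traversals across the two ops
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s (single process)")
```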
The hardware for R1 should work perfectly: despite the higher parameter count, K2 is actually slightly smaller, thanks to its INT4 quantization. You should be able to fit it at full quality (Q8 attention, Q4 MoE) in 768GB!
November 7, 2025 at 6:27 PM
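For concreteness, the back-of-envelope arithmetic behind "fits in 768GB" might look like this; the expert/dense split below is my own assumption (most MoE parameters live in the expert layers), not a published spec.

```python
# Back-of-envelope check that a ~1T-parameter MoE model fits in 768 GB
# with Q8 attention and Q4 expert weights. The split is an assumption.
total_params = 1.0e12    # ~1T total parameters, Kimi-K2 class
expert_frac = 0.97       # assumed fraction of params in MoE experts

moe_params = total_params * expert_frac
dense_params = total_params - moe_params

# Q4 = 0.5 bytes/param, Q8 = 1 byte/param
weights_gb = (moe_params * 0.5 + dense_params * 1.0) / 1e9
print(f"weights: ~{weights_gb:.0f} GB")  # ~515 GB

# Even with generous headroom for KV cache, activations and the OS,
# that sits comfortably under 768 GB. Compare R1 at Q8: 671B params
# at 1 byte each is ~671 GB, i.e. K2 really is slightly smaller.
```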
In particular, this bit suggests that if you inject a concept too weakly the model doesn't notice, and if you inject it too strongly it just talks about the concept rather than 'introspecting'. But maybe that just means a medium-strength injection biases the model towards the concept without totally overriding the original question?
October 29, 2025 at 7:32 PM
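Mechanically, that injection setup probably looks something like the sketch below: add a scaled concept direction to one layer's hidden states and sweep the strength. Everything here is a placeholder I picked (model name, layer index, strength values, and a random vector standing in for a properly derived concept direction); the paper's actual method and scales may differ.

```python
# Sketch of "concept injection" via activation steering in PyTorch,
# using a forward hook on one transformer layer. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

concept = torch.randn(model.config.hidden_size)  # stand-in concept vector
concept = concept / concept.norm()

def make_hook(alpha):
    def hook(module, inputs, output):
        # Decoder layers may return a tuple (hidden_states, ...)
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * concept.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

layer = model.model.layers[15]  # mid-depth layer, chosen arbitrarily
for alpha in (0.0, 2.0, 8.0, 32.0):  # weak -> strong injection
    handle = layer.register_forward_hook(make_hook(alpha))
    ids = tok("What are you thinking about right now?", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
    print(alpha, tok.decode(out[0], skip_special_tokens=True))
    handle.remove()
```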
Yup, you can very clearly see a halving of stock value right after GPT-4 is released
June 15, 2025 at 9:06 PM
I think a lot of people are dismissing it by analogy to crypto, where usage took off but it was clearly useless for anything but speculative investing or laundering the proceeds of crime. It even ate up all the GPUs for years too!

I mean, they're incredibly wrong, but I can see how they got there
May 26, 2025 at 5:55 PM
One clear giveaway is that modern German still has an informal second-person "du", which bears obvious signs of shared heritage with "thou": their similarity in sound, of course, but also their "-st" verb endings. Shakespearean "thou sayst" is almost identical to modern German "du sagst"!
May 13, 2025 at 3:21 PM
And when Leela Chess Zero did an open-source reproduction of it, they just distributed inference to volunteer computers around the globe. Of course, that probably won't work as well for a 700GB LLM as it did for a 100MB convnet, but in principle you could do the same
March 25, 2025 at 4:49 PM
The analogy here is to projects like AlphaGo/AlphaZero: far more compute was spent evaluating board positions to generate the training data than on actually updating the model with that data! DeepMind distributed that over tons of tiny TPUv1s, iirc
March 25, 2025 at 4:49 PM
This might also herald an upgraded R1 reasoning model, using the new V3 as an improved base, but that's pure speculation on my part; I don't have any secret info!
March 24, 2025 at 6:43 PM