Matthew Carrigan
@carrigmat.bsky.social
Engineer @huggingface. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working
He/him
Though I'd add one addendum to that thread: It seems like some EPYC CPUs don't get the full socket bandwidth (possibly based on CCD count?), so going with the absolute cheapest ones might not be the best idea. If anyone knows the true memory bandwidths for those chips, I really want to know!
November 7, 2025 at 6:27 PM
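If you want to at least sanity-check a box you already own, here's a minimal sketch (my addition, not from the thread): a single-process STREAM-style triad in NumPy. It will badly undercount a multi-channel EPYC socket, so treat it as a lower bound; real numbers need a multi-threaded, NUMA-pinned benchmark like STREAM itself.

```python
# Rough single-process bandwidth estimate using a STREAM-style triad
# (a = b + 3*c). NumPy runs this mostly single-threaded, so treat the
# result as a lower bound on the socket's true memory bandwidth.
import time
import numpy as np

N = 200_000_000  # ~1.6 GB per float64 array, far too big for any cache
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)

start = time.perf_counter()
np.multiply(c, 3.0, out=a)  # a = 3.0 * c  (read c, write a)
a += b                      # a = a + b    (read a, read b, write a)
elapsed = time.perf_counter() - start

bytes_moved = 5 * N * 8  # ~5 full array traversals across the two ops
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s (single process)")
```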
The hardware for R1 should work perfectly: despite the higher parameter count, K2 is actually slightly smaller, thanks to its INT4 quantization. You should be able to fit it at full quality (Q8 attention, Q4 MoE) in 768GB!
November 7, 2025 at 6:27 PM
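For concreteness, the back-of-envelope arithmetic behind "fits in 768GB" might look like this; the expert/dense split below is my own assumption (most MoE parameters live in the expert layers), not a published spec.

```python
# Back-of-envelope check that a ~1T-parameter MoE model fits in 768 GB
# with Q8 attention and Q4 expert weights. The split is an assumption.
total_params = 1.0e12    # ~1T total parameters, Kimi-K2 class
expert_frac = 0.97       # assumed fraction of params in MoE experts

moe_params = total_params * expert_frac
dense_params = total_params - moe_params

# Q4 = 0.5 bytes/param, Q8 = 1 byte/param
weights_gb = (moe_params * 0.5 + dense_params * 1.0) / 1e9
print(f"weights: ~{weights_gb:.0f} GB")  # ~515 GB

# Even with generous headroom for KV cache, activations and the OS,
# that sits comfortably under 768 GB. Compare R1 at Q8: 671B params
# at 1 byte each is ~671 GB, i.e. K2 really is slightly smaller.
```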
In particular, this bit suggests that if you inject a concept too weakly the model doesn't notice, and if you inject it too strongly it just talks about the concept rather than 'introspecting'. But maybe that just means a medium-strength injection biases the model towards the concept without totally overriding the original question?
October 29, 2025 at 7:32 PM
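Mechanically, that injection setup probably looks something like the sketch below: add a scaled concept direction to one layer's hidden states and sweep the strength. Everything here is a placeholder I picked (model name, layer index, strength values, and a random vector standing in for a properly derived concept direction); the paper's actual method and scales may differ.

```python
# Sketch of "concept injection" via activation steering in PyTorch,
# using a forward hook on one transformer layer. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

concept = torch.randn(model.config.hidden_size)  # stand-in concept vector
concept = concept / concept.norm()

def make_hook(alpha):
    def hook(module, inputs, output):
        # Decoder layers may return a tuple (hidden_states, ...)
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * concept.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

layer = model.model.layers[15]  # mid-depth layer, chosen arbitrarily
for alpha in (0.0, 2.0, 8.0, 32.0):  # weak -> strong injection
    handle = layer.register_forward_hook(make_hook(alpha))
    ids = tok("What are you thinking about right now?", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
    print(alpha, tok.decode(out[0], skip_special_tokens=True))
    handle.remove()
```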
Yup, you can very clearly see a halving of stock value right after GPT-4 is released
June 15, 2025 at 9:06 PM
I think a lot of people are dismissing it by analogy to crypto, where usage took off but it was clearly useless for anything but speculative investing or laundering the proceeds of crime. It even ate up all the GPUs for years too!

I mean, they're incredibly wrong, but I can see how they got there
May 26, 2025 at 5:55 PM
One clear giveaway is that modern German still has an informal second-person "du", which bears obvious signs of shared heritage with "thou": their similarity in sound, of course, but also their "-st" verb endings. Shakespearean "thou sayst" is almost identical to modern German "du sagst"!
May 13, 2025 at 3:21 PM
And when Leela Chess Zero did an open-source reproduction of it, they just distributed inference to volunteer computers around the globe. Of course, that probably won't work as well for a 700GB LLM as it did for a 100MB convnet, but in principle you could do the same
March 25, 2025 at 4:49 PM
The analogy here is to projects like AlphaGo/AlphaZero: far more compute was spent evaluating board positions to generate the training data than on actually updating the model with that data! DeepMind distributed that over tons of tiny TPUv1s, iirc
March 25, 2025 at 4:49 PM
This might also herald an upgraded R1 reasoning model, using the new V3 as an improved base, but that's pure speculation on my part; I don't have any secret info!
March 24, 2025 at 6:43 PM