Sam Harsimony
@harsimony.bsky.social
370 followers 410 following 1.1K posts
I write about opportunities in science, space, and policy here: https://splittinginfinity.substack.com/
Pinned
Thread of my new posts in the replies (and some summaries of old ones).
Oh cool! I didn't see you in the credits
That too, but I'm excited about a defensive technology that can end the threat of nuclear war!
In theory, QCs will be great for simulating chemistry, but in the short term they won't make a big difference (outside some niche chemistry problems), because you need lots of qubits to simulate lots of atoms
Long term, AI safety is important. Short term, I'm quite suspicious of premature regulation of AI.

Regardless of who proposes it, it's the same anti-progress thinking.

www.transformernews.ai/p/how-maga-l...
How MAGA learned to love AI safety
The AI industry is engaging in one of “the most blasphemous endeavors,” a leading MAGA figure told us
www.transformernews.ai
Writing a post about LLM inference economics. It's insane that I can talk to an LLM in detail about how its own inference works.
In the comments people were suggesting boiling ethanol as coolant and using it for base bleed drag reduction!
Reposted by Sam Harsimony
Final thread on the Tensor Economics blog!

Last thread was inference for dense models. Today we talk about the MoE models common in industry today.

Their post ends with some ominous questions about inference demand.

www.tensoreconomics.com/p/moe-infere...
Reposted by Sam Harsimony
Research reveals key differences between Low-Rank Adaptation (LoRA) and full fine-tuning of large language models, showing LoRA introduces "intruder dimensions" leading to significant forgetting. This insight could change our approach to continual learning in AI. https://arxiv.org/abs/2410.21228
LoRA vs Full Fine-tuning: An Illusion of Equivalence
ArXiv link for LoRA vs Full Fine-tuning: An Illusion of Equivalence
arxiv.org
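For context on the repost above, LoRA constrains the weight update to a low-rank product B @ A instead of updating the full matrix. A minimal sketch of that setup (toy sizes; nothing here is from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (toy values)

W = rng.standard_normal((d, d))  # frozen pretrained weight
A = rng.standard_normal((r, d))  # trainable low-rank factor
B = np.zeros((d, r))             # B starts at zero, so W is unchanged at init

W_adapted = W + B @ A            # LoRA: update confined to a rank-r subspace
assert np.allclose(W_adapted, W) # identical to the pretrained weights at init
```

After training, B @ A still has rank at most r, while full fine-tuning can move W in all d directions; the paper's "intruder dimensions" describe how those two update geometries differ.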
This is great. Immunotherapies have become a new pillar of cancer treatment: we now have mRNA, CAR-T, antibodies, and checkpoint inhibitors.
While researching personalized mRNA cancer vaccines, researchers discovered that the non-specific immune effects of mRNA vaccines are strong enough that the vaccines don't need to target a cancer to benefit cancer therapy

www.mdanderson.org/newsroom/res...
But I agree that 6 years sounds like a long time to be reusing old GPUs. Surely the advances (like the ones you point out) will make it more cost-effective to switch to new GPUs?

A lot rides on this question.
I don't have any numbers on actual length of use, unfortunately.

My point about amortization is: assume the data centers follow the amortization schedule exactly. Then, when you do the accounting, you divide the H100's price over 6 years rather than 4. That means more net profit per year.

Does that make sense?
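The amortization point above reduces to simple arithmetic. A sketch with an assumed $30,000 H100 price (my number, not from the thread):

```python
H100_PRICE = 30_000  # assumed purchase price in USD

def annual_depreciation(price: float, years: int) -> float:
    """Straight-line depreciation: equal expense recognized each year."""
    return price / years

old = annual_depreciation(H100_PRICE, 4)  # 7500.0 expensed per year
new = annual_depreciation(H100_PRICE, 6)  # 5000.0 expensed per year

# Stretching the schedule lowers the annual expense, which raises
# reported net profit, all else equal.
print(old - new)  # 2500.0 less expense per year per GPU
```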
Something like this will become part of air defense. Saturate friendly airspace with these and smash into low flying enemy drones/missiles.
585 km/h (363 mph) drone speed record.

Incredible to think about how advances in 3D printing, batteries, motors, cameras, computational fluid dynamics, and VR displays come together to produce this achievement.

www.youtube.com/watch?v=-Iu6...
World's Fastest Drone | Design & Aerodynamics
YouTube video by Mike Bell
www.youtube.com
I guess point #1 can be broken down into:
1a. They're getting less profit than expected from AI so trying to find ways to increase it
1b. Broader economic conditions not looking great, so looking to shore up profit sources
Most AI companies have recently extended their GPU depreciation schedules (from ~4 to ~6 years).

That can mean:

1. They want more net profit, so amortizing hardware over more years

2. New chip offerings aren't compelling enough to switch as frequently as they expected

Other interpretations?
Reposted by Sam Harsimony
recently we got

1. DeepSeek Sparse Attention (DSA) which solved the cost angle
2. DeepSeek-OCR which solved the performance angle

also there’s memory like Letta, and then cartridges pushing it into latent space

seems obvious that we’re rapidly approaching the 1B “cognitive core” model
Reposted by Sam Harsimony
New paper by Rivera Mora & @philippstrack.bsky.social challenges everything we thought we knew about mechanism design beyond expected utility. The results are wild. 🤯
Paper: "Mechanism Design Beyond Expected Utility Preferences" (Sept 2025) static1.squarespace.com/static/62d2f...
static1.squarespace.com
Reposted by Sam Harsimony
BERT is just a Single Text Diffusion Step!

Masked language models like RoBERTa, originally designed for fill-in-the-blank tasks, can be repurposed into fully generative engines by interpreting variable-rate masking as a discrete diffusion process.
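The masking-as-diffusion idea in the repost above can be sketched with a toy predictor standing in for the real model: start from an all-mask sequence and unmask a share of positions at each step. Everything here is a placeholder (`toy_predictor` is a stand-in, not RoBERTa):

```python
import random

MASK = "<mask>"

def toy_predictor(tokens):
    """Stand-in for a masked LM: propose a token for every masked
    position. (A real model would return its top prediction.)"""
    vocab = ["the", "cat", "sat", "on", "mat"]
    return {i: random.choice(vocab) for i, t in enumerate(tokens) if t == MASK}

def diffusion_generate(length, steps=5, seed=0):
    """Reverse diffusion with a masked LM: begin fully masked, then
    unmask a portion of the remaining positions at each step."""
    random.seed(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predictor(tokens)
        # Unmask an equal share of the remaining positions per step.
        k = max(1, len(masked) // (steps - step))
        for i in random.sample(masked, k):
            tokens[i] = preds[i]
    return tokens

out = diffusion_generate(8)  # 8 tokens, fully unmasked after 5 steps
```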