Ben Recht
@beenwrekt.bsky.social
7.3K followers 220 following 1.3K posts
Blog: https://argmin.substack.com/ Webpage: https://people.eecs.berkeley.edu/~brecht/
beenwrekt.bsky.social
How do you feel about "incentives?"
beenwrekt.bsky.social
Optimization theory has a few important lessons for machine learning, but the list of actionable lessons is surprisingly short.
Minimal Theory
What are the most important lessons from optimization theory for machine learning?
www.argmin.net
beenwrekt.bsky.social
Likely title for my next post: "Momentum Is Real." IYKYK.
beenwrekt.bsky.social
Wrote about football again, amused by cases where even the data nerds concede you might have to consider the messiness of human psychology.
Sunday Never Knows
When do you give up on your model and trust your gut?
www.argmin.net
beenwrekt.bsky.social
A pox on early meetings messing up my ritualized posting schedule.
beenwrekt.bsky.social
Hmm, what is the gap in the go for 2 down 8? This old article by Walder says it's 59% win percentage if going for 2/47% for the XP. Using Walder's analysis, it's 50%/50% for this other case. These numbers are not precise enough to be different.
Going for 2 down 8 points: Packers pull off come-from-behind win thanks to this strategy
The Packers profited from being the first team this season to try this unconventional 2-point conversion attempt. Here's why others should follow their lead.
www.espn.com
beenwrekt.bsky.social
Why do you take psychology into account here but not when computing the odds of winning when going for 2 down 8?
beenwrekt.bsky.social
The majority of Koopman's article argues for why "at fault" metrics are deeply flawed. It's a compelling argument and worth a full read.
beenwrekt.bsky.social
In non-LLM AI news, Waymo has quietly been involved in two fatal car crashes in 2025. Phil Koopman details that this fatality rate is now comparable to humans.

People: 7/billion miles in all conditions
Waymo: 5/billion miles in geofenced conditions
Blame-Free Robotaxi Crashes Are Still Crashes
The tricky logic behind robotaxi crash metrics
philkoopman.substack.com
beenwrekt.bsky.social
Yes, and though I'm not inclined, it behooves technocrats to embrace this position and make the case for bureaucracy as a public good.

(But also reiterating: there are non-bureaucratic uses of statistics. They just don't try to reduce everything to RCTs or counterfactual reasoning.)
beenwrekt.bsky.social
Yes, I highly recommend this paper. A masterpiece.

In contrast to much potential outcomes work, Dawid is upfront that we only do experimentation with the expectation that there is some form of external validity, which necessitates modeling how the sample relates to future individuals.
beenwrekt.bsky.social
Yes, you can, but this is not necessary.
beenwrekt.bsky.social
we don't even have to think about infinite samples. Imagine you have 52 cards laid out face down on a table, and you want to estimate the prevalence of hearts. You sample 10 of them. It's not hard to imagine checking your answer on the other 42. 1/2
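The card example above can be played out directly. What follows is a minimal sketch, not anything from the original thread: the function name `estimate_hearts`, the fixed seed, and the choice to report simple fractions are all assumptions made for illustration. It shuffles a standard 52-card deck, estimates the prevalence of hearts from 10 cards, and then checks that estimate against the 42 cards that were not sampled.

```python
import random


def estimate_hearts(sample_size=10, seed=0):
    """Estimate the share of hearts from a sample, then check it on the rest.

    Hypothetical helper for illustration; the deck has 13 hearts out of 52.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    deck = ["hearts"] * 13 + ["other"] * 39
    rng.shuffle(deck)  # cards laid out face down in random order

    sample = deck[:sample_size]  # the cards we turn over
    rest = deck[sample_size:]    # the cards we could still check

    estimate = sample.count("hearts") / sample_size
    held_out = rest.count("hearts") / len(rest)
    return estimate, held_out


if __name__ == "__main__":
    est, held = estimate_hearts()
    print(f"estimate from 10 cards: {est:.2f}")
    print(f"prevalence in other 42: {held:.2f}")
    print(f"true prevalence:        {13 / 52:.2f}")
```

The point of the sketch is only that the quantity being estimated is checkable: every card in `rest` can be turned over, so the estimate can be compared against something that exists.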
beenwrekt.bsky.social
But what good is lower variance if the estimand isn't real?
beenwrekt.bsky.social
My issue is that George Box ruined the brains of all subsequent statisticians who torture his quote to equate "useful" with "usable."
beenwrekt.bsky.social
In an RCT, I can either give someone a drug or not. There is no world where I can observe both. 2/2
beenwrekt.bsky.social
Sure, but it behooves us to convincingly argue why we should take regression coefficients seriously. Too often, people compute them because it's what they are trained to compute.
beenwrekt.bsky.social
In ITEs, you can never apply both treatments. You are reasoning about something that can't exist. 2/2
beenwrekt.bsky.social
There is a fundamental difference in the missing data problems of superpopulations vs ITEs.

You can imagine superpopulations as drawing cards from a shuffled deck. It is possible to draw every card if you sample long enough. 1/2
beenwrekt.bsky.social
I'm ok with this split, to be honest. In the first camp, would you agree that "gold standard" means "all causal inference is an RCT, either real or imagined?"

FWIW, I have more gripes with the missing data interpretation because ITEs are purely metaphysical.
beenwrekt.bsky.social
There are many ways the gold standard thing gets tossed around, but the Rubin school wants you to imagine that *all* observational studies are simulated RCTs.

It's why their whole system is broken.
beenwrekt.bsky.social
Tying together many previous threads on the blog, I respond to Ben Chugg’s critique of “A Bureaucratic Theory of Statistics.”
How do you know so much about swallows?
A dialogue about the use and meaning of statistics with Ben Chugg
www.argmin.net