Edward H. Kennedy
@edwardhkennedy.bsky.social
2.5K followers 260 following 64 posts
assoc prof of statistics & data science at Carnegie Mellon https://www.ehkennedy.com/ interested in causality, machine learning, nonparametrics, public policy, etc
Pinned
edwardhkennedy.bsky.social
New paper! arxiv.org/pdf/2411.14285

Led by amazing postdoc Alex Levis: www.awlevis.com/about/

We show causal effects of new "soft" interventions are less sensitive to unmeasured confounding

& study which effects are *least* sensitive to confounding -> makes new connections to optimal transport
Reposted by Edward H. Kennedy
donskerclass.bsky.social
Went to look up textbook results after getting the nagging feeling that an ML paper was reinventing classical ideas, and found this gem:

"Not reading to the end of Le Cam's papers became not uncommon in later years. His ideas have been regularly rediscovered."

At least they're in good company.
Text from van der Vaart, "Asymptotic Statistics" Ch 27, http://www.stat.yale.edu/~pollard/Books/LeCamFest/VanderVaart.pdf

The theorem may have looked somewhat too complicated to gain popularity. Nevertheless Hájek's result, for general locally asymptotically normal models and general loss functions, is now considered the final result in this direction. Hájek wrote:

"The proof that local asymptotic minimax implies local asymptotic admissibility was first given by LeCam (1953, Theorem 14). ... Apparently not many people have studied Le Cam's paper so far as to read this very last theorem, and the present author is indebted to Professor LeCam for giving him the reference"

Not reading to the end of Le Cam's papers became not uncommon in later years. His ideas have been regularly rediscovered
edwardhkennedy.bsky.social
Ok I think I'll stop now :) I'm always amazed at how ahead of its time this work was.

It's too bad it's not as widely known among us causal+ML people
edwardhkennedy.bsky.social
Once you have a pathwise differentiable parameter, a natural estimator is a debiased plug-in, which subtracts off the avg of estimated influence fn

Pfanzagl gives this 1-step estimator here - in causal inference this is exactly the doubly robust / DML estimator you know & love!
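[A minimal sketch of the one-step / doubly robust (AIPW) idea for the mean counterfactual outcome E[Y^1], not code from the post: the plug-in mean of the outcome regression plus the sample average of the estimated (centered) influence function. The oracle nuisances here are for illustration only; in practice pi_hat and mu_hat would come from flexible ML fits, ideally with cross-fitting.]

```python
import numpy as np

def one_step_mean(y, a, pi_hat, mu_hat):
    """Debiased plug-in for E[Y^1]: plug-in mean of mu_hat plus the
    sample average of the estimated (centered) influence function."""
    plug_in = np.mean(mu_hat)
    correction = np.mean(a / pi_hat * (y - mu_hat))  # avg of estimated IF
    return plug_in + correction

# Toy check with simulated data and oracle nuisances (hypothetical example):
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
pi = 1 / (1 + np.exp(-x))          # true propensity P(A=1 | X)
a = rng.binomial(1, pi)
y = x + a + rng.normal(size=n)
print(one_step_mean(y, a, pi_hat=pi, mu_hat=x + 1.0))  # close to E[Y^1] = 1
```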
edwardhkennedy.bsky.social
Pfanzagl uses pathwise differentiability above, but w/regularity conditions this is just a distributional Taylor expansion, which is easier to think about

I note this in my tutorial here:

www.ehkennedy.com/uploads/5/8/...

Also v related to so-called "Neyman orthogonality" - worth separate thread
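[For concreteness, a sketch of the expansion in generic notation, mine rather than Pfanzagl's or the tutorial's:]

```latex
% Distributional Taylor (von Mises) expansion of a functional psi around an
% estimate \hat{P}, with gradient/influence function phi and remainder R_2:
\psi(P) = \psi(\hat{P}) + \int \varphi(z; \hat{P}) \, dP(z) + R_2(\hat{P}, P)
% The one-step estimator replaces the integral with a sample average,
%   \hat\psi = \psi(\hat{P}) + \mathbb{P}_n[\varphi(Z; \hat{P})],
% so its error is (\mathbb{P}_n - P)\varphi(\cdot;\hat{P}) + R_2: a centered
% empirical-process term plus a second-order remainder.
```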
edwardhkennedy.bsky.social
Here’s Pfanzagl on the gradient of a functional/parameter, aka derivative term in a von Mises expansion, aka influence function, aka Neyman-orthogonal score

Richard von Mises first characterized smoothness this way for stats in the 30s/40s! eg:

projecteuclid.org/journals/ann...
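[In generic notation (a sketch, not Pfanzagl's or von Mises' wording), the gradient is the object representing the pathwise derivative of the functional:]

```latex
% For a smooth one-dimensional submodel {P_t} through P with score s(z) at
% t = 0, the gradient / influence function phi(.; P) of psi satisfies
\frac{d}{dt}\, \psi(P_t)\Big|_{t=0} = \int \varphi(z; P)\, s(z)\, dP(z)
% i.e., phi represents the derivative of psi in the L_2(P) inner product;
% this is the same object that appears in the one-step correction above.
```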
Reposted by Edward H. Kennedy
edwardhkennedy.bsky.social
From twitter:

A short thread:

It amazes me how many crucial ideas underlying now-popular semiparametrics (aka doubly robust parameter/functional estimation / TMLE / double/debiased/orthogonal ML etc etc) were first proposed many decades ago.

I think this is widely under-appreciated!
edwardhkennedy.bsky.social
The m-estimator logic certainly relies on “exactly correct”

Once you start moving to “close enough”, to me that means you’re no longer getting precise root-n rates with the nuisances. Then you’ll have to deal with the bias/variance consequences just as if you were using flexible ML
edwardhkennedy.bsky.social
And here for more specific discussion:

arxiv.org/pdf/2405.08525

I think DR estimation vs inference are two quite different things and we need different assumptions to make them work
edwardhkennedy.bsky.social
If we really rely on 2 parametric models, we should of course use a variance estimator recognizing this. But this is more about how we model the nuisances vs the DR estimator itself

Also our paper here suggests strictly more assumptions are needed for DR inference vs estimation:

arxiv.org/pdf/2305.04116
edwardhkennedy.bsky.social
I find it much more believable that I could estimate both nuisances consistently, but at slower rates, vs that I could pick 2 parametric models (without looking at data) & happen to get one exactly correct
edwardhkennedy.bsky.social
Hm not sure I agree with this logic…

To me the beautiful thing about the DR estimator is you can get away with estimating both nuisances at slower rates (as long as the product of their errors is smaller than 1/sqrt(n))

This opens the door to using much more flexible methods - random forests, lasso, ensembles, etc etc
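[A sketch of the standard bound behind this claim, written for the ATE-style (AIPW) case with propensity score pi and outcome regression mu; notation is mine, not from the thread:]

```latex
% Second-order (product) bias bound for the doubly robust / AIPW estimator:
\bigl|\, \mathbb{E}\bigl[\hat\psi_{\mathrm{dr}} - \psi \mid \text{nuisances}\bigr] \,\bigr|
  \;\lesssim\; \|\hat\pi - \pi\| \;\, \|\hat\mu - \mu\|
% so root-n inference needs only the *product* of nuisance errors to be
% o(n^{-1/2}); e.g., both nuisances converging faster than n^{-1/4} suffices,
% which flexible ML methods can plausibly achieve.
```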
edwardhkennedy.bsky.social
"Randomized trials should be used to answer any causal question that can be so studied...

But the reality is that observational methods are used everyday to answer pressing causal questions that cannot be studied in randomized trials."

- Jamie Robins, 2002
tinyurl.com/4yuxfxes
tinyurl.com/zncp39mr
Reposted by Edward H. Kennedy
instrumenthull.bsky.social
What's the best paper you read this year?
edwardhkennedy.bsky.social
Here's the recent paper!

bsky.app/profile/edwa...
edwardhkennedy.bsky.social
In this paper we consider incremental effects of continuous exposures:

arxiv.org/abs/2409.11967

i.e., soft interventions on cts treatments like dose, duration, frequency

it turns out exponential tilts preserve all nice properties of incremental effects with binary trt (arxiv.org/abs/1704.00211)
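[For intuition, a generic exponential tilt of the conditional treatment density looks like this; a sketch only, since the paper's exact intervention and parametrization may differ:]

```latex
% Exponentially tilted treatment density under intervention parameter delta:
q_\delta(a \mid x) \;=\; \frac{e^{\delta a}\, \pi(a \mid x)}{\int e^{\delta t}\, \pi(t \mid x)\, dt}
% delta = 0 recovers the observational density, and q_delta shares the support
% of pi(a | x), so no positivity condition beyond the observed data is needed.
```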
Reposted by Edward H. Kennedy
idiaz.bsky.social
Thank you Alec for leading this project, I learned a lot! This paper has a very useful study of what contrasts are feasible in situations with many treatments and positivity violations, including necessary assumptions and efficient one-step estimators. Check it out!
alecmcclean.bsky.social
New-ish paper alert! arxiv.org/abs/2410.13522
 
We tackle the challenge of comparing multiple treatments when some subjects have zero prob. of receiving certain treatments. Eg, provider profiling: comparing hospitals (the “treatments”) for patient outcomes. Positivity violations are everywhere.
Fair comparisons of causal parameters with many treatments and positivity violations
Comparing outcomes across treatments is essential in medicine and public policy. To do so, researchers typically estimate a set of parameters, possibly counterfactual, with each targeting a different ...
Reposted by Edward H. Kennedy
gautamkamath.com
Found slides by Ankur Moitra (presented at a TCS For All event) on "How to do theoretical research." Full of great advice!

My favourite: "Find the easiest problem you can't solve. The more embarrassing, the better!"

Slides: drive.google.com/file/d/15VaT...
TCS For All: sigact.org/tcsforall/
Reposted by Edward H. Kennedy
alecmcclean.bsky.social
@bonv.bsky.social presented this at NYU this week -- terrific work with an excellent presentation (no surprise there)! I found the connections to higher-order estimators and the orthogonalizing property of the U-stat kernel fascinating & illuminating.
edwardhkennedy.bsky.social
Should we use structure-agnostic (arxiv.org/abs/2305.04116) or smooth (arxiv.org/pdf/1512.02174) models for causal inference?

Why not both?

Here we propose novel hybrid smooth+agnostic model, give minimax rates, & new optimal methods

arxiv.org/pdf/2405.08525

-> fast rates under weaker conditions
Reposted by Edward H. Kennedy
idiaz.bsky.social
I see renewed discussion on #statsky about the interpretation of confidence intervals. I will leave here this quote from Larry Wasserman's All of Statistics, which I love. Controlling one's lifetime proportion of studies with an interval that does not contain the parameter is surely desirable!