Alex Luedtke
@alexluedtke.bsky.social
1.4K followers 220 following 19 posts
statistician at harvard med school • causal inference, machine learning, nonparametrics alexluedtke.com
alexluedtke.bsky.social
Our method can take existing generative models and use them to produce counterfactual images, text, etc.

From a technical perspective, our approach is doubly robust and can be wrapped around state of the art approaches like diffusion models, flow matching, and autoregressive language models.
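A rough sketch of the kind of objective involved, in notation of my own choosing rather than the paper's: with treatment indicator A, confounders W, per-sample generative loss ℓ_θ (e.g., a denoising score-matching loss), estimated propensity π̂(w) ≈ P(A = 1 | W = w), and loss regression m̂_θ(w) ≈ E[ℓ_θ(X) | A = 1, W = w], an AIPW-style counterfactual training loss is

\hat L(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big[\, \hat m_\theta(W_i) \;+\; \frac{A_i}{\hat\pi(W_i)}\big(\ell_\theta(X_i) - \hat m_\theta(W_i)\big) \Big],

which remains a valid target if either nuisance estimate is accurate (the "doubly robust" part). See the paper for the actual construction.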
alexluedtke.bsky.social
New paper on generative modeling of counterfactual distributions! We give a way to answer "what if" questions with generative models.

For example: what would faces look like if they were all smiling?

arxiv.org/abs/2509.16842
Title page for paper:

DoubleGen: Debiased Generative Modeling of Counterfactuals

arXiv:2509.16842 (stat)

Alex Luedtke, Kenji Fukumizu

Selected attributes that are more common in smiling (n = 78,080) than in non-smiling (n = 84,690) CelebA faces. If a model is trained only on the smiling subset, it tends to over-produce these attributes instead of showing how the full population would look if everyone smiled.

Table:
             Lipstick   Makeup   Female*   Earrings   No-beard   Blonde
Smiling          56%       47%       65%       26%        88%       18%
Not smiling      38%       30%       52%       12%        79%       12%
Overall          47%       38%       58%       19%        83%       15%

Counterfactual smiling celebrities generated by a traditional diffusion model trained on only smiling faces (top) and a DoubleGen diffusion model (bottom). Columns contain coupled samples, with the random seed set to the same value before generation. The stars mark the most qualitatively different pairs.

What’s visible: two horizontal rows, each showing twelve AI-generated smiling portraits.

Starred columns highlight the biggest shifts: in those pairs, DoubleGen produces faces with traits under-represented among smiling faces in the original data. Non-starred columns look nearly identical between the two rows.
alexluedtke.bsky.social
Same - me since I was 4. CGM is fantastic.
Reposted by Alex Luedtke
arxiv-stat-me.bsky.social
Carlos Cinelli, Avi Feller, Guido Imbens, Edward Kennedy, Sara Magliacane, Jose Zubizarreta
Challenges in Statistics: A Dozen Challenges in Causality and Causal Inference
https://arxiv.org/abs/2508.17099
Reposted by Alex Luedtke
spmontecarlo.bsky.social
I want to advertise some relatively recent work which I really like, and have been fortunate to play a small role in.

The paper is titled "A New Proof of Sub-Gaussian Norm Concentration Inequality" (arxiv.org/abs/2503.14347), led by Zishun Liu and Yongxin Chen at Georgia Tech.
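For context, the classical Gaussian baseline for this kind of result (a standard fact, not the paper's sharpened statement): if X ~ N(0, I_d), then because x ↦ ‖x‖₂ is 1-Lipschitz, Gaussian concentration gives

\mathbb{P}\big(\|X\|_2 \ge \mathbb{E}\|X\|_2 + t\big) \;\le\; e^{-t^2/2} \quad \text{for all } t \ge 0, \qquad \mathbb{E}\|X\|_2 \le \sqrt{d}.

The paper is about proving inequalities of this flavor for general sub-Gaussian vectors.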
alexluedtke.bsky.social
Neat AI product for improving technical writing.

Tried it on a 50-page draft of a causal ML paper. Of its top 10 comments, 4 concerned minor technical issues I'd missed (notation error, misapplication of a definition, etc.). In my experience, vanilla chatbots wouldn't have caught these.
bengolub.bsky.social
I've been working on a new tool, Refine, to make scholars more productive. If you're interested in being among the very first to try the beta, please read on.

Refine leverages the best current AI models to draw your attention to potential errors and clarity issues in research paper drafts.

1/
Reposted by Alex Luedtke
noahgreifer.bsky.social
Starting to look like I might not be able to work at Harvard anymore due to recent funding cuts. If you know of any open statistical consulting positions that support remote work or are NYC-based, please reach out! 😅
alexluedtke.bsky.social
I've advised 15 PhD students—10 were international students. All graduates continue advancing U.S. excellence in research and education. Cutting off this pipeline of talent would be shortsighted.
Reposted by Alex Luedtke
baileybowcutt.bsky.social
I'm a current Harvard graduate student and I found out today that I had my NSF GRFP terminated without notification. I was awarded this individual research fellowship before even choosing Harvard as my graduate school
Reposted by Alex Luedtke
larsvanderlaan3.bsky.social
Had a great time presenting at #ACIC on doubly robust inference via calibration

Calibrating nuisance estimates in DML protects against model misspecification and slow convergence.

Just one line of code is all it takes.
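A minimal sketch of the idea in code (my own illustration, not the authors' implementation; cross-fitting and inference are omitted): calibrate the initial nuisance fits with isotonic regression, then plug them into a standard AIPW / doubly robust estimator of E[Y(1)].

import numpy as np
from sklearn.isotonic import IsotonicRegression

def aipw_treated_mean(A, Y, pi_hat, mu1_hat):
    """Doubly robust estimate of E[Y(1)] using calibrated nuisance estimates."""
    # The "one line" per nuisance: monotone recalibration of the initial fit.
    pi_cal = IsotonicRegression(out_of_bounds="clip").fit(pi_hat, A).predict(pi_hat)
    mu1_cal = IsotonicRegression(out_of_bounds="clip").fit(
        mu1_hat[A == 1], Y[A == 1]).predict(mu1_hat)
    pi_cal = np.clip(pi_cal, 1e-3, 1 - 1e-3)  # avoid extreme inverse weights
    # Standard AIPW estimator with the calibrated nuisances plugged in.
    return np.mean(mu1_cal + A / pi_cal * (Y - mu1_cal))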
alexluedtke.bsky.social
Thanks for the pointer! We'll check it out
alexluedtke.bsky.social
Our main insight is that smooth divergences - like the Sinkhorn - behave locally like an MMD, and so it suffices to compress with respect to that criterion. This insight draws from recent works studying distributional limits of Sinkhorn divergences (Goldfeld et al., Gonzalez-Sanz et al.).
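My informal paraphrase of that local behavior: writing S_ε for the Sinkhorn divergence with entropic regularization ε, for Q close to P one has an expansion of the form

S_\varepsilon(P, Q) \;\approx\; c \,\mathrm{MMD}^2_{k_{\varepsilon, P}}(P, Q),

for a constant c and a kernel k_{ε,P} depending on ε and on P. So a coreset chosen to make this MMD small also keeps the Sinkhorn error small.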
alexluedtke.bsky.social
We build on earlier coreset selection works that compress with respect to maximum mean discrepancy (MMD), including kernel thinning (Dwivedi and @lestermackey.bsky.social) and quadrature (Hayakawa et al.).
alexluedtke.bsky.social
We pay special attention to the Sinkhorn divergence from optimal transport. Using our method, CO2, a dataset of size n can be compressed to about size log(n) without meaningful Sinkhorn error.
alexluedtke.bsky.social
New paper, led by my student Alex Kokot!
We study dataset compression through coreset selection - finding a small, weighted subset of observations that preserves information with respect to some divergence.
arxiv.org/abs/2504.20194
The Sinkhorn reconstruction error in various dimensions (left) and dataset sizes (right). In the first plot the sample size is fixed at n = 25,000, and in the second the dimension is fixed at d = 10. The proposed compression method, CO2, outperforms random sampling in all settings considered.

Q-Q plots of the Sinkhorn reconstruction error (left) and the l1 error between the label proportions (right) of the compressed data as compared to random samples. CO2 again outperforms random sampling in all settings considered.
Reposted by Alex Luedtke
carlbergstrom.com
The NIH overhead cut doesn't just hurt universities.

It's deadly to the US economy.

The US is a world leader in tech due to the ecosystem that NIH and NSF propel. It drives innovation for tech transfer, creates a highly skilled sci/tech workforce, and fosters academic/industry cross-fertilization.
carlbergstrom.com
2. While NSF and NIH indeed have a mission to fund specific research innovations via grantmaking, they do a lot more than that.

Their principal role is to support a scientific ecosystem in the United States, one that includes everything from education and training to infrastructure and communication.
alexluedtke.bsky.social
Agreed. And when misspecified, the MLE is estimating a Kullback-Leibler projection of the true distribution onto the misspecified model (and is consistent for that as n->infinity).
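Spelled out, the projection in question is

\theta^* \;=\; \arg\min_{\theta} \mathrm{KL}\big(P_0 \,\|\, P_\theta\big) \;=\; \arg\max_{\theta} \mathbb{E}_{P_0}\big[\log p_\theta(X)\big],

and since the MLE maximizes the sample analogue (1/n) Σ_i log p_θ(X_i), it converges to θ* under standard regularity conditions even when no P_θ equals P_0.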
Reposted by Alex Luedtke
larsvanderlaan3.bsky.social
Thrilled to share our new paper! We introduce a generalized autoDML framework for smooth functionals in general M-estimation problems, significantly broadening the scope of problems where automatic debiasing can be applied!
arxiv-stat-me.bsky.social
Lars van der Laan, Aurelien Bibaut, Nathan Kallus, Alex Luedtke
Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands
https://arxiv.org/abs/2501.11868
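For intuition, here is the finite-dimensional analogue of the object being debiased (a standard M-estimation fact, stated in my notation; the paper's contribution is the general, possibly infinite-dimensional case): if θ_0 minimizes E[ℓ(Z; θ)], with first-order condition E[m(Z; θ_0)] = 0 for m = ∂_θ ℓ, and the target is ψ_0 = H(θ_0), then the plug-in H(θ̂) has influence function

\varphi(Z) \;=\; -\,\nabla H(\theta_0)^\top \,\Big(\mathbb{E}\big[\partial_\theta m(Z; \theta_0)\big]\Big)^{-1} m(Z; \theta_0).

Automatic debiasing estimates this correction term directly from the loss and functional, without deriving it analytically.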
Reposted by Alex Luedtke
edwardhkennedy.bsky.social
New paper! arxiv.org/pdf/2411.14285

Led by amazing postdoc Alex Levis: www.awlevis.com/about/

We show causal effects of new "soft" interventions are less sensitive to unmeasured confounding

& study which effects are *least* sensitive to confounding -> makes new connections to optimal transport
alexluedtke.bsky.social
👋 In Tokyo this academic year, on sabbatical at the Institute of Statistical Mathematics.

In town and interested in causal ML? Would love to grab coffee and chat.
Reposted by Alex Luedtke
lauretig.bsky.social
"The Elements of Differentiable Programming"

link: arxiv.org/abs/2403.14606

Basically: "autodiff - it's everywhere! what is it, and how do you use it?" seems like a good resource for anyone interested in data science, machine learning, "ai," neural nets, etc

#blueskai #stats #mlsky
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two.
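As a toy illustration of the machinery the book formalizes (my own example, not taken from the book): forward-mode autodiff via dual numbers propagates a value and its derivative through ordinary code, control flow included.

# Minimal forward-mode automatic differentiation with dual numbers.
class Dual:
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)
    __rmul__ = __mul__

def f(x):
    # A program with a branch: differentiated along the branch actually taken.
    y = x * x
    if y.val > 1.0:
        y = y * x
    return y + 1.0

x = Dual(2.0, 1.0)        # seed derivative dx/dx = 1
out = f(x)
print(out.val, out.grad)  # 9.0 12.0, matching d/dx (x^3 + 1) at x = 2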