Michael Oberst
@moberst.bsky.social
1.2K followers 180 following 19 posts
Assistant Prof. of CS at Johns Hopkins · Visiting Scientist at Abridge AI · Causality & Machine Learning in Healthcare · Prev: PhD at MIT, Postdoc at CMU
moberst.bsky.social
For more details, see the paper / poster!

And if you're at UAI, check out the talk and poster today! Jacob (not on social media) and I are around at UAI, so reach out if you're interested in chatting more!

Paper: arxiv.org/abs/2502.09467
Poster: www.michaelkoberst.com/assets/paper...
moberst.bsky.social
These findings are also relevant for the design of new trials!

For instance, deploying *multiple models* in a trial has two benefits: (1) it allows us to construct tighter bounds for new models, and (2) it allows us to test whether these assumptions hold in practice.
moberst.bsky.social
We make some other mild assumptions, which can be falsified using existing RCT data. For instance, if two models have the *same* output on a given patient, then we assume outcomes are at least as good under the model with higher performance.
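A minimal sketch of how such a falsification check might look, assuming an offline-scoring setup (both models' outputs computable for every trial patient) — all names and data shapes here are assumptions, not from the paper:

```python
from statistics import mean

def falsification_check(arm, outcomes, out_hi, out_lo):
    """Crude check of the monotonicity assumption using RCT data with
    two deployed models. `arm` says which model each patient received
    ('hi' = higher-performance model); `out_hi`/`out_lo` are the two
    models' outputs, computed retrospectively for every patient. On
    the subgroup where the models agree, the 'hi' arm's mean outcome
    should be at least as good; a large deficit is evidence against
    the assumption."""
    agree = [i for i in range(len(arm)) if out_hi[i] == out_lo[i]]
    y_hi = [outcomes[i] for i in agree if arm[i] == "hi"]
    y_lo = [outcomes[i] for i in agree if arm[i] == "lo"]
    # Negative difference on the agreement set -> assumption falsified
    return mean(y_hi) - mean(y_lo)
```

In practice one would pair this with a one-sided hypothesis test rather than eyeballing the raw difference.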
moberst.bsky.social
To capture these challenges, we assume that model impact is mediated by both the output of the model (A) and its performance characteristics (M).

This formalism allows us to start reasoning about the impact of new models with different outputs and performance characteristics.
moberst.bsky.social
The second challenge is trust: Impact depends on the actions of human decision-makers, and those decision-makers may treat two models differently based on their performance characteristics (e.g., if a model produces a lot of false alarms, clinicians may ignore the outputs).
moberst.bsky.social
We tackle two non-standard challenges that arise in this setting: *coverage* and *trust*.

The first challenge is coverage: If the new model is very different from previous models, it may produce outputs (for specific types of inputs) that were never observed in the trial.
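One way to quantify this gap is to check how often the new model's (input, output) pairs were actually seen in the trial — a sketch, where `strata` (a discretization of similar inputs) and all other names are my assumptions, not the paper's:

```python
def coverage_rate(strata, outputs_rct, strata_new, outputs_new):
    """Fraction of the new model's (input-stratum, output) pairs that
    were observed for at least one deployed model in the trial.
    Pairs never seen in the trial are exactly where the new model's
    impact cannot be point-identified from the RCT data."""
    observed = set(zip(strata, outputs_rct))
    hits = [(s, a) in observed for s, a in zip(strata_new, outputs_new)]
    return sum(hits) / len(hits)
```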
moberst.bsky.social
We develop a method for placing bounds on the impact of a *new* ML model, by re-using data from an RCT that did not include the model.

These bounds require some mild assumptions, but those assumptions can be tested in practice using RCT data that includes multiple models.
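A minimal Manski-style sketch of how such bounds might be computed with a single deployed model, falling back to the worst/best case wherever the new model disagrees — the names and the outcome range [0, 1] are assumptions for illustration, not the paper's actual estimator:

```python
import numpy as np

def bound_new_model(outputs_rct, outcomes, outputs_new, y_lo=0.0, y_hi=1.0):
    """Bounds on the mean outcome under a new model, reusing RCT data
    from a previously deployed model. Where the new model agrees with
    the deployed one, plug in the observed outcome; where it disagrees
    (no coverage), fall back to the outcome range [y_lo, y_hi]."""
    covered = outputs_rct == outputs_new
    lower = np.where(covered, outcomes, y_lo).mean()
    upper = np.where(covered, outcomes, y_hi).mean()
    return lower, upper
```

The width of the interval shrinks as coverage improves — which is one intuition for why trials deploying multiple models yield tighter bounds for new ones.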
moberst.bsky.social
Randomized trials (RCTs) help evaluate whether deploying AI/ML systems actually improves outcomes (e.g., survival rates in a healthcare context).

But AI/ML systems can change: Do we need a new RCT every time we update the model? Not necessarily, as we show in our UAI paper! arxiv.org/abs/2502.09467
moberst.bsky.social
Hard to have a graded quiz, but still useful as an ungraded “self-assessment” (which I’ve seen) to set expectations about prereqs. In some courses, you might expect those who would be scared off to drop the course later in any case, esp. if the drop deadline is pretty late.
moberst.bsky.social
From skimming the paper it seems more like the takeaway is: “if you binarize, you are estimating *something* that has a specific causal interpretation but it’s a weird thing (diff of two very specific treatment policies) you might not actually care about except in some special cases”
moberst.bsky.social
@matt-levine.bsky.social has a great explanation in his Money Stuff newsletter (which I also highly recommend in general)
Reposted by Michael Oberst
donskerclass.bsky.social
In this conversation I have been endorsed as "twee" and "not a crank".

BTW, I'm on the job market this year. If you are interested hiring an economist in macro/metrics/computational/ML with such stellar endorsements, please get in touch!
bengolub.bsky.social
Oh I have no strong view on the substance, I just missed conversations that go like this (not with cranks) and I'm happy there's a platform where they're happening again.
moberst.bsky.social
An example of some recent work (my first last-author paper!) on rigorous re-evaluation of popular approaches to adapt LLMs and VLMs to the medical domain
bsky.app/profile/zach...
zacharylipton.bsky.social
Medically adapted foundation models (think Med-*) turn out to be more hot air than hot stuff. Correcting for fatal flaws in evaluation, the current crop are no better on balance than generic foundation models, even on the very tasks for which benefits are claimed.
arxiv.org/abs/2411.04118
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pret...
arxiv.org
moberst.bsky.social
I'm recruiting PhD students for Fall 2025! CS PhD Deadline: Dec. 15th.

I work on safe/reliable ML and causal inference, motivated by healthcare applications.

Beyond myself, Johns Hopkins has a rich community of folks doing similar work. Come join us!
[Photo of Johns Hopkins campus]
moberst.bsky.social
Would love to be added if possible, and would also nominate @monicaagrawal.bsky.social :)
moberst.bsky.social
Self-nominating for this one! All things in moderation
moberst.bsky.social
Would love to be added!
moberst.bsky.social
Late to this, but would love to be added!
Reposted by Michael Oberst
monicaagrawal.bsky.social
I am recruiting PhD students at Duke!

Please apply to Duke CS or CBB if you are interested in developing new methods and paradigms for NLP/LLMs in healthcare.
For details, see here: monicaagrawal.com/home/researc....
Reposted by Michael Oberst
sherrirose.bsky.social
I made a starter pack for health policy statistics! 📈🔢
Reposted by Michael Oberst
berkustun.bsky.social
Couldn't find a machine learning for health starter pack so I made one. 

DM/Reply if you want to be added!

go.bsky.app/PJKJ8vK
Reposted by Michael Oberst
irenetrampoline.bsky.social
First post! I'm recruiting PhD students this PhD admission cycle who want to work on: a) impactful ML methods for healthcare 🤖, b) computational methods to improve health equity ⚖️, or c) AI for women's health or climate health 🤰🌎

Apply via UC Berkeley CPH or EECS (AI-H) 🌉.

irenechen.net/join-lab/