Michael Oberst
@moberst.bsky.social
1.2K followers 180 following 19 posts
Assistant Prof. of CS at Johns Hopkins · Visiting Scientist at Abridge AI · Causality & Machine Learning in Healthcare · Prev: PhD at MIT, Postdoc at CMU
moberst.bsky.social
For more details, see the paper / poster!

And if you're at UAI, check out the talk and poster today! Jacob (not on social media) and I are around at UAI, so reach out if you're interested in chatting more!

Paper: arxiv.org/abs/2502.09467
Poster: www.michaelkoberst.com/assets/paper...
moberst.bsky.social
These findings are also relevant for the design of new trials!

For instance, deploying *multiple models* in a trial has two benefits: (1) it allows us to construct tighter bounds for new models, and (2) it allows us to test whether these assumptions hold in practice.
moberst.bsky.social
We make some other mild assumptions, which can be falsified using existing RCT data. For instance, if two models have the *same* output on a given patient, then we assume outcomes are at least as good under the model with higher performance.
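A minimal sketch of how such a falsification check might look, assuming an offline-scoring setup (both models' outputs computable for every trial patient) — all names and data shapes here are assumptions, not from the paper:

```python
from statistics import mean

def falsification_check(arm, outcomes, out_hi, out_lo):
    """Crude check of the monotonicity assumption using RCT data with
    two deployed models. `arm` says which model each patient received
    ('hi' = higher-performance model); `out_hi`/`out_lo` are the two
    models' outputs, computed retrospectively for every patient. On
    the subgroup where the models agree, the 'hi' arm's mean outcome
    should be at least as good; a large deficit is evidence against
    the assumption."""
    agree = [i for i in range(len(arm)) if out_hi[i] == out_lo[i]]
    y_hi = [outcomes[i] for i in agree if arm[i] == "hi"]
    y_lo = [outcomes[i] for i in agree if arm[i] == "lo"]
    # Negative difference on the agreement set -> assumption falsified
    return mean(y_hi) - mean(y_lo)
```

In practice one would pair this with a one-sided hypothesis test rather than eyeballing the raw difference.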
moberst.bsky.social
To capture these challenges, we assume that model impact is mediated by both the output of the model (A) and its performance characteristics (M).

This formalism allows us to start reasoning about the impact of new models with different outputs and performance characteristics.
moberst.bsky.social
The second challenge is trust: Impact depends on the actions of human decision-makers, and those decision-makers may treat two models differently based on their performance characteristics (e.g., if a model produces a lot of false alarms, clinicians may ignore the outputs).
moberst.bsky.social
We tackle two non-standard challenges that arise in this setting: *coverage* and *trust*.

The first challenge is coverage: If the new model is very different from previous models, it may produce outputs (for specific types of inputs) that were never observed in the trial.
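One way to quantify this gap is to check how often the new model's (input, output) pairs were actually seen in the trial — a sketch, where `strata` (a discretization of similar inputs) and all other names are my assumptions, not the paper's:

```python
def coverage_rate(strata, outputs_rct, strata_new, outputs_new):
    """Fraction of the new model's (input-stratum, output) pairs that
    were observed for at least one deployed model in the trial.
    Pairs never seen in the trial are exactly where the new model's
    impact cannot be point-identified from the RCT data."""
    observed = set(zip(strata, outputs_rct))
    hits = [(s, a) in observed for s, a in zip(strata_new, outputs_new)]
    return sum(hits) / len(hits)
```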
moberst.bsky.social
We develop a method for placing bounds on the impact of a *new* ML model, by re-using data from an RCT that did not include the model.

These bounds require some mild assumptions, but those assumptions can be tested in practice using RCT data that includes multiple models.
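A minimal Manski-style sketch of how such bounds might be computed with a single deployed model, falling back to the worst/best case wherever the new model disagrees — the names and the outcome range [0, 1] are assumptions for illustration, not the paper's actual estimator:

```python
import numpy as np

def bound_new_model(outputs_rct, outcomes, outputs_new, y_lo=0.0, y_hi=1.0):
    """Bounds on the mean outcome under a new model, reusing RCT data
    from a previously deployed model. Where the new model agrees with
    the deployed one, plug in the observed outcome; where it disagrees
    (no coverage), fall back to the outcome range [y_lo, y_hi]."""
    covered = outputs_rct == outputs_new
    lower = np.where(covered, outcomes, y_lo).mean()
    upper = np.where(covered, outcomes, y_hi).mean()
    return lower, upper
```

The width of the interval shrinks as coverage improves — which is one intuition for why trials deploying multiple models yield tighter bounds for new ones.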
moberst.bsky.social
Randomized trials (RCTs) help evaluate whether deploying AI/ML systems actually improves outcomes (e.g., survival rates in a healthcare context).

But AI/ML systems can change: Do we need a new RCT every time we update the model? Not necessarily, as we show in our UAI paper! arxiv.org/abs/2502.09467
moberst.bsky.social
Hard to have a graded quiz, but still useful as an ungraded “self-assessment” (which I’ve seen) to set expectations about prereqs. In some courses, you might expect those who would be scared off to drop the course later in any case, esp. if the drop deadline is pretty late.
moberst.bsky.social
From skimming the paper it seems more like the takeaway is: “if you binarize, you are estimating *something* that has a specific causal interpretation but it’s a weird thing (diff of two very specific treatment policies) you might not actually care about except in some special cases”
moberst.bsky.social
@matt-levine.bsky.social has a great explanation in his Money Stuff newsletter (which I also highly recommend in general)
Reposted by Michael Oberst
donskerclass.bsky.social
In this conversation I have been endorsed as "twee" and "not a crank".

BTW, I'm on the job market this year. If you are interested hiring an economist in macro/metrics/computational/ML with such stellar endorsements, please get in touch!
bengolub.bsky.social
Oh I have no strong view on the substance, I just missed conversations that go like this (not with cranks) and I'm happy there's a platform where they're happening again.
moberst.bsky.social
An example of some recent work (my first last-author paper!) on rigorous re-evaluation of popular approaches to adapt LLMs and VLMs to the medical domain
bsky.app/profile/zach...
zacharylipton.bsky.social
Medically adapted foundation models (think Med-*) turn out to be more hot air than hot stuff. Correcting for fatal flaws in evaluation, the current crop are no better on balance than generic foundation models, even on the very tasks for which benefits are claimed.
arxiv.org/abs/2411.04118
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pret...
arxiv.org
moberst.bsky.social
I'm recruiting PhD students for Fall 2025! CS PhD Deadline: Dec. 15th.

I work on safe/reliable ML and causal inference, motivated by healthcare applications.

Beyond myself, Johns Hopkins has a rich community of folks doing similar work. Come join us!
[Photo of Johns Hopkins campus]
moberst.bsky.social
Would love to be added if possible, and would also nominate @monicaagrawal.bsky.social :)
moberst.bsky.social
Self-nominating for this one! All things in moderation
moberst.bsky.social
Would love to be added!
moberst.bsky.social
Late to this, but would love to be added!
Reposted by Michael Oberst
monicaagrawal.bsky.social
I am recruiting PhD students at Duke!

Please apply to Duke CS or CBB if you are interested in developing new methods and paradigms for NLP/LLMs in healthcare.
For details, see here: monicaagrawal.com/home/researc....
Reposted by Michael Oberst
sherrirose.bsky.social
I made a starter pack for health policy statistics! 📈🔢
Reposted by Michael Oberst
berkustun.bsky.social
Couldn't find a machine learning for health starter pack so I made one. 

DM/Reply if you want to be added!

go.bsky.app/PJKJ8vK
Reposted by Michael Oberst
irenetrampoline.bsky.social
First post! I'm recruiting PhD students this PhD admission cycle who want to work on: a) impactful ML methods for healthcare 🤖, b) computational methods to improve health equity ⚖️, or c) AI for women's health or climate health 🤰🌎

Apply via UC Berkeley CPH or EECS (AI-H) 🌉.

irenechen.net/join-lab/