Wouter van Amsterdam
@vanamsterdam.bsky.social
1.3K followers 250 following 41 posts
machine learning, causal inference, healthcare - assistant professor in dep. of Data Science Methods, Julius Center, of University Medical Center Utrecht, the Netherlands; wvanamsterdam.com
vanamsterdam.bsky.social
Work with:

Diantha Schipaanboord, Floor B.H. van der Zalm, René van Es, Melle Vessies, Rutger R. van de Leur, Klaske R. Siegersma, Pim van der Harst, Hester M. den Ruijter, N. Charlotte Onland-Moret, on behalf of the IMPRESS consortium
vanamsterdam.bsky.social
Discrimination remained stable across sexes; only calibration shifted in extreme scenarios when prevalence differed by sex, with similar patterns for women and men.
vanamsterdam.bsky.social
Using ~165k ECGs, we simulated sex imbalances in representation (women-to-men ratio), outcome prevalence, and misclassification in the training data for LBBB, long QT syndrome, LVH, and physician-labeled “abnormal” ECGs.
vanamsterdam.bsky.social
Pre-print alert:
Many ECG-AI models have been developed to predict a wide range of cardiovascular outcomes. But underrepresentation of women in cardiovascular studies raises the question: are ECG-AI models equally predictive for women and men with sex-imbalanced training data?
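A toy sketch of the kind of imbalance simulation described above, not the paper's actual code: the single synthetic "ECG feature" and all parameter values are hypothetical, and X|Y is deliberately identical for both sexes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(n, frac_women, prev_women, prev_men):
    # one synthetic "ECG feature"; X|Y is identical for women and men here
    sex = rng.random(n) < frac_women                        # True = woman
    y = rng.random(n) < np.where(sex, prev_women, prev_men) # outcome, e.g. LVH
    x = rng.normal(loc=y.astype(float), scale=1.0)          # feature shifts with outcome
    return np.column_stack([x, sex]), y, sex

# train with women underrepresented (20%), test on a balanced population
X_tr, y_tr, _ = simulate(20_000, frac_women=0.2, prev_women=0.05, prev_men=0.10)
X_te, y_te, sex_te = simulate(20_000, frac_women=0.5, prev_women=0.05, prev_men=0.10)

p = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
for label, m in [("women", sex_te), ("men", ~sex_te)]:
    print(label,
          f"AUC={roc_auc_score(y_te[m], p[m]):.3f}",
          f"mean pred={p[m].mean():.3f} vs observed={y_te[m].mean():.3f}")
```

The misclassification scenarios would amount to flipping a fraction of y in the training data before fitting.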
Reposted by Wouter van Amsterdam
bms-aned.bsky.social
BMS-ANed Spring Meeting on Thursday, June 19
Time: 13:00–18:00 (CEST)
Location: Vredenburg 19, 3511 BB, Utrecht
Details and registration: vvsor.nl/biometrics/e...
Hans van Houwelingen award ceremony and symposium June 19th 2025 - VVSOR
This spring, the BMS-ANed organises an in-person meeting:
Reposted by Wouter van Amsterdam
oisinryan.bsky.social
Still some spots available in our summer school on all things causal inference, 7-11 July in Utrecht! Discounts for those working in universities and non-profits, and affordable accommodation offered by the @utrechtuniversity.bsky.social summer school!
vanamsterdam.bsky.social
Even if you model a physical system, e.g. average yearly temperature as a function of altitude, and assume that temperature given altitude is the same everywhere: if you invert it to predict the presence of a mountain given temperature, you'll find varying discrimination in different countries. Example from Schölkopf's talks.
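A quick sketch of that example, assuming a standard lapse-rate law and invented per-country altitude distributions: the forward mechanism temp|altitude is identical in both countries, yet the inverted (anticausal) task comes out differently.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def country(n, alt_mean, alt_sd):
    alt = np.maximum(rng.normal(alt_mean, alt_sd, n), 0.0)  # altitude in meters
    temp = 15.0 - 0.0065 * alt + rng.normal(0.0, 2.0, n)    # same lapse-rate law everywhere
    return temp, alt > 1000.0                               # anticausal label: "mountain"

for name, mu, sd in [("flat country", 200.0, 300.0), ("alpine country", 900.0, 700.0)]:
    temp, mountain = country(50_000, mu, sd)
    print(name, f"AUC = {roc_auc_score(mountain, -temp):.3f}")  # colder -> more likely mountain
```

Same physics in both countries; the AUCs differ purely because the altitude distributions differ.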
vanamsterdam.bsky.social
You’ve modeled a system with no meaningful variation across environments. The model may be reliable in the tested environments, but you haven’t shown robustness against variation in distributions, as you haven’t observed any.
vanamsterdam.bsky.social
if the distribution of outcome given features remains the same (Y|X), calibration is preserved. If both are the same, the environments were not meaningfully different to begin with!

a more lengthy explanation is in this blog post: wvanamsterdam.com/posts/250425...
vanamsterdam.bsky.social
as promised (so all of you can breathe normally again), here's my TLDR answer:

Environments must differ with respect to something. If the distribution of features given outcome remains the same (X|Y), discrimination is preserved;
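A minimal simulation of this TLDR, with made-up Gaussian features and prevalences: keeping X|Y fixed while shifting only the outcome prevalence between environments leaves the AUC unchanged but breaks calibration-in-the-large.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def environment(n, prevalence):
    y = rng.random(n) < prevalence
    x = rng.normal(loc=np.where(y, 1.0, 0.0), scale=1.0)  # X|Y identical in every env
    return x.reshape(-1, 1), y

X_a, y_a = environment(50_000, prevalence=0.30)  # env A: model is fit here
X_b, y_b = environment(50_000, prevalence=0.05)  # env B: only prevalence shifts

model = LogisticRegression().fit(X_a, y_a)
for name, X, y in [("env A", X_a, y_a), ("env B", X_b, y_b)]:
    p = model.predict_proba(X)[:, 1]
    print(name,
          f"AUC={roc_auc_score(y, p):.3f}",                        # stable (same X|Y)
          f"mean pred={p.mean():.3f} vs observed={y.mean():.3f}")  # way off in env B
```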
vanamsterdam.bsky.social
Which is stronger evidence for robustness?

When evaluating predictive performance of one model in several different environments (e.g. regions / hospitals):

A. stable discrimination (AUC) and calibration in all environments
B. stable discrimination, varying calibration

vote with 👍=A; ❤️=B
vanamsterdam.bsky.social
ask ChatGPT o3 this before submitting your next paper; I got ~10 usable comments out of it:

you're a reviewer for <journal>; review the attached paper when you're either:
Reposted by Wouter van Amsterdam
gelovennan.bsky.social
Vacancy for a postdoc position.

Improve the transparency of decision support algorithms by figuring out how we can quantify and communicate uncertainty in individual causal predictions.

With Marleen Kunneman, Daniala Weir and me.
Three more days to apply 👇

www.lumc.nl/en/about-lum...
Postdoc Biomedical Data Scientist / Biostatistician | LUMC
In this postdoc position at LUMC, you will work on groundbreaking research that enhances the transparency and trustworthiness of decision support algorithms in healthcare. This position allows you to ...
vanamsterdam.bsky.social
Building in the physics is one way to potentially get the right causal mechanisms.

Insofar as the model is trained on real-world patient data, you'll still have to ensure that no biases, e.g. related to confounding, creep in.
vanamsterdam.bsky.social
Digital twins are useful insofar as they reflect causal mechanisms

Don't think a generative model ('digital twin') can inform treatment decisions just because it produces different outputs when you give it different inputs. Doesn't matter if it's 'AI' or not.
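A toy illustration of the confounding point, with an invented setup (true treatment effect of exactly zero, severity as an unmeasured confounder): the fitted model's output changes when you flip the treatment input, but that difference is pure confounding, not a treatment effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 100_000
severity = rng.normal(size=n)                                # confounder, unseen by the model
treated = (severity + rng.normal(size=n) > 0).astype(float)  # sicker patients get treated more
outcome = severity + rng.normal(size=n)                      # true treatment effect: exactly 0

# "digital twin": model outcome given treatment, then flip the treatment input
twin = LinearRegression().fit(treated.reshape(-1, 1), outcome)
delta = twin.predict([[1.0]])[0] - twin.predict([[0.0]])[0]
print(f"output difference treated vs untreated: {delta:.2f} (true effect: 0.00)")
```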
vanamsterdam.bsky.social
saliency maps are the new table 2 fallacy
vanamsterdam.bsky.social
Not sure about overfitting; results seemed robust to 5-site cross-validation.

It just learns correlations, what's wrong with that? The words 'confounders' and 'bias' make it sound as if they expected the model to yield some causal understanding. Maybe these heatmaps are the new table 2 fallacy.