Markus List
@itisalist.bsky.social
290 followers 530 following 27 posts
Assistant Professor for Data Science in Systems Biology at the Technical University of Munich (http://daisybio.de). Mostly posting about bioinformatics and systems / network biology research. Views are my own. he / him.
itisalist.bsky.social
We're getting ready for Maustag 2025 www.mdsi.tum.de/en/mdsi/late... at the @tum.de Munich Data Science Institute, where we @daisybio.de plan to show children why AI and bioinformatics are important for studying the code of life. I dare say that our first practice session went quite well :-)
Reposted by Markus List
stephenturner.us
Comprehensive benchmark of differential transcript usage analysis for bulk and single-cell RNA sequencing academic.oup.com/nargab/artic... 🧬🖥️🧪
itisalist.bsky.social
And for those who enjoyed our tutorial on network medicine and drug repurposing: if you are spontaneous, consider joining us for the RExPO conference organized by @repo4eu.bsky.social repo4.eu/rexpo25/ later this month.
itisalist.bsky.social
I enjoyed visiting the BC2 conference: cool talks, great networking, beautiful location. Thanks to the organizers at @sib.swiss. I especially appreciated today's session on startups in bioinformatics, which was insightful.
The Rhine in Basel
itisalist.bsky.social
En route to Basel for the @sib.swiss #BC2 conference. We're contributing to a workshop on 🕸️ network medicine and 💊 drug repurposing, with tools developed in @repo4eu.bsky.social incl. drugst.one for which we've just released the DREAM extension doi.org/10.58647/DRU... simplifying expert annotation.
Park bench on a misty morning
Reposted by Markus List
repo4eu.bsky.social
#RExPO25 Speakers | S7: AI/ML in #SystemsMedicine & #DrugRepurposing

🟣 @itisalist.bsky.social & Lisa Spindler (@daisybio.de)

🟣 Jan Baumbach & Fernando Delgado Chavez (@cosybio-uhh.bsky.social)

Check the full conference agenda ⤵️
repo4.eu/rexpo25/agen...

🇪🇺 #EUfunded #DrugRepurposing
RExPO25 Session in focus: AI/ML in systems medicine and drug repurposing.
Reposted by Markus List
grst.bsky.social
Our benchmark + guidelines for atlas-level differential gene expression of single cells is online:

academic.oup.com/bib/article/...

Bottom line: Use pseudobulk + DESeq2 in simple and pseudobulk + DREAM in more complex settings.

Collab w/ @leonhafner.bsky.social @itisalist.bsky.social
itisalist.bsky.social
A flowery surprise at our @tum.de campus Freising yesterday. Congratulations to all students who celebrated their graduation.
itisalist.bsky.social
Yes indeed, are you here as well? :-)
itisalist.bsky.social
En route to visit the @cosybio-uhh.bsky.social lab in Hamburg, who are kindly organizing the latest @repo4eu.bsky.social WP2 workshop. Looking forward to discussing the refinement of our computational pipelines for drug repurposing. Hope the train won't be delayed too much...
itisalist.bsky.social
🧬🖥️Drug response prediction is a machine learning challenge with immense potential for precision medicine. Our latest preprint introduces DrEval, a comprehensive benchmarking framework to evaluate state-of-the-art methods, uncover widespread issues, and guide the development of more robust models.
judith-bernett.bsky.social
🧬🖥️So excited to show you the outcome of @pascivers.bsky.social and my latest project: "From Hype to Health Check: Critical Evaluation of Drug Response Prediction Models with DrEval" doi.org/10.1101/2025.05.26.655288, published with M. Picciani, M. Wilhelm, K. Baum & @itisalist.bsky.social.
🧵1/10
Overview of the DrEval framework. Via input options, implemented state-of-the-art models can be compared against baselines of varying complexity. We address obstacles to progress in the field at each point in our pipeline: Our framework is available on PyPI and nf-core and we follow FAIReR standards for optimal reproducibility. DrEval is easily extendable as demonstrated here with a pseudocode implementation of a proteomics-based random forest. Custom viability data can be preprocessed with CurveCurator, leading to more consistent data and metrics. DrEval supports five widely used datasets with application-aware train/test splits that enable detecting weak generalization. Models are free to use provided or custom cell line– and drug features. The pipeline supports randomization-based ablation studies and performs robust hyperparameter tuning for all models. Evaluation is conducted using meaningful, bias-resistant metrics to avoid inflated results from artifacts such as Simpson’s paradox. All results are compiled into an interactive HTML report. Created in https://BioRender.com.
itisalist.bsky.social
Had a great time in Innsbruck. The scenery here with the mountains in the background is always impressive, even when the weather is not so nice. Thanks @francescafinotello.bsky.social for inviting me!
itisalist.bsky.social
For those of you who are not in Innsbruck to see me today, you might instead listen to @judith-bernett.bsky.social at the @iscb.bsky.social NetBio webinar!

🔗 Attend at ISCB Nucleus: iscb.junolive.co

📍 If you’re not an ISCB member, register for access to ISCB Nucleus: lnkd.in/gMhrKGJz
itisalist.bsky.social
I believe the seminar is offline only. My focus is not on data privacy (also very important!), but on inflated performance estimates due to methods learning illegitimate shortcuts. If you'd like to know more, we have written a perspective article with guiding questions: www.nature.com/articles/s41...
Guiding questions to avoid data leakage in biological machine learning applications - Nature Methods
This Perspective discusses the issue of data leakage in machine learning based models and presents seven questions designed to identify and avoid the problems resulting from data leakage.
www.nature.com
itisalist.bsky.social
Traveling to Innsbruck at the invitation of @francescafinotello.bsky.social to talk about data leakage, a widespread issue in biomedical machine learning applications. I'll talk about challenges in protein-protein interaction (doi.org/10.1093/bib/...) and drug response prediction (upcoming preprint!).
Colorful liquid flowing from one bottle into another, as an illustration for (data) leakage.
Reposted by Markus List
daisybio.de
Weihenstephan Bioinformatics Symposium 2025: More than 75 scientists from Bavaria and the world came together to share talks and create new synergies. It was great to host this event. For those who missed it: the next edition is planned for 2027 😉
Reposted by Markus List
daisybio.de
Greetings from Palermo! @en-coding.bsky.social, @a-dietrich.bsky.social, @itisalist.bsky.social, Serafina Reif, Nico Trummer & Kamila Kwiecien are united here on the occasion of the MyeInfoBank COST Action: Converting Molecular Profiles of Myeloid Cells into Biomarkers for Inflammation and Cancer
itisalist.bsky.social
It was a pleasure having you, Ryu! Thanks so much for your talk and visit.
Reposted by Markus List
daisybio.de
📍Welcome to our presentation round of the DaiSyBio members! Every week, you will get to know someone from our lab.
We start with @itisalist.bsky.social, who heads the group. Markus joined TUM in 2018 and became a W2 tenure track associate professor in 2023. More members are about to follow! 📍
itisalist.bsky.social
We followed up on our previous work, where we showed that predicting protein-protein interactions from sequence alone yields random performance when data leakage is accounted for. In this new preprint, we show that ESM2 embeddings raise the bar to 0.65 accuracy independent of the model architecture.
judith-bernett.bsky.social
🧬🖥️ Proud to share our latest update on PPI predictions – "Deep learning models for unbiased sequence-based PPI prediction plateau at an accuracy of 0.65" doi.org/10.1101/2025... by T. Reim, published with @itisalist.bsky.social @dbblumenthal.bsky.social, A. Hartebrodt, and me. What did we do? 1/15 🧵
Graphical summary of the analyses done in the publication, displayed on six panels a-f. (a) We computed ESM-2 embeddings of different sizes for the proteins of our data-leakage-free PPI dataset. The per-token embeddings have variable sizes depending on the protein length, while the per-protein embeddings have a fixed size obtained by dimension-wise averaging. (b) We tested two models operating on the per-protein embeddings: a baseline random forest classifier and adaptations of the previously published Richoux model. Five models operated on the per-token embeddings: a 2d-baseline, the 2d-Selfattention and 2d-Crossattention models (which expanded the 2d-baseline through a Transformer encoder), and adaptations of the published models D-SCRIPT and TUnA. (c) Hyperparameter tuning gave us insight into the influence of each tunable parameter on the classification performance. (d) No model surpassed an accuracy of 0.65. The more advanced models had similar accuracies, leading us to believe that the information content of the ESM-2 embedding has more influence than the model architecture. Per-token models did not consistently outperform per-protein models. (e) We applied various modifications to test their influence: different embedding sizes, inserting a Transformer encoder into different positions, adding spectral normalization after the linear layers, self- vs. cross-attention, and removing the padding. (f) Finally, we compared the implicitly predicted distance maps of the 2d-baseline, 2d-Selfattention, 2d-Crossattention, and D-SCRIPT-ESM-2 to real distance maps computed from PDB structures.
itisalist.bsky.social
Hey young investigators, use this opportunity to learn about the COST action MyeInfoBank and listen to a nice talk by our fabulous @a-dietrich.bsky.social
daisybio.de
📣 Shoutout to all young molecular biologists, bioinformaticians and immunobiologists: Don't miss our lab member Alexander Dietrich giving a talk on cell-type deconvolution next THU at 2pm 🕑
It's free, it's virtual, it's new! Register here ⤵️
bit.ly/3VI6Dtx