David Selby
@davidselby.bsky.social
68 followers 190 following 13 posts
Data science researcher working on applications of machine learning in health at DFKI, getting the most out of small data. Reproducible #Rstats evangelist and unofficial British cultural ambassador to Rhineland-Palatinate 🇩🇪 https://selbydavid.com
davidselby.bsky.social
📊📉📈 Better data visualizations with AI: can LLMs provide constructive critiques of existing charts? We explore how generative AI can automate #MakeoverMonday-type exercises, suggesting improvements to existing charts.

📄 New preprint + benchmark dataset 💽

arxiv.org/abs/2508.05637
Automated Visualization Makeovers with LLMs
Making a good graphic that accurately and efficiently conveys the desired message to the audience is both an art and a science, typically not taught in the data science curriculum. Visualisation makeo...
arxiv.org
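A minimal sketch of how one might prototype the kind of automated makeover critique described above; the ask_vision_llm wrapper and the rubric wording are hypothetical placeholders, not the paper's prompts or pipeline.

```python
import base64
from pathlib import Path

# Hypothetical wrapper around whatever vision-capable LLM API is available;
# assumed to take a text prompt plus a base64-encoded image and return the
# model's reply as plain text. Plug in your provider of choice here.
def ask_vision_llm(prompt: str, image_b64: str) -> str:
    raise NotImplementedError("connect this to an LLM provider")

CRITIQUE_PROMPT = (
    "You are reviewing a chart for a #MakeoverMonday-style exercise. "
    "Summarise what it shows, list concrete weaknesses (encoding, labelling, "
    "colour, clutter) and propose a redesign that conveys the same message "
    "more clearly."
)

def critique_chart(image_path: str) -> str:
    """Request a constructive critique of a single chart image."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    return ask_vision_llm(CRITIQUE_PROMPT, image_b64)
```

In practice one would loop this over a collection of charts and compare the model's suggestions against human-made makeovers, which is the sort of evaluation a benchmark dataset supports.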
davidselby.bsky.social
What is a "Visible Neural Network"? It's a new kind of deep learning model for multi-omics, where prior knowledge and interpretability are baked into the architecture.

📄 We reviewed dozens of models, datasets & applications, and call for better tools/benchmarks:

www.frontiersin.org/journals/art...
Frontiers | Visible neural networks for multi-omics integration: a critical review
BackgroundBiomarker discovery and drug response prediction are central to personalized medicine, driving demand for predictive models that also offer biologi...
www.frontiersin.org
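A toy PyTorch sketch of the general idea behind such biologically informed architectures, not a reimplementation of any model from the review; the gene-to-pathway mask and layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose connections are pruned by a binary prior-knowledge
    mask, e.g. mask[j, i] = 1 only if gene i belongs to pathway j."""
    def __init__(self, mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", mask.float())           # shape (out, in)
        self.weight = nn.Parameter(torch.randn_like(self.mask) * 0.01)
        self.bias = nn.Parameter(torch.zeros(mask.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # only the biologically annotated connections carry weight
        return x @ (self.weight * self.mask).T + self.bias

# Toy example: 6 genes feeding 2 pathways, then a prediction head.
gene_to_pathway = torch.tensor([[1, 1, 1, 0, 0, 0],
                                [0, 0, 0, 1, 1, 1]])
vnn = nn.Sequential(MaskedLinear(gene_to_pathway), nn.ReLU(), nn.Linear(2, 1))
print(vnn(torch.randn(4, 6)).shape)  # torch.Size([4, 1])
```

Because each hidden unit corresponds to a named pathway, its weights and activations can be inspected directly, which is where the "visible" in visible neural network comes from.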
Reposted by David Selby
cwcyau.bsky.social
Health Research From Home Hackathon 2025
This hackathon is being held by the Health Research From Home Partnership, led by @OfficialUoM. Register your interest now: health-research-from-home.github.io/DataAnalysis...
Health Research From Home Hackathon 2025
7-9 May 2025
health-research-from-home.github.io
davidselby.bsky.social
Just published: 'Had enough of experts? Quantitative knowledge retrieval from large language models'

Can LLMs, having read the scientific literature, offer us useful numerical info to help fill in missing data and fit statistical models, like a real human expert? We investigate:

doi.org/10.1002/sta4...
Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models
Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less w...
doi.org
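As a hedged sketch of the kind of workflow the paper studies: elicit a prior from an LLM and combine it with observed data. The elicit_prior_from_llm stub and all numbers below are illustrative placeholders, not values or code from the paper.

```python
import numpy as np

# Hypothetical stand-in for prompting an LLM, e.g.
# "As a clinical expert, give a normal prior (mean, sd) for adult resting heart rate."
def elicit_prior_from_llm(question: str) -> tuple[float, float]:
    return 72.0, 10.0  # illustrative values only

def posterior_normal(prior_mu, prior_sd, data, obs_sd):
    """Conjugate normal-normal update with known observation sd."""
    n = len(data)
    prec = 1 / prior_sd**2 + n / obs_sd**2
    mu = (prior_mu / prior_sd**2 + np.sum(data) / obs_sd**2) / prec
    return mu, np.sqrt(1 / prec)

prior_mu, prior_sd = elicit_prior_from_llm("resting heart rate prior?")
data = np.array([68.0, 75.0, 71.0])            # a small observed sample
print(posterior_normal(prior_mu, prior_sd, data, obs_sd=8.0))
```

A sharper or sloppier elicited prior shifts the posterior accordingly, which is why the quality of the LLM's numbers matters, especially when data are scarce.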
davidselby.bsky.social
New blog post: Alternatives to @overleaf.com for #rstats, reproducible writing and collaboration

selbydavid.com/2025/03/04/o...
selbydavid.com
davidselby.bsky.social
Paper just accepted in Stat!

Can LLMs replace experts as sources of numerical information, such as Bayesian prior distributions for statistical models or missing values in tabular datasets for ML tasks?

We evaluate on applications across different fields.

arxiv.org/abs/2402.07770
Had enough of experts? Quantitative knowledge retrieval from large language models
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences, however their utility for quantitative information retrieval is less w...
arxiv.org
davidselby.bsky.social
How might one redesign this data visualization to avoid using much-maligned 'plunger plots'?

#visualisation

From www.nature.com/articles/s41...
Plot showing stacked bar plots with error bars to visualize normalized ROC AUC of different machine learning models, before and after fine-tuning for four hours. The main insight is that TabPFN, a tabular foundation model, outperforms tree-based methods such as random forests and XGBoost.
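One commonly recommended redesign, sketched here with matplotlib and made-up numbers (not the values from the Nature figure): drop the bars and show each model's score as a point with an error bar, so the comparison no longer hinges on bar length from a zero baseline.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative numbers only -- not the values from the original figure.
models = ["TabPFN", "XGBoost", "Random forest"]
before = [0.92, 0.88, 0.86]
after = [0.95, 0.90, 0.87]
err = [0.01, 0.015, 0.02]

y = np.arange(len(models))
fig, ax = plt.subplots(figsize=(5, 2.5))
# Point-and-interval marks replace the bars, so the x-axis can zoom in on
# the region where the models actually differ.
ax.errorbar(before, y + 0.1, xerr=err, fmt="o", label="default", capsize=3)
ax.errorbar(after, y - 0.1, xerr=err, fmt="s", label="tuned 4 h", capsize=3)
ax.set_yticks(y)
ax.set_yticklabels(models)
ax.set_xlabel("Normalised ROC AUC")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()
```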
davidselby.bsky.social
Pleased to present our poster at the #NeurIPS2024 workshop on Bayesian Decision-making and Uncertainty! 🎉 Our work explores using large language models for eliciting expert-informed Bayesian priors. Elicited lots of discussion with the ML community too! Check it out: neurips.cc/virtual/2024...
Sebastian Vollmer, David Selby and Yuichiro Iwashita present their poster, "Had enough of experts? Bayesian prior elicitation from Large Language Models", at the NeurIPS Bayesian Decision-making and Uncertainty Workshop 2024 in Vancouver, Canada.
davidselby.bsky.social
Excited to share our new preprint: Visible neural networks for multi-omics integration: a critical review! 🌟 We systematically analyse 86 studies on biologically informed neural networks (BINNs/VNNs), highlighting trends, challenges, interesting ideas & opportunities. www.biorxiv.org/content/10.1...
Visible neural networks for multi-omics integration: a critical review
Biomarker discovery and drug response prediction is central to personalized medicine, driving demand for predictive models that also offer biological insights. Biologically informed neural networks (B...
www.biorxiv.org