David Selby
@davidselby.bsky.social
68 followers 190 following 13 posts
Data science researcher working on applications of machine learning in health at DFKI, getting the most out of small data. Reproducible #Rstats evangelist and unofficial British cultural ambassador to Rhineland-Palatinate 🇩🇪 https://selbydavid.com
davidselby.bsky.social
📊📉📈 Better data visualizations with AI: can LLMs provide constructive critiques of existing charts? We explore how generative AI can automate #MakeoverMonday-type exercises, suggesting improvements to existing charts.

📄 New preprint + benchmark dataset 💽

arxiv.org/abs/2508.05637
Automated Visualization Makeovers with LLMs
Making a good graphic that accurately and efficiently conveys the desired message to the audience is both an art and a science, typically not taught in the data science curriculum. Visualisation makeo...
arxiv.org
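A minimal sketch of how one might prototype the kind of automated makeover critique described above; the ask_vision_llm wrapper and the rubric wording are hypothetical placeholders, not the paper's prompts or pipeline.

```python
import base64
from pathlib import Path

# Hypothetical wrapper around whatever vision-capable LLM API is available;
# assumed to take a text prompt plus a base64-encoded image and return the
# model's reply as plain text. Plug in your provider of choice here.
def ask_vision_llm(prompt: str, image_b64: str) -> str:
    raise NotImplementedError("connect this to an LLM provider")

CRITIQUE_PROMPT = (
    "You are reviewing a chart for a #MakeoverMonday-style exercise. "
    "Summarise what it shows, list concrete weaknesses (encoding, labelling, "
    "colour, clutter) and propose a redesign that conveys the same message "
    "more clearly."
)

def critique_chart(image_path: str) -> str:
    """Request a constructive critique of a single chart image."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    return ask_vision_llm(CRITIQUE_PROMPT, image_b64)
```

In practice one would loop this over a collection of charts and compare the model's suggestions against human-made makeovers, which is the sort of evaluation a benchmark dataset supports.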
davidselby.bsky.social
What is a "Visible Neural Network"? It's a new kind of deep learning model for multi-omics, where prior knowledge and interpretability are baked into the architecture.

📄 We reviewed dozens of models, datasets & applications, and call for better tools/benchmarks:

www.frontiersin.org/journals/art...
Frontiers | Visible neural networks for multi-omics integration: a critical review
BackgroundBiomarker discovery and drug response prediction are central to personalized medicine, driving demand for predictive models that also offer biologi...
www.frontiersin.org
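A toy PyTorch sketch of the general idea behind such biologically informed architectures, not a reimplementation of any model from the review; the gene-to-pathway mask and layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose connections are pruned by a binary prior-knowledge
    mask, e.g. mask[j, i] = 1 only if gene i belongs to pathway j."""
    def __init__(self, mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", mask.float())           # shape (out, in)
        self.weight = nn.Parameter(torch.randn_like(self.mask) * 0.01)
        self.bias = nn.Parameter(torch.zeros(mask.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # only the biologically annotated connections carry weight
        return x @ (self.weight * self.mask).T + self.bias

# Toy example: 6 genes feeding 2 pathways, then a prediction head.
gene_to_pathway = torch.tensor([[1, 1, 1, 0, 0, 0],
                                [0, 0, 0, 1, 1, 1]])
vnn = nn.Sequential(MaskedLinear(gene_to_pathway), nn.ReLU(), nn.Linear(2, 1))
print(vnn(torch.randn(4, 6)).shape)  # torch.Size([4, 1])
```

Because each hidden unit corresponds to a named pathway, its weights and activations can be inspected directly, which is where the "visible" in visible neural network comes from.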
Reposted by David Selby
cwcyau.bsky.social
Health Research From Home Hackathon 2025
This hackathon is being held by the Health Research From Home Partnership, led by @OfficialUoM. Register your interest now: health-research-from-home.github.io/DataAnalysis...
Health Research From Home Hackathon 2025
7-9 May 2025
health-research-from-home.github.io
davidselby.bsky.social
Just published: 'Had enough of experts? Quantitative knowledge retrieval from large language models'

Can LLMs, having read the scientific literature, offer us useful numerical info to help fill in missing data and fit statistical models, like a real human expert? We investigate:

doi.org/10.1002/sta4...
Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models
Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less w...
doi.org
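As a hedged sketch of the kind of workflow the paper studies: elicit a prior from an LLM and combine it with observed data. The elicit_prior_from_llm stub and all numbers below are illustrative placeholders, not values or code from the paper.

```python
import numpy as np

# Hypothetical stand-in for prompting an LLM, e.g.
# "As a clinical expert, give a normal prior (mean, sd) for adult resting heart rate."
def elicit_prior_from_llm(question: str) -> tuple[float, float]:
    return 72.0, 10.0  # illustrative values only

def posterior_normal(prior_mu, prior_sd, data, obs_sd):
    """Conjugate normal-normal update with known observation sd."""
    n = len(data)
    prec = 1 / prior_sd**2 + n / obs_sd**2
    mu = (prior_mu / prior_sd**2 + np.sum(data) / obs_sd**2) / prec
    return mu, np.sqrt(1 / prec)

prior_mu, prior_sd = elicit_prior_from_llm("resting heart rate prior?")
data = np.array([68.0, 75.0, 71.0])            # a small observed sample
print(posterior_normal(prior_mu, prior_sd, data, obs_sd=8.0))
```

A sharper or sloppier elicited prior shifts the posterior accordingly, which is why the quality of the LLM's numbers matters, especially when data are scarce.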
davidselby.bsky.social
New blog post: Alternatives to @overleaf.com for #rstats, reproducible writing and collaboration

selbydavid.com/2025/03/04/o...
selbydavid.com
davidselby.bsky.social
Paper just accepted in Stat!

Can LLMs replace experts as sources of numerical information, such as Bayesian prior distributions for statistical models or missing values in tabular datasets for ML tasks?

We evaluate on applications across different fields.

arxiv.org/abs/2402.07770
Had enough of experts? Quantitative knowledge retrieval from large language models
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences, however their utility for quantitative information retrieval is less w...
arxiv.org
davidselby.bsky.social
How might one redesign this data visualization to avoid using much-maligned 'plunger plots'?

#visualisation

From www.nature.com/articles/s41...
Plot showing stacked bar plots with error bars to visualize normalized ROC AUC of different machine learning models, before and after fine-tuning for four hours. The main insight is that TabPFN, a tabular foundation model, outperforms tree-based methods such as random forests and XGBoost.
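One commonly recommended redesign, sketched here with matplotlib and made-up numbers (not the values from the Nature figure): drop the bars and show each model's score as a point with an error bar, so the comparison no longer hinges on bar length from a zero baseline.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative numbers only -- not the values from the original figure.
models = ["TabPFN", "XGBoost", "Random forest"]
before = [0.92, 0.88, 0.86]
after = [0.95, 0.90, 0.87]
err = [0.01, 0.015, 0.02]

y = np.arange(len(models))
fig, ax = plt.subplots(figsize=(5, 2.5))
# Point-and-interval marks replace the bars, so the x-axis can zoom in on
# the region where the models actually differ.
ax.errorbar(before, y + 0.1, xerr=err, fmt="o", label="default", capsize=3)
ax.errorbar(after, y - 0.1, xerr=err, fmt="s", label="tuned 4 h", capsize=3)
ax.set_yticks(y)
ax.set_yticklabels(models)
ax.set_xlabel("Normalised ROC AUC")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()
```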
davidselby.bsky.social
Pleased to present our poster at the #NeurIPS2024 workshop on Bayesian Decision-making and Uncertainty! 🎉 Our work explores using large language models for eliciting expert-informed Bayesian priors. Elicited lots of discussion with the ML community too! Check it out: neurips.cc/virtual/2024...
Sebastian Vollmer, David Selby and Yuichiro Iwashita present their poster, "Had enough of experts? Bayesian prior elicitation from Large Language Models", at the NeurIPS Bayesian Decision-making and Uncertainty Workshop 2024 in Vancouver, Canada.
davidselby.bsky.social
Excited to share our new preprint: Visible neural networks for multi-omics integration: a critical review! 🌟 We systematically analyse 86 studies on biologically informed neural networks (BINNs/VNNs), highlighting trends, challenges, interesting ideas & opportunities. www.biorxiv.org/content/10.1...
Visible neural networks for multi-omics integration: a critical review
Biomarker discovery and drug response prediction is central to personalized medicine, driving demand for predictive models that also offer biological insights. Biologically informed neural networks (B...
www.biorxiv.org