Justin Silverman
@inschool4life.bsky.social
Assistant Professor of Informatics, Statistics, and Medicine at Penn State University https://jsilve24.github.io/SilvermanLab/
inschool4life.bsky.social
Our analysis is the largest to date. We used our newly created MUTT database, which consists of over 15,000 samples from over 30 studies, each with paired sequence counts and microbial load measurements.

Core takeaway: it's important to accurately model uncertainty and error.

@ggloor.bsky.social
inschool4life.bsky.social
New Paper!

Machine learning models that attempt to predict microbial load collapse outside of their training context, with an R² < 0!

In contrast, our Bayesian Partially Identified Models embrace uncertainty in unmeasured microbial load and consistently outperform.

www.biorxiv.org/content/10.1...
Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis
Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...
www.biorxiv.org
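A note on what R² < 0 means: the coefficient of determination compares a model's squared error to that of always predicting the mean of the observed values, so it goes negative whenever the model does worse than that constant baseline. The sketch below uses made-up log10 load values (they are not data from the paper) and a hypothetical helper name, `r_squared`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the observed mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log10 microbial loads (illustrative only).
observed = [9.0, 10.0, 11.0]
# A model that gets the trend backwards in a new context.
predicted = [11.0, 10.0, 9.0]

score = r_squared(observed, predicted)
print(score)
```

Here the model's squared error (8) is four times that of the mean baseline (2), so the score is below zero.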
Reposted by Justin Silverman
ggloor.bsky.social
Excited to summarize our most recent paper, "Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2" on controlling the false discovery rate (FDR) when analyzing high throughput sequencing (HTS) data. This has been an open problem since the dawn of HTS.
inschool4life.bsky.social
New preprint!

PCR bias doesn’t just distort relative abundances—it reshapes microbiome ecological analyses.

We show that commonly used diversity metrics (e.g., UniFrac or Shannon) are not robust to amplification bias, while perturbation-invariant alternatives are.

www.biorxiv.org/content/10.1...
PCR Bias Impacts Microbiome Ecological Analyses
Polymerase Chain Reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-ass...
www.biorxiv.org
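To make "perturbation-invariant" concrete, here is a minimal sketch that models amplification bias as a per-taxon multiplicative factor applied to the composition. It compares Shannon diversity with the Aitchison distance, which is used here only as a well-known example of a perturbation-invariant quantity; whether it is among the alternatives studied in the preprint is an assumption. The samples and bias values are invented for illustration.

```python
import math


def closure(x):
    """Rescale a vector of positive values so it sums to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, bias):
    """Multiply each part by its bias factor, then re-close."""
    return closure([v * b for v, b in zip(x, bias)])


def shannon(x):
    """Shannon diversity (natural log) of a composition."""
    return -sum(v * math.log(v) for v in x if v > 0)


def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between CLR-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Two hypothetical communities and hypothetical per-taxon bias factors.
sample_a = closure([10, 20, 70])
sample_b = closure([30, 30, 40])
bias = [4.0, 1.0, 0.25]

biased_a = perturb(sample_a, bias)
biased_b = perturb(sample_b, bias)

shannon_shift = shannon(biased_a) - shannon(sample_a)
distance_shift = (aitchison_distance(biased_a, biased_b)
                  - aitchison_distance(sample_a, sample_b))
print(shannon_shift, distance_shift)
```

The Shannon value moves under the bias, while the distance between the two samples is unchanged up to floating-point error, because the same bias adds the same vector to both CLR representations and cancels in the difference.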
inschool4life.bsky.social
Thanks! We think so. I think this will help enhance the cost-effectiveness and efficiency of biomarker discovery. Our methods greatly enhance the positive predictive value of analyses, reducing false signals that cost money to validate and detecting true signals that would otherwise be missed.
Reposted by Justin Silverman
tkorem.bsky.social
Our paper explaining why Gihawi et al. failed to prove an error in the normalization used by the 2020 cancer #microbiome analysis now out as a Matters Arising in @asm.org #mSystems (w/ @george-austin.bsky.social) 🖥️ 🧬

Thread explaining the key points below.

journals.asm.org/doi/10.1128/...
inschool4life.bsky.social
We are also developing a new ALDEx3 library that is about 1000 times faster than ALDEx2, with a streamlined user interface (although it's still in beta, I am using it regularly)
github.com/jsilve24/ALD...
GitHub - jsilve24/ALDEx3
Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.
github.com
inschool4life.bsky.social
To facilitate adoption, we've updated the popular ALDEx2 software package on Bioconductor to support scale model analysis.
GitHub - jsilve24/ALDEx3
Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.
github.com
inschool4life.bsky.social
In real data analyses and simulation studies, we find our methods often lead to dramatic decreases in false positives (FDR can drop from >75% to a nominal 5%) while simultaneously maintaining or improving statistical power.
inschool4life.bsky.social
We present scale models, which extend normalization by modeling potential errors in these assumptions (reducing false positives), or by allowing researchers to make more biologically plausible assumptions (reducing false negatives).
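A rough sketch of the idea, under simplifying assumptions: the log fold change in a taxon's absolute abundance is the log fold change in its proportion plus the log fold change in the system's scale. A normalization fixes the second term at a single value; a scale model treats it as uncertain. The function name, the Normal prior centered at zero, and the numbers below are all hypothetical, sampling noise in the counts is ignored, and this is not the ALDEx2 interface.

```python
import random


def lfc_interval(log_ratio_proportions, scale_sd, n_draws=10000, seed=0):
    """95% interval for an absolute log fold change under scale uncertainty.

    The unmeasured log fold change in scale is drawn from Normal(0, scale_sd).
    Setting scale_sd to 0 mimics a normalization that assumes the scale
    does not change between conditions.
    """
    rng = random.Random(seed)
    draws = sorted(
        log_ratio_proportions + rng.gauss(0.0, scale_sd)
        for _ in range(n_draws)
    )
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws)]
    return lower, upper


# Hypothetical observed log fold change in a taxon's proportion.
observed_lfc = 0.3

fixed_scale = lfc_interval(observed_lfc, scale_sd=0.0)
uncertain_scale = lfc_interval(observed_lfc, scale_sd=0.5)
print(fixed_scale, uncertain_scale)
```

With the scale fixed, the interval collapses to a point and the taxon looks like a confident hit. Once modest uncertainty in scale is admitted, the interval covers zero, which is how a change in proportion that looks significant under normalization can stop being called a true change in abundance.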
inschool4life.bsky.social
Traditional normalization methods often make implicit assumptions about the biological system's scale, such as microbial load or total RNA content. These assumptions can lead to false positives and negatives.
Reposted by Justin Silverman
vscooper.micropopbio.org
🚨PA colleagues:

"Senator Fetterman wants to hear from you about how the federal funding freeze is affecting Pennsylvania."

"If your project has been impacted, please fill out our constituent impact form:" forms.office.com/g/mFv2JAPxpC

Get out your Other Support and share that info!
Microsoft Forms
forms.office.com
Reposted by Justin Silverman
inschool4life.bsky.social
Our whole point is that there is information missing from the data. Overcoming that requires additional thought and careful consideration of what assumptions are biologically plausible in a particular study. For example, when studying antibiotics, microbial load likely decreases post-treatment.
inschool4life.bsky.social
An important point if you look to benchmark our methods. Normalizations are kinda "point and click", no additional thought needed by the user. We can generalize normalizations, and it helps reduce false positives. But the real advances, when we see the massive FN/FP decreases, come when care is taken.
inschool4life.bsky.social
Love it! Will definitely check that out, as it would be super helpful for us. And yes, our methods are not yet common (though they are available in ALDEx2 now!). Reviewers have been resistant, as they love normalizations and our methods seem foreign.
inschool4life.bsky.social
Non-linear additive regression (using scalable Bayesian Multinomial Logistic Normal models) is now available in fido (on CRAN)!
neurips.cc/virtual/2024...

Also includes extremely fast marginal likelihood estimation for hyperparameter tuning.
cran.r-project.org/web/packages...
Efficient Bayesian Additive Regression Models For Microbiome and Gene Expression Studies (NeurIPS 2024)
neurips.cc
inschool4life.bsky.social
This builds on our prior work
jmlr.org/papers/v23/1...
where we introduced the CU Sampler for Bayesian MLN models. This is even 1-2 orders of magnitude faster than those methods while still being extremely accurate.
jmlr.org
inschool4life.bsky.social
New paper was recently accepted to AISTATS.

arxiv.org/abs/2410.05548

Flexible Multinomial Logistic-Normal time series models (state space models) that scale to extremely large datasets. Inference is 5-6 orders of magnitude faster than alternatives. R package will soon be released.
Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models
Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...
arxiv.org