Lightnews — Scholar-powered news

Justin Silverman @inschool4life.bsky.social · 8d

@cellpress.bsky.social
We submitted a presubmission inquiry on 9/12 and followed up on 9/24. We have not received a response. Is this typical? Could you please help us? We are trying to confirm whether we should submit as a Matters Arising or as a research article.
www.biorxiv.org/content/10.1...

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis

Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...

www.biorxiv.org

Justin Silverman @inschool4life.bsky.social · 21d

Our analysis is the largest to date. We used our newly created MUTT database, which consists of over 15,000 samples from over 30 studies, each with paired sequence counts and microbial load measurements.

Core takeaway: it's important to accurately model uncertainty and error.

@ggloor.bsky.social

Justin Silverman @inschool4life.bsky.social · 21d

New Paper!

Machine learning models that attempt to predict microbial load collapse outside of their training context, with R² < 0!

In contrast, our Bayesian Partially Identified Models embrace uncertainty in unmeasured microbial load and consistently outperform.

www.biorxiv.org/content/10.1...

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis

Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...

www.biorxiv.org
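
For readers wondering how R² can fall below zero: R² compares a model's squared error with that of always predicting the observed mean, so any model that does worse than the mean on new data scores negative. The sketch below uses made-up log10 load values, not data from the preprint, and the function name `r_squared` is an illustrative assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical log10 microbial loads (illustrative values only)
observed = [9.0, 10.0, 11.0, 12.0]

# Predictions that track the observations closely
in_context = [9.2, 9.9, 11.1, 11.8]

# Predictions that are systematically wrong, as might happen when a
# model is applied outside the setting it was trained in
out_of_context = [12.0, 11.0, 10.0, 9.0]

# Baseline: always predict the observed mean
mean_only = [sum(observed) / len(observed)] * len(observed)

r2_in = r_squared(observed, in_context)        # close to 1
r2_out = r_squared(observed, out_of_context)   # negative
r2_mean = r_squared(observed, mean_only)       # zero by construction
```

A negative value therefore means the predictions carry less information about microbial load than the sample mean alone.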

Reposted by Justin Silverman

Greg Gloor @ggloor.bsky.social · Aug 21

Excited to summarize our most recent paper, "Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2" on controlling the false discovery rate (FDR) when analyzing high throughput sequencing (HTS) data. This has been an open problem since the dawn of HTS.

Justin Silverman @inschool4life.bsky.social · Aug 1

New preprint!

PCR bias doesn’t just distort relative abundances—it reshapes microbiome ecological analyses.

We show that commonly used diversity metrics (e.g., UniFrac or Shannon) are not robust to amplification bias, while perturbation-invariant alternatives are.

www.biorxiv.org/content/10.1...

PCR Bias Impacts Microbiome Ecological Analyses

Polymerase Chain Reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-ass...

www.biorxiv.org

Justin Silverman @inschool4life.bsky.social · Aug 1

Thanks! We think so. I think this will help enhance the cost-effectiveness and efficiency of biomarker discovery. Our methods greatly enhance the positive predictive value of analyses, reducing false signals that cost money to validate and detecting true signals that would otherwise be missed.

Justin Silverman @inschool4life.bsky.social · Jul 1

New Paper:

We relax normalizations to produce statistical methods for bioinformatics that are much more robust and powerful. We see FDR drop from 45% to 5% with increases in power!

This adds to our ongoing work on Scale Reliant Inference.

link.springer.com/article/10.1...

Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses - BMC Bioinformatics

Background Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply ...

link.springer.com

Reposted by Justin Silverman

Tal Korem @tkorem.bsky.social · May 2

Our paper explaining why Gihawi et al. failed to prove an error in the normalization used by the 2020 cancer #microbiome analysis now out as a Matters Arising in @asm.org #mSystems (w/ @george-austin.bsky.social) 🖥️ 🧬

Thread explaining the key points below.

journals.asm.org/doi/10.1128/...

Justin Silverman @inschool4life.bsky.social · May 22

@ggloor.bsky.social

Justin Silverman @inschool4life.bsky.social · May 22

Scale models are not just heuristics but have a rich theoretical foundation based on Bayesian Partially Identified Models. That theory is presented here:

arxiv.org/abs/2201.03616

Scale Reliant Inference

Many scientific fields, including human gut microbiome science, collect multivariate count data where the sum of the counts is unrelated to the scale of the underlying system being measured (e.g., tot...

arxiv.org

Justin Silverman @inschool4life.bsky.social · May 22

We are also developing a new ALDEx3 library that is about 1000 times faster than ALDEx2 with a streamlined user interface (although it's still in beta, I am using it regularly).
github.com/jsilve24/ALD...

GitHub - jsilve24/ALDEx3

Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.

github.com

Justin Silverman @inschool4life.bsky.social · May 22

To facilitate adoption, we've updated the popular ALDEx2 software package on Bioconductor to support scale model analysis.

Justin Silverman @inschool4life.bsky.social · May 22

In real data analyses and simulation studies, we find our methods often lead to dramatic decreases in false positives (FDR can drop from >75% to a nominal 5%) while simultaneously maintaining or improving statistical power.

Justin Silverman @inschool4life.bsky.social · May 22

We present scale models, which extend normalization by modeling potential errors in these assumptions (reducing false positives), or by allowing researchers to make more biologically plausible assumptions (reducing false negatives).

Justin Silverman @inschool4life.bsky.social · May 22

Traditional normalization methods often make implicit assumptions about the biological system's scale, such as microbial load or total RNA content. These assumptions can lead to false positives and negatives.
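
To make the role of scale concrete, here is a toy sketch of the underlying arithmetic only. It is not the scale-model method from the paper. Because absolute abundance equals proportion times total load, the log fold change in a taxon's absolute abundance is the log fold change in its proportion plus the log fold change in total load, and sequencing data only inform the first term. The function name and all numbers are hypothetical.

```python
import math

def log2_fc_absolute(prop_before, prop_after, log2_scale_change):
    """log2 fold change in absolute abundance implied by the observed
    change in proportion plus an ASSUMED log2 change in total load."""
    return math.log2(prop_after / prop_before) + log2_scale_change

# Hypothetical taxon whose proportion doubles from 10% to 20%
prop_before, prop_after = 0.10, 0.20

# Assumption A: total load is unchanged, so the taxon doubles
unchanged = log2_fc_absolute(prop_before, prop_after, 0.0)

# Assumption B: total load halves, so the taxon does not change at all
halved = log2_fc_absolute(prop_before, prop_after, -1.0)

# Assumption C: total load falls four-fold, so the taxon actually decreases
quartered = log2_fc_absolute(prop_before, prop_after, -2.0)

# Allowing a range of plausible scale changes, instead of fixing one value,
# gives a range of possible effects; here the sign is not determined
low, high = quartered, unchanged
sign_is_determined = (low > 0) or (high < 0)
```

The same observed proportions are compatible with an increase, no change, or a decrease, depending entirely on what is assumed about total load.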

Justin Silverman @inschool4life.bsky.social · May 22

New paper in Genome Biology!

genomebiology.biomedcentral.com/articles/10....

We introduce scale models, a generalization of normalizations that explicitly account for uncertainty in biological system scale (e.g., microbial load).

Incorporating scale uncertainty in microbiome and gene expression analysis as an extension of normalization - Genome Biology

Statistical normalizations are used in differential analyses to address sample-to-sample variation in sequencing depth. Yet normalizations make strong, implicit assumptions about the scale of biologic...

genomebiology.biomedcentral.com

Reposted by Justin Silverman

Vaughn Cooper @vscooper.micropopbio.org · Feb 22

🚨PA colleagues:

"Senator Fetterman wants to hear from you about how the federal funding freeze is affecting Pennsylvania."

"If your project has been impacted, please fill out our constituent impact form:" forms.office.com/g/mFv2JAPxpC

Get out your Other Support and share that info!

Microsoft Forms

forms.office.com

Reposted by Justin Silverman

NPR @npr.org · Feb 22

The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.

NIH funding freeze stalls applications on $1.5 billion in medical research funds

The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.

www.npr.org

Justin Silverman @inschool4life.bsky.social · Feb 19

Our whole point is that there is information missing from the data -- overcoming that requires additional thought and careful consideration of which assumptions are biologically plausible in a particular study. E.g., when studying antibiotics, microbial load likely decreases post-treatment, etc.

Justin Silverman @inschool4life.bsky.social · Feb 19

An important point if you look to benchmark our methods: normalizations are kinda "point and click", with no additional thought needed by the user. We can generalize normalizations, and it helps reduce false positives. But the real advances, when we see the massive FN/FP decreases, come when care is taken.

Justin Silverman @inschool4life.bsky.social · Feb 19

Love it! Will definitely check that out, as it would be super helpful for us. And yes, our methods are not yet common (though they are available in ALDEx2 now!). Reviewers have been resistant, as they love normalizations and our methods seem foreign.

Justin Silverman @inschool4life.bsky.social · Feb 19

Non-linear additive regression (using scalable Bayesian Multinomial Logistic Normal models) is now available in fido (on CRAN)!
neurips.cc/virtual/2024...

Also includes extremely fast marginal likelihood estimation for hyperparameter tuning.
cran.r-project.org/web/packages...

Efficient Bayesian Additive Regression Models For Microbiome and Gene Expression Studies - NeurIPS 2024

neurips.cc

Justin Silverman @inschool4life.bsky.social · Feb 19

This builds on our prior work
jmlr.org/papers/v23/1...
where we introduced the CU Sampler for Bayesian MLN models. This is even 1-2 orders of magnitude faster than those methods while still being extremely accurate.

jmlr.org

Justin Silverman @inschool4life.bsky.social · Feb 19

A new paper was recently accepted to AISTATS:

arxiv.org/abs/2410.05548

Flexible Multinomial Logistic-Normal time series models (state space models) that scale to extremely large datasets. Inference is 5-6 orders of magnitude faster than alternatives. An R package will be released soon.

Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models

Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...

arxiv.org

Justin Silverman @inschool4life.bsky.social · Feb 19

Here is a better link to the new paper:

arxiv.org/abs/2410.05548

Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models

Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...

arxiv.org