Lightnews — Scholar-powered news

borzoi-paper/extensions/prime at main · calico/borzoi-paper

David Kelley @drkbio.bsky.social · Jul 23

We’ve done some experiments, but the metrics aren’t conclusive, so choose your own adventure! We’ve released these models as open source, open weight checkpoints for all to use. github.com/calico/borzo...

David Kelley @drkbio.bsky.social · Jul 23

We hypothesized that training with cell-type-specific and 3' data might make these models particularly effective for transfer to datasets with similar properties.

Parameter-Efficient Fine-Tuning of a Supervised Regulatory Sequence Model

David Kelley @drkbio.bsky.social · Jul 23

Transfer learning has emerged as a key application for multitask sequence models like these. For more, check out another recent paper from Han Yuan, whose analysis explores various transfer strategies and shows how powerful this approach can be. www.biorxiv.org/content/10.1...

DNA sequence deep learning models accurately predict epigenetic and transcriptional profiles, enabling analysis of gene regulation and genetic variant effects. While large-scale training models like E...

David Kelley @drkbio.bsky.social · Jul 23

Hence the name Borzoi Prime, to emphasize their 3’ expertise!

David Kelley @drkbio.bsky.social · Jul 23

Indeed, he discovered the new models better predict alternative polyadenylation and QTL variants that affect where transcripts get cleaved and polyadenylated. This key regulatory layer influences cell type-specific protein production.

David Kelley @drkbio.bsky.social · Jul 23

Drawing on his expertise and interest in isoform regulation, Johannes hypothesized that single-cell RNA-seq’s 3’ sequencing protocols might reveal additional capabilities in these models.

David Kelley @drkbio.bsky.social · Jul 23

Using single-cell eQTL studies, he evaluated the cell-type-specific variant effect predictions and found good concordance.

David Kelley @drkbio.bsky.social · Jul 23

As cell-type-specific applications emerged, Johannes Linder took a fresh look.

David Kelley @drkbio.bsky.social · Jul 23

We trained these models in early 2023 (which is why they’re algorithmically similar to the originals), but initial metrics were underwhelming, so we shelved them.

David Kelley @drkbio.bsky.social · Jul 23

Side note—want your amazing data included in future training runs of open source, open weight models? Make and release BigWig tracks!

David Kelley @drkbio.bsky.social · Jul 23

We curated several cell atlas collections to produce pseudobulk coverage tracks. Thank you to the CZI Tabula projects and the BICCN Brain Cell Atlas for making this possible!
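
As a rough illustration of what "pseudobulk" means here, the sketch below sums per-cell counts within each annotated cell type to get one aggregate profile per type. It is a minimal sketch on a toy cells-by-bins matrix, not the pipeline used for these models; the function name `pseudobulk` and the toy inputs are hypothetical, and real coverage tracks would be built from aligned reads rather than a small count matrix.

```python
import numpy as np

def pseudobulk(counts, labels):
    """Sum per-cell count vectors within each cell type.

    counts: array of shape (n_cells, n_bins), one row per cell (hypothetical input).
    labels: length n_cells sequence of cell-type annotations.
    Returns (cell_types, summed), where summed has one row per cell type,
    ordered to match the sorted unique labels in cell_types.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    cell_types = np.unique(labels)  # sorted unique cell-type names
    summed = np.stack([counts[labels == ct].sum(axis=0) for ct in cell_types])
    return cell_types, summed

# Toy example: three cells, three genomic bins, two cell types.
toy_counts = [[1, 0, 2],
              [0, 3, 1],
              [4, 0, 0]]
toy_labels = ["T", "B", "T"]
cell_types, profiles = pseudobulk(toy_counts, toy_labels)
# cell_types is ["B", "T"]; profiles is [[0, 3, 1], [5, 0, 2]]
```

Summing before any normalization keeps the aggregate dominated by well-covered cells, which is the usual motivation for pseudobulking sparse single-cell data.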

David Kelley @drkbio.bsky.social · Jul 23

A limitation of the first Borzoi training run was the absence of cell-type-specific RNA-seq tracks; most of its tracks come from heterogeneous bulk samples.

Predicting cell type-specific coverage profiles from DNA sequence

David Kelley @drkbio.bsky.social · Jul 23

We’re excited to share a follow-up Borzoi training run and an analysis of the capabilities that emerged. www.biorxiv.org/content/10.1...

Predicting expression profiles from RNA-seq experiments provides a powerful approach for universal sequence-based variant effect prediction, enabling researchers to score variants that affect total ge...

David Kelley @drkbio.bsky.social · Jul 21

Alongside the manuscript and analysis, we released Borzoi predictions for 19.5 million common and low-frequency UK Biobank variants. Code for scoring additional variants with Borzoi is available here: github.com/calico/baske...

David Kelley @drkbio.bsky.social · Jul 21

Moving forward, we suspect there are further improvements available. The Borzoi predictions cover most body tissues, but they aren’t yet zoomed into specific cell types. Alternative nonlinear heritability models may usurp S-LDSC for fitting variant priors.

David Kelley @drkbio.bsky.social · Jul 21

Generally, we found that Borzoi predictions improve fine-mapping clarity and gene prioritization. We’re using Sniff to better analyze aging-related trait GWAS at Calico.

David Kelley @drkbio.bsky.social · Jul 21

In our paper, we used Borzoi (our latest regulatory sequence activity predictor) for this approach, calling the overall workflow Sniff (a quintessential pastime for all working hounds!)

David Kelley @drkbio.bsky.social · Jul 21

We can use these importance scores as priors in Bayesian fine-mapping methods like SuSiE. Variants that both look functional AND show statistical association get prioritized.
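
To make the role of the prior concrete, here is a minimal sketch of fine-mapping under a single-causal-variant assumption, where each variant's posterior is proportional to its prior weight times an approximate Bayes factor (Wakefield's formula). This is not SuSiE itself, which fits a sum of several such single-effect models, and it is not the Sniff implementation; the function names, the effect-size prior variance of 0.04, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def wakefield_log_abf(z, se, prior_var=0.04):
    """Log approximate Bayes factor for association vs. null (Wakefield).

    z: per-variant z-scores; se: standard errors of the effect estimates.
    prior_var: assumed prior variance of the true effect size (illustrative).
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    r = prior_var / (prior_var + v)
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)

def posterior_inclusion(log_abf, prior_weights):
    """Posterior inclusion probabilities assuming exactly one causal variant."""
    w = np.asarray(prior_weights, dtype=float)
    w = w / w.sum()                   # normalize priors to sum to 1
    log_post = log_abf + np.log(w)    # prior times Bayes factor, in log space
    log_post -= log_post.max()        # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Three variants in tight LD: identical association statistics.
log_abf = wakefield_log_abf(z=[5.0, 5.0, 5.0], se=[0.05, 0.05, 0.05])
uniform = posterior_inclusion(log_abf, [1.0, 1.0, 1.0])
weighted = posterior_inclusion(log_abf, [1.0, 1.0, 8.0])
# uniform  -> one third each: the statistics alone cannot separate the variants
# weighted -> [0.1, 0.1, 0.8]: the functional prior breaks the tie
```

When the association statistics are indistinguishable, the posterior simply reproduces the normalized prior, which is exactly the situation where a functional prior is most useful.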

David Kelley @drkbio.bsky.social · Jul 21

The S-LDSC and PolyFun frameworks address these questions by asking: are variants with specific functional signatures more likely to drive trait associations? This lets us score each variant’s predicted importance for the trait.
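
One way to picture the scoring step: S-LDSC models each variant's expected per-variant heritability as a weighted sum of its annotation values, and PolyFun-style priors are taken proportional to that quantity. The sketch below assumes the annotation coefficients have already been estimated and shows only the final scoring; the function name `per_variant_priors`, the toy annotation matrix, and the coefficient values are hypothetical, and the real frameworks add regularization and other details omitted here.

```python
import numpy as np

def per_variant_priors(annotations, tau):
    """Turn annotation values into normalized per-variant prior weights.

    annotations: array (n_variants, n_annotations) of functional scores.
    tau: length n_annotations coefficients, assumed already estimated
         (for example by a stratified LD score regression fit).
    """
    h2 = np.asarray(annotations, dtype=float) @ np.asarray(tau, dtype=float)
    h2 = np.clip(h2, 0.0, None)  # negative heritability estimates are floored at 0
    total = h2.sum()
    if total == 0.0:
        raise ValueError("all per-variant heritability estimates are zero")
    return h2 / total

# Toy example: a baseline annotation (all ones) plus one functional score.
toy_annotations = [[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 3.0]]
toy_tau = [1.0, 2.0]
priors = per_variant_priors(toy_annotations, toy_tau)
# per-variant scores are [1, 3, 7], so priors are [1/11, 3/11, 7/11]
```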

David Kelley @drkbio.bsky.social · Jul 21

Can we learn which aspects of these fingerprints matter for each disease? Does a GWAS provide enough signal to figure out if liver function matters more than brain function for cholesterol levels?

David Kelley @drkbio.bsky.social · Jul 21

Promisingly, machine learning approaches that learn the mapping from sequence to regulatory function are advancing and can now usefully predict how genetic variants perturb gene regulation across the body. Think of it as creating a detailed “functional fingerprint” for each variant.

David Kelley @drkbio.bsky.social · Jul 21

However, gene regulation is complicated, varying across cell types of the body and responding to their environment. So predicting variant effects requires understanding this rich, multidimensional landscape.

David Kelley @drkbio.bsky.social · Jul 21

To affect a trait, a variant must first change how genes work—either by altering proteins or gene regulation. If we can predict these functional effects, we can be smarter about which variants to focus on.

David Kelley @drkbio.bsky.social · Jul 21

Genetic association studies are incredibly powerful for finding DNA regions linked to traits, but they often implicate dozens of variants due to linkage disequilibrium. Which one is the real culprit?