David Kelley
@drkbio.bsky.social
140 followers 57 following 41 posts
Making sophisticated guesses at how DNA will behave.
drkbio.bsky.social
We’ve done some experiments, but the metrics aren’t conclusive, so choose your own adventure! We’ve released these models open source, open weight for all to use. github.com/calico/borzo...
borzoi-paper/extensions/prime at main · calico/borzoi-paper
Analyses related to the Borzoi paper.
drkbio.bsky.social
We hypothesized that training with cell-type-specific and 3' data might make these models particularly effective for transfer to datasets with similar properties.
drkbio.bsky.social
Transfer learning has emerged as a key application for multitask sequence models like these. For more, check out another recent paper from Han Yuan, whose analysis explores various transfer strategies and shows how powerful this approach can be. www.biorxiv.org/content/10.1...
Parameter-Efficient Fine-Tuning of a Supervised Regulatory Sequence Model
DNA sequence deep learning models accurately predict epigenetic and transcriptional profiles, enabling analysis of gene regulation and genetic variant effects. While large-scale training models like E...
www.biorxiv.org
drkbio.bsky.social
Hence the name Borzoi Prime, to emphasize their 3’ expertise!
drkbio.bsky.social
Indeed, he discovered that the new models better predict alternative polyadenylation and the QTL variants that affect where transcripts get cleaved and polyadenylated. This key regulatory layer influences cell-type-specific protein production.
drkbio.bsky.social
Drawing on his expertise and interest in isoform regulation, Johannes hypothesized that single-cell RNA-seq’s 3’ sequencing protocols might reveal additional capabilities in these models.
drkbio.bsky.social
Using single-cell eQTL studies, he evaluated the cell-type-specific variant effect predictions and found good concordance.
drkbio.bsky.social
As cell-type-specific applications emerged, Johannes Linder took a fresh look.
drkbio.bsky.social
We trained these models in early 2023 (which is why they’re algorithmically similar to the originals), but initial metrics were underwhelming, so we shelved them.
drkbio.bsky.social
Side note: want your amazing data included in future training runs of open source, open weight models? Make and release BigWig tracks!
drkbio.bsky.social
We curated several cell atlas collections to produce pseudobulk coverage tracks. Thank you to the CZI Tabula projects and the BICCN Brain Cell Atlas for making this possible!
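For readers unfamiliar with the term, here is a minimal sketch of what "pseudobulk" means: per-cell coverage is summed within each annotated cell type to give one track per type. Everything named here (`pseudobulk_coverage`, the in-memory dictionaries, the `target_total` normalization) is a hypothetical illustration and not the pipeline used for these models, which would work from aligned reads and write BigWig files.

```python
def pseudobulk_coverage(cell_coverage, cell_types, target_total=1e6):
    """Sum per-cell coverage vectors by cell type, then rescale each track.

    cell_coverage: dict mapping cell id -> list of per-bin read counts
    cell_types:    dict mapping cell id -> cell type label
    target_total:  assumed normalization; each track is scaled to this sum
    """
    sums = {}
    for cell_id, coverage in cell_coverage.items():
        label = cell_types[cell_id]
        if label not in sums:
            sums[label] = [0.0] * len(coverage)
        track = sums[label]
        for i, count in enumerate(coverage):
            track[i] += count

    tracks = {}
    for label, track in sums.items():
        depth = sum(track)
        # Leave an all-zero track untouched rather than dividing by zero.
        factor = target_total / depth if depth > 0 else 0.0
        tracks[label] = [x * factor for x in track]
    return tracks


if __name__ == "__main__":
    coverage = {"c1": [1, 0, 3], "c2": [1, 2, 1], "c3": [0, 4, 0]}
    labels = {"c1": "T cell", "c2": "T cell", "c3": "B cell"}
    for label, track in pseudobulk_coverage(coverage, labels, 8.0).items():
        print(label, track)
```

Summing before normalizing means cells with more reads contribute more to the track, which is the usual pseudobulk behavior; other weighting schemes are possible.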
drkbio.bsky.social
A limitation of the first Borzoi training run was the absence of cell-type-specific RNA-seq tracks; most tracks came from heterogeneous bulk samples.
drkbio.bsky.social
Alongside the manuscript and analysis, we released Borzoi predictions for 19.5 million common and low-frequency UK Biobank variants. Code for scoring additional variants with Borzoi is available here: github.com/calico/baske...
drkbio.bsky.social
Moving forward, we suspect there are further improvements available. The Borzoi predictions cover most body tissues, but they aren’t yet zoomed into specific cell types. Alternative nonlinear heritability models may usurp S-LDSC for fitting variant priors.
drkbio.bsky.social
Generally, we found that Borzoi predictions improve fine-mapping clarity and gene prioritization. We’re using Sniff to better analyze aging-related trait GWAS at Calico.
drkbio.bsky.social
In our paper, we used Borzoi (our latest regulatory sequence activity predictor) for this approach, calling the overall workflow Sniff (a quintessential pastime for all working hounds!)
drkbio.bsky.social
We can use these importance scores as priors in Bayesian fine-mapping methods like SuSiE. Variants that both look functional AND show statistical association get prioritized.
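To make the idea of a functional prior concrete, here is a minimal sketch of fine-mapping under the simplifying assumption of exactly one causal variant per locus, using Wakefield-style approximate Bayes factors. This is not SuSiE itself (SuSiE sums several such single effects and handles LD explicitly) and it is not the Sniff code; the function names, the `prior_sd` default, and the example numbers are all hypothetical.

```python
import math


def log_abf(z, se, prior_sd=0.2):
    """Log approximate Bayes factor in favor of association (Wakefield).

    z: association z-score; se: standard error of the effect estimate;
    prior_sd: assumed prior standard deviation of the true effect size.
    """
    v = se ** 2
    w = prior_sd ** 2
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def single_effect_pips(z_scores, std_errors, prior_weights):
    """Posterior inclusion probabilities assuming one causal variant.

    prior_weights are positive per-variant weights (for example, predicted
    functional importance); they are normalized to sum to one.
    """
    total = sum(prior_weights)
    log_terms = [
        math.log(p / total) + log_abf(z, se)
        for z, se, p in zip(z_scores, std_errors, prior_weights)
    ]
    # Subtract the max before exponentiating for numerical stability.
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    # Two variants in tight LD share the same z-score; a third is weak.
    z = [5.0, 5.0, 1.0]
    se = [0.05, 0.05, 0.05]
    print("flat prior:      ", single_effect_pips(z, se, [1, 1, 1]))
    print("functional prior:", single_effect_pips(z, se, [8, 1, 1]))
```

With a flat prior the two linked variants are indistinguishable; with the functional prior, the variant predicted to be more important takes most of the posterior mass, which is the prioritization effect described above.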
drkbio.bsky.social
The S-LDSC and PolyFun frameworks address these questions by asking: are variants with specific functional signatures more likely to drive trait associations? This lets us score each variant’s predicted importance for the trait.
drkbio.bsky.social
Can we learn which aspects of these fingerprints matter for each disease? Does a GWAS provide enough signal to figure out if liver function matters more than brain function for cholesterol levels?
drkbio.bsky.social
Promisingly, machine learning approaches that learn to map sequence to regulatory function are advancing, and they can now usefully predict how genetic variants perturb gene regulation across the body. Think of it as creating a detailed “functional fingerprint” for each variant.
drkbio.bsky.social
However, gene regulation is complicated, varying across cell types of the body and responding to their environment. So predicting variant effects requires understanding this rich, multidimensional landscape.
drkbio.bsky.social
To affect a trait, a variant must first change how genes work—either by altering proteins or gene regulation. If we can predict these functional effects, we can be smarter about which variants to focus on.
drkbio.bsky.social
Genetic association studies are incredibly powerful for finding DNA regions linked to traits, but they often implicate dozens of variants due to linkage disequilibrium. Which one is the real culprit?