Lightnews — Scholar-powered news

Sara Mostafavi

@saramostafavi.bsky.social

240 followers 51 following 4 posts

VP, Genentech; Associate Professor at the Allen School of Computer Science and Engineering, University of Washington (on leave)


Reposted by Sara Mostafavi

Anshul Kundaje @anshulkundaje.bsky.social · Jun 19

@saramostafavi.bsky.social (@Genentech) & I (@Stanford) are excited to announce co-advised postdoc positions for candidates with deep expertise in ML for bio (especially sequence-to-function models, causal perturbational models & single-cell models). See details below. Pls RT 1/


Sara Mostafavi @saramostafavi.bsky.social · Apr 17

CC: @lxsasse.bsky.social @xinmingtu.bsky.social

Sara Mostafavi @saramostafavi.bsky.social · Apr 16

Some encouraging news for cross-gene generalization of allele effects in S2F models. www.biorxiv.org/content/10.1...

Deep genomic models of allele-specific measurements

Allele-specific quantification of sequencing data, such as gene expression, allows for a causal investigation of how DNA sequence variations influence cis gene regulation. Current methods for analyzin...

www.biorxiv.org


Sara Mostafavi @saramostafavi.bsky.social · Mar 15

Our new preprint investigates a few important questions that arise when we train S2F models on different types of MPRA datasets. Congrats to Yilun and @xinmingtu.bsky.social www.biorxiv.org/content/10.1...

Investigating Data Size, Sequence Diversity, and Model Complexity in MPRA-based Sequence-to-Function Prediction

We created the MPRA Dataset Collection (MDC), a curated resource of MPRA data from 12 studies comprising over 150 million labeled DNA subsequences. These datasets include both random and natural genom...

www.biorxiv.org


Sara Mostafavi @saramostafavi.bsky.social · Feb 23

Our new paper describes a scalable approach for training sequence-to-function models on personal genomes ("personal genome training") and includes our observations on when this works and what its limitations are. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social, @lxsasse.bsky.social

A scalable approach to investigating sequence-to-expression prediction from personal genomes

A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus (an approach we call personal genome training) offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.

Competing Interest Statement: The authors have declared no competing interest.

www.biorxiv.org
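The post describes personal genome training only at a high level: one locus, many individuals, each with their own sequence and expression label. As a rough illustration of the data side only, here is a minimal Python sketch, assuming SNV-only variants, phased genotypes and a fixed reference window for the locus. The function names (`apply_snvs`, `one_hot`, `build_training_set`) and the toy values are hypothetical and do not reflect SAGE-net's actual API or data.

```python
import numpy as np

BASES = "ACGT"


def apply_snvs(reference, variants, genotype):
    """Return the two haplotype sequences for one individual (hypothetical helper).

    reference: reference sequence of the locus window
    variants:  list of (position, ref_base, alt_base), 0-based within the window
    genotype:  list of (hap1_allele, hap2_allele) per variant, 0 = ref, 1 = alt
    Assumes SNVs only; indels would shift coordinates and need extra handling.
    """
    haplotypes = [list(reference), list(reference)]
    for (pos, ref, alt), alleles in zip(variants, genotype):
        if reference[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for hap, allele in zip(haplotypes, alleles):
            if allele == 1:
                hap[pos] = alt
    return ["".join(h) for h in haplotypes]


def one_hot(seq):
    """One-hot encode a DNA string into an (L, 4) float array in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4, dtype=np.float32)[idx]


def build_training_set(reference, variants, genotypes, expression):
    """One example per individual at a single locus: both haplotypes, one label."""
    X, y = [], []
    for individual, gt in genotypes.items():
        h1, h2 = apply_snvs(reference, variants, gt)
        X.append(np.stack([one_hot(h1), one_hot(h2)]))  # shape (2, L, 4)
        y.append(expression[individual])
    return np.stack(X), np.array(y, dtype=np.float32)


# Toy example with made-up values: a 10 bp window with two SNVs, three individuals.
reference = "ACGTACGTAC"
variants = [(2, "G", "T"), (7, "T", "C")]
genotypes = {
    "ind1": [(0, 0), (0, 0)],  # homozygous reference at both sites
    "ind2": [(1, 0), (0, 1)],  # heterozygous, alt alleles on different haplotypes
    "ind3": [(1, 1), (1, 1)],  # homozygous alt at both sites
}
expression = {"ind1": 1.0, "ind2": 1.5, "ind3": 2.3}

X, y = build_training_set(reference, variants, genotypes, expression)
print(X.shape, y.shape)
```

Keeping the two haplotypes as separate inputs, rather than collapsing them into one consensus sequence, is just one possible choice; a model could, for instance, score each haplotype separately and combine the outputs before comparing against the individual's expression label.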


Reposted by Sara Mostafavi

David A Knowles @davidaknowles.bsky.social · Jan 27

#MLCB2025 will be Sept 10-11 at @nygenome.org in NYC! Paper deadline June 1st & in-person registration will open in May. Please sign up for our mailing list groups.google.com/g/mlcb/ for future announcements. More details at mlcb.github.io. Please RP!
