Gherman Novakovsky
@gnovakovsky.bsky.social
160 followers 140 following 20 posts
PhD, Illumina AI lab
Pinned
gnovakovsky.bsky.social
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
Reposted by Gherman Novakovsky
wywywa.bsky.social
I'm hiring a Bioinformatics Research Associate for the Silent Genomes Project. PhD required, restricted to Canadians, work must be performed in British Columbia.

Great for those who love pipelines, whole genome data and work with a social purpose.
ubc.wd10.myworkdayjobs.com/ubcfacultyjo...
Research Associate
Academic Job Category: Faculty Non Bargaining | Job Title: Research Associate | Department: Wasserman Laboratory, Department of Medical Genetics, Faculty of Medicine (Wyeth Wasserman) | Posting End Date: Augu...
ubc.wd10.myworkdayjobs.com
Reposted by Gherman Novakovsky
anshulkundaje.bsky.social
@saramostafavi.bsky.social (@Genentech) & I (@Stanford) are excited to announce co-advised postdoc positions for candidates with deep expertise in ML for bio (especially sequence-to-function models, causal perturbational models & single cell models). See details below. Pls RT 1/
gnovakovsky.bsky.social
Yes, that's exactly what it is. Predicting the difference here is important.
gnovakovsky.bsky.social
This ensures the model focuses on the actual variant and doesn't overfit to correlated but irrelevant features, which leads to better generalization.
gnovakovsky.bsky.social
Certainly! The entire model is shared: two copies see inputs that differ only at a single base pair (the variant of interest), and the model weights are tuned to learn the difference in effect size correctly.
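To make that setup concrete, here is a minimal PyTorch-style sketch of such a twin (weight-sharing) network. This is not the actual PromoterAI code: `base_model`, the one-hot input shapes, and the MSE loss against an observed effect are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwinVariantEffect(nn.Module):
    """One shared network; both alleles pass through the same weights."""
    def __init__(self, base_model: nn.Module):
        super().__init__()
        self.base_model = base_model  # single set of weights, used twice

    def forward(self, ref_seq: torch.Tensor, alt_seq: torch.Tensor) -> torch.Tensor:
        # ref_seq / alt_seq: one-hot sequences (batch, 4, length) that differ
        # only at the variant position.
        ref_out = self.base_model(ref_seq)
        alt_out = self.base_model(alt_seq)
        return alt_out - ref_out  # predicted variant effect on expression

def train_step(model, optimizer, ref_seq, alt_seq, observed_effect):
    """Sketch of one update: regress the ALT-REF difference onto an observed
    expression effect, so context shared by both alleles cancels out and only
    the variant itself carries signal."""
    optimizer.zero_grad()
    pred_effect = model(ref_seq, alt_seq)
    loss = nn.functional.mse_loss(pred_effect.squeeze(-1), observed_effect)
    loss.backward()
    optimizer.step()
    return loss.item()
```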
gnovakovsky.bsky.social
Great question! That's our best guess as well, and we highlight this in the paper by noting that MPRA experimental data from individual cell lines can have limitations for variant interpretation.
gnovakovsky.bsky.social
Huge thanks to the amazing Illumina team—this was an incredible learning experience! I'm excited to keep pushing forward as we develop models to tackle gene expression and non-coding variant interpretation. (16/)
gnovakovsky.bsky.social
A complementary thread from my colleague Kishore Jaganathan ‪@kjaganatha.bsky.social‬ bsky.app/profile/kjag... (15/)
kjaganatha.bsky.social
We're thrilled to introduce PromoterAI — a tool for accurately identifying promoter variants that impact gene expression. 🧵 (1/)
gnovakovsky.bsky.social
We followed up by testing promoter variants in Mendelian genes using MPRA. Surprisingly, PromoterAI was more effective than MPRA at prioritizing variants linked to patient phenotypes, highlighting limitations of MPRA for rare disease interpretation. (13/)
gnovakovsky.bsky.social
While we noticed that adding data from other species such as mouse does not lead to a substantial improvement in variant effect prediction on its own, it does help with ensembling. Thus, the final model is an ensemble of two networks: one trained on human data only and one trained on human and mouse data together. (12/)
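A minimal sketch of how such a two-model ensemble could be combined at inference time; the model names and the simple averaging of allele-difference scores are assumptions, not necessarily the exact ensembling used in the paper.

```python
import torch

def ensemble_variant_effect(human_model, human_mouse_model, ref_seq, alt_seq):
    """Average the variant-effect predictions (ALT minus REF) of the two
    independently trained networks: human-only and human+mouse."""
    with torch.no_grad():
        effect_a = human_model(alt_seq) - human_model(ref_seq)
        effect_b = human_mouse_model(alt_seq) - human_mouse_model(ref_seq)
    return 0.5 * (effect_a + effect_b)
```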
gnovakovsky.bsky.social
In the Genomics England rare disease cohort, functional promoter variants predicted by PromoterAI were enriched in phenotype-matched Mendelian genes. These variants accounted for an estimated 6% of the rare disease genetic burden. (11/)
gnovakovsky.bsky.social
In the UK Biobank cohort, PromoterAI's predicted promoter variant effects correlated strongly with measured protein levels and quantitative traits, suggesting that promoter variants contribute meaningfully to phenotypic variation in the general population. (10/)
gnovakovsky.bsky.social
PromoterAI's embeddings split promoters into three distinct classes: P1 (~9K genes, ubiquitously active), P2 (~3K genes, bivalent chromatin), E (~6K genes, enhancer-like). The E class, enriched for TATA boxes, may reflect enhancers co-opted as promoters. (9/)
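As an illustration only (not the clustering approach used in the paper), one way to group promoter embeddings into three classes is k-means on the per-promoter embedding vectors; the file name and variable names below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# promoter_embeddings: (n_genes, embedding_dim) array extracted from the
# trained network, one row per promoter (hypothetical file/variable names).
promoter_embeddings = np.load("promoter_embeddings.npy")

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(promoter_embeddings)  # 0/1/2, roughly P1 / P2 / E
for k in range(3):
    print(f"class {k}: {np.sum(labels == k)} promoters")
```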
gnovakovsky.bsky.social
Fine-tuning improved PromoterAI’s ability to predict the direction of motif effects — a known issue of multitask models. The model often recognized motifs before fine-tuning, but got the direction wrong. After fine-tuning, its predictions aligned better with the data. (8/)
gnovakovsky.bsky.social
We used our list of gene expression outlier variants to explore their effects on transcription factor binding sites. Our results show that it is easier for new variants to cause outlier gene expression by disrupting existing regulatory components than by creating new ones. (7/)
gnovakovsky.bsky.social
We also attempted to fine-tune Enformer and Borzoi on our promoter variant set. While their performance improved, both models still lagged behind PromoterAI. Notably, PromoterAI outperformed Enformer and was similar to Borzoi before fine-tuning. (6/)
gnovakovsky.bsky.social
When it comes to predicting the expression effects of promoter variants, PromoterAI achieved the best performance across benchmarks spanning RNA, protein, QTL, and MPRA data. (5/)
gnovakovsky.bsky.social
The second step was to fine-tune the model on a carefully curated list of rare promoter variants linked to aberrant gene expression. The fine-tuning was done with a twin-network setup to ensure generalization across unseen genes and datasets. (4/)
gnovakovsky.bsky.social
First, we pre-trained PromoterAI to predict histone marks, TF binding, DNA accessibility, and CAGE signal from genomic sequence. The key difference from models like Enformer and Borzoi is that we predict at single base-pair resolution and use only TSS-centered regions. (3/)
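For intuition, here is a hedged PyTorch sketch of a multi-task, single-base-resolution prediction head on top of a convolutional trunk. The trunk is abstracted away, and the task channel counts and names are illustrative, not the paper's exact output set.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Per-base predictions for several epigenomic readouts from a shared trunk.
    Channel counts are illustrative placeholders."""
    def __init__(self, trunk: nn.Module, trunk_channels: int,
                 n_histone=10, n_tf=50, n_accessibility=5, n_cage=5):
        super().__init__()
        self.trunk = trunk  # convolutional trunk: (batch, 4, L) -> (batch, C, L)
        self.heads = nn.ModuleDict({
            "histone":       nn.Conv1d(trunk_channels, n_histone, kernel_size=1),
            "tf_binding":    nn.Conv1d(trunk_channels, n_tf, kernel_size=1),
            "accessibility": nn.Conv1d(trunk_channels, n_accessibility, kernel_size=1),
            "cage":          nn.Conv1d(trunk_channels, n_cage, kernel_size=1),
        })

    def forward(self, tss_centered_seq: torch.Tensor) -> dict:
        # tss_centered_seq: one-hot DNA (batch, 4, L), centered on the TSS.
        # The trunk keeps the length dimension, so every output track stays
        # at single base-pair resolution.
        features = self.trunk(tss_centered_seq)
        return {name: head(features) for name, head in self.heads.items()}
```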
gnovakovsky.bsky.social
PromoterAI is built from transformer-inspired blocks called metaformers, but instead of attention we use depthwise convolutions, making the model fully convolutional. We believe CNN-based methods have not been surpassed yet and remain a great choice for genomics tasks. (2/)
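For readers curious what a metaformer-style block with a depthwise-convolution token mixer can look like, here is a self-contained PyTorch sketch. The kernel size, normalization choice, and expansion factor are illustrative assumptions rather than the published architecture details.

```python
import torch
import torch.nn as nn

class ConvMetaformerBlock(nn.Module):
    """Metaformer-style block: token mixing via a depthwise 1D convolution
    (instead of self-attention), followed by a pointwise feed-forward MLP.
    Both sub-layers are normalized and residual."""
    def __init__(self, channels: int, kernel_size: int = 15, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.BatchNorm1d(channels)
        # groups=channels makes the convolution depthwise: each channel is
        # mixed only along the sequence axis, acting as a local token mixer.
        self.token_mixer = nn.Conv1d(channels, channels, kernel_size,
                                     padding=kernel_size // 2, groups=channels)
        self.norm2 = nn.BatchNorm1d(channels)
        self.mlp = nn.Sequential(  # pointwise (per-position) channel mixing
            nn.Conv1d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(channels * expansion, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

# A stack of such blocks keeps the whole model fully convolutional.
blocks = nn.Sequential(*[ConvMetaformerBlock(256) for _ in range(4)])
out = blocks(torch.randn(2, 256, 2048))  # shape preserved: (2, 256, 2048)
```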
gnovakovsky.bsky.social
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
Reposted by Gherman Novakovsky
manusaraswat.bsky.social
🧠 Excited to share my main PhD project! We mapped the regulatory rules governing glioblastoma plasticity using single-cell multi-omics and deep learning. This work is part of a two-paper series with @bayraktarlab.bsky.social @oliverstegle.bsky.social and @moritzmall.bsky.social. Preprint at the end 🧵👇
Reposted by Gherman Novakovsky