Gherman Novakovsky
@gnovakovsky.bsky.social
160 followers 140 following 20 posts
PhD, Illumina AI lab
Pinned
gnovakovsky.bsky.social
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
Reposted by Gherman Novakovsky
wywywa.bsky.social
I'm hiring a Bioinformatics Research Associate for the Silent Genomes Project. PhD required, restricted to Canadians, work must be performed in British Columbia.

Great for those who love pipelines, whole genome data and work with a social purpose.
ubc.wd10.myworkdayjobs.com/ubcfacultyjo...
Research Associate
Academic Job Category: Faculty Non Bargaining | Job Title: Research Associate | Department: Wasserman Laboratory, Department of Medical Genetics, Faculty of Medicine (Wyeth Wasserman) | Posting End Date: Augu...
ubc.wd10.myworkdayjobs.com
Reposted by Gherman Novakovsky
anshulkundaje.bsky.social
@saramostafavi.bsky.social (@Genentech) & I (@Stanford) are excited to announce co-advised postdoc positions for candidates with deep expertise in ML for bio (especially sequence-to-function models, causal perturbational models & single cell models). See details below. Pls RT 1/
gnovakovsky.bsky.social
Yes, that's exactly what it is. Predicting the difference here is important.
gnovakovsky.bsky.social
This ensures the model focuses on the actual variant and doesn't overfit to correlated but irrelevant features, which leads to better generalization.
gnovakovsky.bsky.social
Certainly! The entire model is shared: two copies see inputs that differ only at a single base pair (the variant of interest), and the model weights are tuned to learn the difference in effect size correctly.
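To make that setup concrete, here is a minimal PyTorch-style sketch of such a twin (weight-sharing) network. This is not the actual PromoterAI code: `base_model`, the one-hot input shapes, and the MSE loss against an observed effect are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwinVariantEffect(nn.Module):
    """One shared network; both alleles pass through the same weights."""
    def __init__(self, base_model: nn.Module):
        super().__init__()
        self.base_model = base_model  # single set of weights, used twice

    def forward(self, ref_seq: torch.Tensor, alt_seq: torch.Tensor) -> torch.Tensor:
        # ref_seq / alt_seq: one-hot sequences (batch, 4, length) that differ
        # only at the variant position.
        ref_out = self.base_model(ref_seq)
        alt_out = self.base_model(alt_seq)
        return alt_out - ref_out  # predicted variant effect on expression

def train_step(model, optimizer, ref_seq, alt_seq, observed_effect):
    """Sketch of one update: regress the ALT-REF difference onto an observed
    expression effect, so context shared by both alleles cancels out and only
    the variant itself carries signal."""
    optimizer.zero_grad()
    pred_effect = model(ref_seq, alt_seq)
    loss = nn.functional.mse_loss(pred_effect.squeeze(-1), observed_effect)
    loss.backward()
    optimizer.step()
    return loss.item()
```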
gnovakovsky.bsky.social
Great question! That's our best guess as well, and we highlight this in the paper by noting that MPRA experimental data from individual cell lines can have limitations for variant interpretation.
gnovakovsky.bsky.social
Huge thanks to the amazing Illumina team—this was an incredible learning experience! I'm excited to keep pushing forward as we develop models to tackle gene expression and non-coding variant interpretation. (16/)
gnovakovsky.bsky.social
A complementary thread from my colleague Kishore Jaganathan ‪@kjaganatha.bsky.social‬ bsky.app/profile/kjag... (15/)
kjaganatha.bsky.social
We're thrilled to introduce PromoterAI — a tool for accurately identifying promoter variants that impact gene expression. 🧵 (1/)
gnovakovsky.bsky.social
We followed up by testing promoter variants in Mendelian genes using MPRA. Surprisingly, PromoterAI was more effective than MPRA at prioritizing variants linked to patient phenotypes, highlighting limitations of MPRA for rare disease interpretation. (13/)
gnovakovsky.bsky.social
While we noticed that adding data from other species such as mouse does not lead to a substantial improvement in variant effect prediction on its own, it does help with ensembling. Thus, the final model is an ensemble of two networks: one trained on human data only and one trained on human and mouse data together. (12/)
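A minimal sketch of how such a two-model ensemble could be combined at inference time; the model names and the simple averaging of allele-difference scores are assumptions, not necessarily the exact ensembling used in the paper.

```python
import torch

def ensemble_variant_effect(human_model, human_mouse_model, ref_seq, alt_seq):
    """Average the variant-effect predictions (ALT minus REF) of the two
    independently trained networks: human-only and human+mouse."""
    with torch.no_grad():
        effect_a = human_model(alt_seq) - human_model(ref_seq)
        effect_b = human_mouse_model(alt_seq) - human_mouse_model(ref_seq)
    return 0.5 * (effect_a + effect_b)
```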
gnovakovsky.bsky.social
In the Genomics England rare disease cohort, functional promoter variants predicted by PromoterAI were enriched in phenotype-matched Mendelian genes. These variants accounted for an estimated 6% of the rare disease genetic burden. (11/)
gnovakovsky.bsky.social
In the UK Biobank cohort, PromoterAI's predicted promoter variant effects correlated strongly with measured protein levels and quantitative traits, suggesting that promoter variants contribute meaningfully to phenotypic variation in the general population. (10/)
gnovakovsky.bsky.social
PromoterAI's embeddings split promoters into three distinct classes: P1 (~9K genes, ubiquitously active), P2 (~3K genes, bivalent chromatin), E (~6K genes, enhancer-like). The E class, enriched for TATA boxes, may reflect enhancers co-opted as promoters. (9/)
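As an illustration only (not the clustering approach used in the paper), one way to group promoter embeddings into three classes is k-means on the per-promoter embedding vectors; the file name and variable names below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# promoter_embeddings: (n_genes, embedding_dim) array extracted from the
# trained network, one row per promoter (hypothetical file/variable names).
promoter_embeddings = np.load("promoter_embeddings.npy")

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(promoter_embeddings)  # 0/1/2, roughly P1 / P2 / E
for k in range(3):
    print(f"class {k}: {np.sum(labels == k)} promoters")
```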
gnovakovsky.bsky.social
Fine-tuning improved PromoterAI’s ability to predict the direction of motif effects — a known issue of multitask models. The model often recognized motifs before fine-tuning, but got the direction wrong. After fine-tuning, its predictions aligned better with the data. (8/)
gnovakovsky.bsky.social
We used our list of gene expression outlier variants to explore their effects on transcription factor binding sites. Our results show that it is easier for new variants to cause outlier gene expression by disrupting existing regulatory components than by creating new ones. (7/)
gnovakovsky.bsky.social
We also attempted to fine-tune Enformer and Borzoi on our promoter variant set. While their performance improved, both models still lagged behind PromoterAI. Notably, PromoterAI outperformed Enformer and was similar to Borzoi before fine-tuning. (6/)
gnovakovsky.bsky.social
When it comes to predicting the expression effects of promoter variants, PromoterAI achieved the best performance across benchmarks spanning RNA, protein, QTL, and MPRA data. (5/)
gnovakovsky.bsky.social
The second step was to fine-tune the model on a carefully curated list of rare promoter variants linked to aberrant gene expression. The fine-tuning was done with a twin-network setup to ensure generalization across unseen genes and datasets. (4/)
gnovakovsky.bsky.social
First, we pre-trained PromoterAI to predict histone marks, TF binding, DNA accessibility, and CAGE signal from genomic sequence. The key difference from models like Enformer and Borzoi is that we predict at single base-pair resolution and use only TSS-centered regions. (3/)
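For intuition, here is a hedged PyTorch sketch of a multi-task, single-base-resolution prediction head on top of a convolutional trunk. The trunk is abstracted away, and the task channel counts and names are illustrative, not the paper's exact output set.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Per-base predictions for several epigenomic readouts from a shared trunk.
    Channel counts are illustrative placeholders."""
    def __init__(self, trunk: nn.Module, trunk_channels: int,
                 n_histone=10, n_tf=50, n_accessibility=5, n_cage=5):
        super().__init__()
        self.trunk = trunk  # convolutional trunk: (batch, 4, L) -> (batch, C, L)
        self.heads = nn.ModuleDict({
            "histone":       nn.Conv1d(trunk_channels, n_histone, kernel_size=1),
            "tf_binding":    nn.Conv1d(trunk_channels, n_tf, kernel_size=1),
            "accessibility": nn.Conv1d(trunk_channels, n_accessibility, kernel_size=1),
            "cage":          nn.Conv1d(trunk_channels, n_cage, kernel_size=1),
        })

    def forward(self, tss_centered_seq: torch.Tensor) -> dict:
        # tss_centered_seq: one-hot DNA (batch, 4, L), centered on the TSS.
        # The trunk keeps the length dimension, so every output track stays
        # at single base-pair resolution.
        features = self.trunk(tss_centered_seq)
        return {name: head(features) for name, head in self.heads.items()}
```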
gnovakovsky.bsky.social
PromoterAI is built from transformer-inspired blocks called metaformers, but instead of attention we use depthwise convolutions, making the model fully convolutional. We believe CNN-based methods have not been surpassed yet and remain a great choice for genomics tasks. (2/)
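For readers curious what a metaformer-style block with a depthwise-convolution token mixer can look like, here is a self-contained PyTorch sketch. The kernel size, normalization choice, and expansion factor are illustrative assumptions rather than the published architecture details.

```python
import torch
import torch.nn as nn

class ConvMetaformerBlock(nn.Module):
    """Metaformer-style block: token mixing via a depthwise 1D convolution
    (instead of self-attention), followed by a pointwise feed-forward MLP.
    Both sub-layers are normalized and residual."""
    def __init__(self, channels: int, kernel_size: int = 15, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.BatchNorm1d(channels)
        # groups=channels makes the convolution depthwise: each channel is
        # mixed only along the sequence axis, acting as a local token mixer.
        self.token_mixer = nn.Conv1d(channels, channels, kernel_size,
                                     padding=kernel_size // 2, groups=channels)
        self.norm2 = nn.BatchNorm1d(channels)
        self.mlp = nn.Sequential(  # pointwise (per-position) channel mixing
            nn.Conv1d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(channels * expansion, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

# A stack of such blocks keeps the whole model fully convolutional.
blocks = nn.Sequential(*[ConvMetaformerBlock(256) for _ in range(4)])
out = blocks(torch.randn(2, 256, 2048))  # shape preserved: (2, 256, 2048)
```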
gnovakovsky.bsky.social
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
Reposted by Gherman Novakovsky
manusaraswat.bsky.social
🧠 Excited to share my main PhD project! We mapped the regulatory rules governing glioblastoma plasticity using single-cell multi-omics and deep learning. This work is part of a two-paper series with @bayraktarlab.bsky.social @oliverstegle.bsky.social and @moritzmall.bsky.social. Preprint at the end 🧵👇
Reposted by Gherman Novakovsky