Yun S. Song
@yun-s-song.bsky.social
740 followers 120 following 59 posts
Professor of EECS and Statistics at UC Berkeley. Mathematical and computational biologist.
yun-s-song.bsky.social
Not yet, but we certainly plan to generate bp-resolution, genome-wide scores for all six species studied in the paper and make them publicly available. For now, we have predictions for the ~10M variants used in the S-LDSC analysis in humans.
Reposted by Yun S. Song
anshulkundaje.bsky.social
This is truly an incredible breakthrough IMO. Really exemplifies what you get when deep domain expertise (popgen/evolution/disease genetics in this case) fuses with cleverly crafted ML. What you get are sleek, well-thought-out architectures that absolutely destroy the behemoths. Wow!! 1/
yun-s-song.bsky.social
All in all, we believe that GPN-Star offers a scalable & flexible approach for training effective gLMs.

This work was led by my talented students @czye.bsky.social and @gonzalobenegas.bsky.social, with contributions from other lab members, @peterdfields.bsky.social at JAX, & B. Clarke at DKFZ.
(n/n)
yun-s-song.bsky.social
Upon publication, we will release base-resolution predictions for the human genome and the five model organisms.
Code to train the model, run inference, and reproduce the analyses is available on GitHub (github.com/songlab-cal/...) and Hugging Face (tinyurl.com/nhhcppvm); see the sketch below.
(9/n)
GitHub - songlab-cal/gpn: Genomic Pre-trained Network
Genomic Pre-trained Network. Contribute to songlab-cal/gpn development by creating an account on GitHub.
github.com
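As a rough, hypothetical sketch of what gLM-based variant scoring typically looks like with a masked language model on Hugging Face (the checkpoint ID and tokenizer conventions below are placeholders, not taken from the repo; see the GitHub README for actual GPN-Star usage):

```python
# Illustrative sketch only: the checkpoint ID is a placeholder, and the
# tokenizer details (one token per nucleotide, lowercase bases, no special
# tokens) are assumptions; consult songlab-cal/gpn for real usage.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "songlab/gpn-star-example"  # hypothetical model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

def variant_llr(seq: str, pos: int, ref: str, alt: str) -> float:
    """Masked-marginal score: log P(alt) - log P(ref) at the variant site."""
    assert seq[pos].upper() == ref.upper()
    ids = tokenizer(seq, return_tensors="pt")["input_ids"][0]
    # Assumes one token per nucleotide and no prepended special tokens;
    # adjust the offset if the tokenizer adds e.g. a CLS token.
    ids[pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids.unsqueeze(0)).logits[0, pos]
    logp = torch.log_softmax(logits, dim=-1)
    ref_id = tokenizer.convert_tokens_to_ids(ref.lower())
    alt_id = tokenizer.convert_tokens_to_ids(alt.lower())
    return (logp[alt_id] - logp[ref_id]).item()
```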
yun-s-song.bsky.social
To show that GPN-Star is a robust and generalizable framework that can advance biology beyond human genetics, we apply it to train gLMs for five well-studied model organisms and demonstrate their effectiveness in assessing variant effects in these species.
(8/n)
yun-s-song.bsky.social
In addition, GPN-Star exhibits meaningful nucleotide dependencies that align with known functional interactions, indicating its potential to help elucidate genomic syntax. This represents a notable advance over traditional conservation scores.
(7/n)
yun-s-song.bsky.social
By training GPN-Star on vertebrate, mammal, and primate alignments, we reveal task-dependent advantages of modeling deeper versus more recent evolution. These findings offer new biological insights and practical guidance for developing future gLMs and evolutionary models.
(6/n)
yun-s-song.bsky.social
GPN-Star achieves unprecedented SNP-heritability enrichments across more than 100 human complex traits. Moreover, we devise a simple approach to incorporate tissue specificity into the model's predictions and show that it further improves heritability enrichment.
(5/n)
yun-s-song.bsky.social
We compare GPN-Star with several models, including the recent AlphaGenome and Evo2 models (up to 1Mb context and 40B parameters), and observe that GPN-Star consistently ranks at the top across a wide range of human variant effect prediction tasks.
(4/n)
yun-s-song.bsky.social
We also introduce a calibration method that, for the first time, removes the confounding effect of mutation-rate variation from gLM predictions. This improves downstream performance and enables model scores to be interpreted more directly as estimates of selective constraint.
(3/n)
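The preprint describes the actual calibration procedure; purely to illustrate the general idea (this is not the paper's method), one could residualize raw gLM scores against a per-site mutation-rate covariate so that the adjusted score reflects selection rather than mutability:

```python
import numpy as np

# Toy illustration of mutation-rate calibration (not the paper's method):
# regress raw gLM variant scores on a mutation-rate covariate and keep
# the residual as a mutation-rate-adjusted constraint score.
rng = np.random.default_rng(0)
mu = rng.uniform(0.5, 2.0, size=10_000)                   # hypothetical per-site mutation rates
raw = -1.5 * np.log(mu) + rng.normal(0.0, 1.0, mu.shape)  # raw scores confounded by mu

X = np.column_stack([np.ones_like(mu), np.log(mu)])  # intercept + log mutation rate
beta, *_ = np.linalg.lstsq(X, raw, rcond=None)
calibrated = raw - X @ beta                          # residual after removing the mu trend

print(np.corrcoef(np.log(mu), raw)[0, 1])         # strongly confounded before
print(np.corrcoef(np.log(mu), calibrated)[0, 1])  # ~0 after calibration
```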
yun-s-song.bsky.social
GPN-Star features a novel phylogeny-aware architecture that enables the model to explicitly capture evolutionary relationships encoded in whole-genome alignments and overcomes the key limitations of our earlier model GPN-MSA (doi.org/10.1038/s415...).
(2/n)
yun-s-song.bsky.social
We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)
yun-s-song.bsky.social
Thanks, Josh. I wish you had been one of our reviewers—life would’ve been so much easier.
yun-s-song.bsky.social
SINGER, our ARG inference method, is finally published and freely available online:

doi.org/10.1038/s415...

It was a long journey – 16 months from initial submission to acceptance. Is it just me, or has peer review gotten more arduous lately? 4+ rounds of review aren't so unusual these days...
Robust and accurate Bayesian inference of genome-wide genealogies for hundreds of genomes - Nature Genetics
SINGER is a method for creating ancestral recombination graphs to understand the genealogical history of genomes. The method has increased speed, and thus scalability, without sacrificing accuracy.
doi.org
Reposted by Yun S. Song
alan-aw.bsky.social
Hi Bluesky — Dedicating my first post to this work and software, led by the incredibly meticulous and capable @fandingzhou.bsky.social! An earlier version of this was shared at the 2022 Bioconductor Conference (bioc2022.bioconductor.org/schedule/).
fandingzhou.bsky.social
Gene expression changes aren’t just about mean shifts — variability shifts matter too, especially for aging. We're thrilled to introduce QRscore, a flexible non-parametric framework for detecting shifts in mean and variance across conditions. doi.org/10.1016/j.cr...
yun-s-song.bsky.social
This work was led by my talented student Milind Jagota @milindjagota.bsky.social in collaboration with colleagues at UC Berkeley, UCSF (the Ye Lab @yimmieg.bsky.social), and Fred Hutch (the Matsen Lab @matsen.bsky.social). We are grateful to all co-authors for their enthusiasm and hard work. (n/n)
yun-s-song.bsky.social
From a machine learning perspective, this work illustrates the value of high-quality negative examples. The paper is mostly focused on BCR light chains, but we are excited about extensions. (10/n)
yun-s-song.bsky.social
We interpret which sequence features the model associates with dysfunction. One example is shown below: for a specific light-chain V and J gene pair, we observe sharp selection on CDRL3 length and on certain amino acids. (9/n)
yun-s-song.bsky.social
In new data, we find that very low scores are associated with reduced surface expression in naive B cells. To our knowledge, this is the first time expression variation in naive B cells has been linked to the light chain. (8/n)
yun-s-song.bsky.social
B cells can further mutate antibodies to improve binding. We compare observed mutations to random control sets of mutations. Mutations that significantly decrease model scores appear to be selected out. However, this signal is detectable at only a few positions. (7/n)
yun-s-song.bsky.social
Models trained on allelic inclusion generalize to predict antibody properties they were never directly trained on. Here we apply the models to independent data measuring polyreactivity of human antibodies and find that model scores correlate with measured polyreactivity. Baselines don't capture this signal. (6/n)
yun-s-song.bsky.social
We don’t know which sequence in each double-light B cell is “bad”, but we develop a training framework that doesn’t need this information. We compare with baseline approaches that don’t use the new allelic inclusion data. (5/n)
yun-s-song.bsky.social
We propose using double-light B cells as negative examples for antibody machine learning. Double-light B cells can be observed at scale in some recent datasets of human antibodies. Each such cell has one “bad” sequence, whereas other cells all have functional antibodies. (4/n)
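The paper develops its own training framework for learning from double-light cells; as one hypothetical formulation of the idea in the two posts above (at least one chain in each pair is dysfunctional, so the model should assign low probability to both being functional), a multiple-instance-style loss could look like this:

```python
import torch

def pair_loss(p_a: torch.Tensor, p_b: torch.Tensor) -> torch.Tensor:
    """Double-light cells: at least one light chain is assumed dysfunctional,
    so P(both functional) = p_a * p_b should be driven toward zero.
    p_a, p_b are model-predicted probabilities that each chain is functional."""
    return -torch.log(1.0 - p_a * p_b + 1e-8).mean()

def positive_loss(p: torch.Tensor) -> torch.Tensor:
    """Normal cells: the single expressed light chain is functional."""
    return -torch.log(p + 1e-8).mean()

# Toy usage with stand-ins for the model's predicted probabilities:
p_a = torch.rand(32, requires_grad=True)
p_b = torch.rand(32, requires_grad=True)
p_pos = torch.rand(64, requires_grad=True)
loss = positive_loss(p_pos) + pair_loss(p_a, p_b)
loss.backward()  # would backpropagate into the real model in practice
```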