Frederick "Erick" Matsen
@matsen.bsky.social
I ♥ evolution, immunology, math, & computers. Professor at Fred Hutch & Investigator at HHMI. http://matsen.fredhutch.org/
matsen.bsky.social
... and second is to have a map from the figures to where they are made in the associated "experiments" code repository (github.com/matsengrp/dn...):
matsen.bsky.social
I forgot to post two things I liked doing in this paper that I hope catch on. First is to have links in the methods section to the model fitting code (in a tagged version, github.com/matsengrp/ne..., since the code continues to evolve):
matsen.bsky.social
Oh, and here is a picture of a cyborg-Darwin (cooked up by Gemini), after he realized how useful transformers are. For some reason MBE didn't want it as a cover image!
matsen.bsky.social
Many thanks to Kevin Sung and Mackenzie Johnson for leading the all-important task of data prep, Will Dumm for code and methods contributions, David Rich for structural work, and Tyler Starr, Yun Song, Phil Bradley, Julia Fukuyama, and Hugh Haddox for conceptual help.
matsen.bsky.social
We have positioned our group in this niche: we want to answer biological questions using ML-supercharged versions of the methods that scientists have been using for decades to derive insight.

More in this theme to come!
matsen.bsky.social
Stepping back, I think that transformers and their ilk have so much to offer fields like molecular evolution. Now we can parameterize statistical models using a sequence as an input!
matsen.bsky.social
And because natural selection is predicted for individual sequences, we can also investigate changes in selection strength as a sequence evolves down a tree:
matsen.bsky.social
Because this model isn't constrained to work with a fixed-width multiple sequence alignment, we can do things like look at per-site selection factors on sequences with varying CDR3 length:
matsen.bsky.social
If a selection factor at a given site for a given sequence is

• > 1 that is diversifying selection
• = 1 that is neutral evolution
• < 1 that is purifying selection.
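
A minimal sketch of this rule in Python. The function name `selection_regime` and the tolerance `tol` are hypothetical, introduced only for illustration; a fitted model will rarely return exactly 1, so "neutral" here means "within `tol` of 1".

```python
import math


def selection_regime(factor: float, tol: float = 1e-9) -> str:
    """Label a per-site selection factor by comparing it with 1.

    `tol` is an illustrative assumption, not part of the model.
    """
    if factor < 0:
        raise ValueError("a selection factor cannot be negative")
    if math.isclose(factor, 1.0, abs_tol=tol):
        return "neutral"
    return "diversifying" if factor > 1 else "purifying"


# Made-up selection factors for a four-site sequence.
factors = [0.2, 1.0, 1.7, 0.9]
print([selection_regime(f) for f in factors])
```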
matsen.bsky.social
The model is above. In many ways it is like a classical model of mutation and selection, but the mutation model is a convolutional model and the selection model is a transformer-encoder mapping from AA sequences to a vector of selection factors of the same length as the sequence.
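
A toy sketch of the multiplicative structure that classical mutation-selection models share: a neutral per-site substitution probability scaled by a per-site selection factor. Everything here is an assumption for illustration. The two input lists stand in for the outputs of the convolutional mutation model and the transformer-encoder selection model, the numbers are made up, and capping at 1 is just one simple way to keep the product a valid probability.

```python
def substitution_probabilities(neutral_probs, selection_factors):
    """Scale each neutral substitution probability by its selection factor.

    Both inputs have one entry per amino-acid site. In the real model they
    would come from neural networks; here they are plain lists.
    """
    if len(neutral_probs) != len(selection_factors):
        raise ValueError("need one selection factor per site")
    # Cap at 1.0 so the result can still be read as a probability.
    return [min(1.0, p * f) for p, f in zip(neutral_probs, selection_factors)]


# Made-up values for a three-site sequence.
neutral = [0.10, 0.20, 0.05]
selection = [1.0, 0.5, 2.0]
print(substitution_probabilities(neutral, selection))
```

A factor of 1 leaves the neutral probability unchanged, a factor below 1 shrinks it, and a factor above 1 inflates it.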
matsen.bsky.social
The final version of our transformer-based model of natural selection has come out in MBE. I hope some molecular evolution researchers find this interesting & useful as a way to express richer models of natural selection. doi.org/10.1093/mol... (short 🧵)
matsen.bsky.social
Hats off to first author Kevin Sung www.linkedin.com/in/kevinsun... and the rest of the team 🙏 !
matsen.bsky.social
It's possible that the more complex models don't dominate more decisively because of a lack of suitable training data, namely neutrally evolving out-of-frame sequences. We tried to augment the training data, with no luck.
matsen.bsky.social
The resulting models are better than 5-mer models, but only modestly so. We made many efforts to include a per-site rate but concluded that its effects were weak enough that including it did not improve model performance.
matsen.bsky.social
Solution: first embed 3-mers and then the number of parameters goes up only linearly with the context width.
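
To make the scaling concrete, here is a rough parameter count in Python. The layer sizes (`embed_dim`, `n_filters`) and the single convolution followed by a linear readout are illustrative assumptions, not the architecture from the paper. The k-mer count assumes one free parameter per k-mer.

```python
def kmer_param_count(k: int) -> int:
    """One free parameter per k-mer over the 4-letter DNA alphabet."""
    return 4 ** k


def embedded_conv_param_count(kernel_size: int,
                              embed_dim: int = 7,
                              n_filters: int = 16) -> int:
    """Parameters for: 3-mer embedding -> 1D convolution -> linear readout.

    All sizes are hypothetical defaults chosen for illustration.
    """
    embedding = 64 * embed_dim                               # 4**3 possible 3-mers
    conv = kernel_size * embed_dim * n_filters + n_filters   # weights + biases
    readout = n_filters + 1                                  # weights + bias
    return embedding + conv + readout


# A kernel of size K over overlapping 3-mers sees K + 2 bases of context.
for kernel_size in (3, 7, 11):
    width = kernel_size + 2
    print(width, kmer_param_count(width), embedded_conv_param_count(kernel_size))
```

Under these assumptions, each extra base of context adds a fixed `embed_dim * n_filters` parameters to the convolution, whereas the k-mer table grows fourfold.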
matsen.bsky.social
The final version of our "Thrifty" paper is up now: elifesciences.org/articles/10... .

We were motivated to fit wide-context mutation models based on previous analyses showing "mesoscale" effects and a position-specific effect. But how to avoid exploding the number of parameters? 🧵
Thrifty wide-context models of B cell receptor somatic hypermutation
Convolutional embedding models efficiently capture wide sequence context in antibody somatic hypermutation, avoiding exponential k-mer parameter scaling and eliminating the need for per-site modeling.
elifesciences.org
matsen.bsky.social
Is an idea likely to advance the field?

Our "preflight check" exercise provides a structured approach for thinking through computational biology research projects.

github.com/matsengrp/pr...

Thanks to @sdwfrost.bsky.social for the core idea!
GitHub - matsengrp/preflight
github.com
matsen.bsky.social
Interested in doing a PhD or postdoc in our group? Here is a letter to you: matsen.group/general/202...

We are ready to recruit a trainee who can help develop the next generation of our transformer-based models of natural selection. See the "joining" tab of our website for details.
Dear future trainee:
Let's have fun, work hard, and feel lucky that our job is to expand the boundary of knowledge.
matsen.group
matsen.bsky.social
Open bioinformatics position on next-generation protein evolution models! Join HHMI's AI initiative at Janelia Farm, Virginia (an amazing place) and work closely with our team. Help us build the future! 🧬 + 🤖 = ❤️
hhmi.wd1.myworkdayjobs.com/en-US/Exter...
matsen.bsky.social
Why does selection feel so weak relative to mutation in affinity maturation? A new blog post giving three perspectives, including our new transformer-based model of natural selection on antibodies: matsen.group/general/202...
The term 'affinity maturation' understates the influence of somatic hypermutation
Three recent papers quantify how nucleotide-level mutation processes drive antibody evolution.
matsen.group