@georghochberg.bsky.social
Evolutionary biochemist and protein complex enthusiast at the University of Marburg.
georghochberg.bsky.social
That is an alignment scoring matrix, similar to BLOSUM for amino acids. Alignment scoring matrices and phylogenetic substitution matrices have different purposes (one scores co-occurrences within a column, the other gives the likelihood of substitution along a branch).
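To make the distinction concrete, here is a rough sketch in standard textbook notation (the symbols are not taken from the paper). A BLOSUM-style score is a log-odds ratio of how often two states are seen aligned versus how often that would happen by chance, while a WAG-style model is a rate matrix that is exponentiated to get substitution probabilities over a branch of length t:

```latex
% Alignment scoring (BLOSUM-style): q_{ij} is the observed frequency of the
% aligned pair (i, j), p_i and p_j are background frequencies, \lambda is a scale.
S_{ij} = \frac{1}{\lambda} \log \frac{q_{ij}}{p_i \, p_j}

% Phylogenetic substitution model (WAG-style): r_{ij} are symmetric
% exchangeabilities, \pi_j are equilibrium frequencies, t is branch length.
Q_{ij} = r_{ij} \, \pi_j \quad (i \neq j), \qquad
Q_{ii} = -\sum_{j \neq i} Q_{ij}, \qquad
P(t) = e^{Qt}
```

The first has no notion of time; the second is what a maximum likelihood tree search needs.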
georghochberg.bsky.social
This matrix comes with the same simplifying assumptions we make about amino acid evolution, so there is that.
It will probably work for ABC transporters, but shoot me an email if you need help.
georghochberg.bsky.social
Exactly the opposite. This is an evolutionary model like WAG.
georghochberg.bsky.social
Oh and Sriram should be a co-corresponding author on this with me, but the journal messed this up after the proofs. I hope this will be corrected when the final version is out. He really drove this project.
georghochberg.bsky.social
Finally, I also want to point out the contribution of other groups in this field, in particular Nick Matzke and Christophe Dessimoz, who developed similar ideas to ours.
georghochberg.bsky.social
We are very excited about the possibilities this opens up, especially with respect to very deep history. We have a number of very exciting things cooking that I’m dying to show to people when they are ready.
georghochberg.bsky.social
We are hoping that this matrix will eventually be integrated into IQ-TREE directly, but for now you can just download it and use it as a custom substitution model.
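As a minimal sketch of what "use it as a custom substitution model" could look like in practice: IQ-TREE treats a model name it does not recognize as the path to a file of exchange rates and frequencies in PAML format, so the matrix file can be passed to `-m`. The file names, the `iqtree2` executable name, and the assumption that the downloaded matrix is in PAML format are all hypothetical here; check the files that come with the paper.

```python
import shlex


def iqtree_command(alignment, model_file, executable="iqtree2"):
    """Build (but do not run) an IQ-TREE command line.

    alignment  -- path to a 3Di alignment in FASTA format (hypothetical name)
    model_file -- path to the downloaded 3Di matrix, assumed to be PAML format
    executable -- assumed binary name; older installs may call it "iqtree"
    """
    return [
        executable,
        "-s", alignment,   # input alignment
        "-st", "AA",       # treat the 20-letter 3Di states as amino acids
        "-m", model_file,  # unrecognized model name is read as a matrix file
    ]


cmd = iqtree_command("family_3di.fasta", "3di_matrix.paml")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine where IQ-TREE is installed.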
georghochberg.bsky.social
Our paper also discusses what we think this new method is good for, and what it may not be good for, as well as how it differs from other structural phylogenetics methods that other clever people have already developed.
georghochberg.bsky.social
We then use this new model to revisit the old question about the root using the structures of universal paralogs. Thankfully, the answer is that the root indeed lies where everyone always thought it would. But now we finally have certainty.
georghochberg.bsky.social
We inferred such a model from 3Di alignments of 1000s of protein families, so that you don't have to. The matrix is available with the paper, please feel free to use it immediately. You can use it easily with IQ-TREE.
georghochberg.bsky.social
The key missing ingredient for this is a substitution model that describes how one 3Di character evolves into another. This is what our paper now provides.
georghochberg.bsky.social
This alphabet has 20 letters, and therefore integrates well with existing tree inference software (to which a 3Di alignment looks like a regular old protein alignment). This means we can use this to infer trees.
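A quick way to see why tree software cannot tell the difference: 3Di states are written with the same 20 one-letter codes as amino acids. The sketch below is a plain sanity check, not part of any published pipeline; the function names and the example 3Di strings are made up for illustration.

```python
# The 20 standard amino acid one-letter codes; 3Di states reuse these letters.
AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def looks_like_protein(seq, allow_gaps=True):
    """Return True if every character is one of the 20 letters (or a gap)."""
    allowed = AMINO_ACID_LETTERS | ({"-"} if allow_gaps else set())
    return all(ch in allowed for ch in seq.upper())


def check_alignment(records):
    """records maps sequence name -> aligned string; returns a list of problems."""
    problems = []
    if len({len(seq) for seq in records.values()}) > 1:
        problems.append("sequences differ in length")
    for name, seq in records.items():
        if not looks_like_protein(seq):
            problems.append(f"{name}: characters outside the 20-letter alphabet")
    return problems


# Made-up 3Di strings, already aligned.
toy_alignment = {"protA": "DVVLVCQ-", "protB": "DPPLVSQA"}
print(check_alignment(toy_alignment))
```

An empty list means the alignment would pass for an ordinary protein alignment, which is exactly what makes existing tools usable here.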
georghochberg.bsky.social
The key development is the 3Di structural alphabet, developed by @martinsteinegger.bsky.social et al. 3Di allows a structural 'sequence' to be derived from a protein structure and was developed for remote homology searches.
georghochberg.bsky.social
The big problem was that we needed a model for the evolution of protein structures, such that we could unlock the power of maximum likelihood phylogenetics.
georghochberg.bsky.social
The idea is that instead of sequences, we could use structures to infer phylogenies. This idea is not ours, and it’s not new either. But for a long time there just wasn’t a good way to easily do this.
georghochberg.bsky.social
Here’s where our paper comes in. The central idea is that structure evolves more slowly than sequence. You can recognize structural homology over much greater phylogenetic distances than sequence homology.
georghochberg.bsky.social
This had long raised suspicions that the root may be wrong, or at least not certain, because our phylogenetic models struggle to find signal in these very diverged sequences. Other important problems, like the origin of photosynthesis, suffer from the same issue.
georghochberg.bsky.social
There are a handful of such paralogs and that’s where the inference about the root of the tree of life came from. But it’s a messy business. The sequences of these universal paralogs have in most cases diverged almost beyond recognition.
georghochberg.bsky.social
The answer is so-called universal paralogs. These are genes that duplicated prior to LUCA, such that organisms now usually have two copies of them. If we throw both paralogs on the same tree, they root each other.
georghochberg.bsky.social
You might not know how this was actually decided. There is no outgroup for life, so how was that tree even rooted?
georghochberg.bsky.social
This is common and affects things you might think we know for sure. For example, I am sure you have seen the three-domain (now two-domain) tree of life, which places the root (and therefore LUCA) on the branch between Bacteria and Archaea.
georghochberg.bsky.social
For example, you might be interested in the relationship between two proteins that you suspect are related. But their lineages separated so long ago that their amino acid sequences no longer have any recognizable sequence similarity.