Lightnews — Scholar-powered news

Reposted by Vikram Shivakumar

Sina Majidian @sinamajidian.bsky.social · 17d

Excited to share our EvANI benchmarking workflow, published in Briefings in Bioinformatics doi.org/10.1093/bib/...
Computing average nucleotide identity (ANI) is neither conceptually nor computationally trivial. Its definition has evolved over years, with different meanings and assumptions (1/5)

Figure 1. (A) ANI quantifies the similarity between two genomes. ANI can be defined as the number of aligned positions where the two aligned bases are identical, divided by the total number of aligned bases. Historically, ANI was calculated using a single gene family for multiple sequence alignment. Another approach finds orthologous genes between two genomes and reports the average similarity between their CDSs. This method was later extended to whole-genome alignment by identifying local alignments and excluding supplementary alignments with lower similarity. (B) Different ANI tools employ various approaches in calculating ANI values. ANIm, OrthoANI, and FastANI use aligners to identify homologous regions, whereas Mash uses k-mer hashing to estimate similarities. Only alignments with higher similarity represented by green arrows are included in ANI calculations, while red arrows, corresponding to paralogs, are excluded. (C) The proposed benchmarking method evaluates the performance of different tools using both real and simulated data. It assumes that more distantly related species on the phylogenetic tree should have lower ANI similarities. This is measured by calculating the statistics of Spearman rank correlation. We expect a negative correlation between ANI and the tree distance (scatter plot on the right).
https://academic.oup.com/bib/article/doi/10.1093/bib/bbaf267/8160681
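
As a rough illustration of the caption above, the sketch below computes ANI as the fraction of identical bases among aligned (non-gap) columns of a pairwise alignment, then scores a set of ANI values against tree distances with a Spearman rank correlation. It is a minimal sketch, not the EvANI workflow: the function names `ani`, `ranks`, and `spearman` are hypothetical, treating `-` columns as unaligned is an assumption, and the numbers in the usage lines are made up for illustration.

```python
def ani(aligned_a, aligned_b):
    """Identical bases divided by aligned bases (gap columns are skipped; an assumption)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no aligned positions")
    return sum(x == y for x, y in columns) / len(columns)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Illustrative values only: 4 of the 5 aligned columns are identical.
example_ani = ani("ACGT-A", "ACCTTA")  # 0.8

# Made-up tree distances and ANI values; ANI falls as distance grows,
# so the rank correlation is strongly negative.
tree_distance = [0.1, 0.2, 0.3, 0.4]
ani_values = [0.99, 0.95, 0.90, 0.80]
rho = spearman(tree_distance, ani_values)  # -1.0
```

Under the benchmark's assumption, a tool whose ANI values track the phylogeny well should give a correlation close to -1 on such data.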

Vikram Shivakumar @vikramshivakumar.bsky.social · Aug 22

10/10 tool name 👌

Reposted by Vikram Shivakumar

Sina Majidian @sinamajidian.bsky.social · Aug 20

Great talk by Vikram @vikramshivakumar.bsky.social on studying pangenomes and synteny visualization in #WABI25
Github: github.com/vikshiv/mume...
First paper: genomebiology.biomedcentral.com/articles/10....
Second: www.biorxiv.org/content/10.1... #WABI2025

(A) Anchor-based merging requires a common sequence (red) present in each partition. Multi-MUMs are merged by identifying overlaps between partition-specific matches in the anchor coordinate space, and a uniqueness threshold determines if a MUM is still unique in each partition after truncation. (B) String-based merging enables computation of multi-MUMs between partitions without a common sequence. An example tree (left) is shown, highlighting the use case where partial multi-MUMs specific to internal nodes (starred) can be computed by merging subclade-based partitions up a tree. (right) MUM overlaps are computed by running Mumemto on the MUM sequences, and the uniqueness threshold array ensures overlaps remain unique across the merged dataset. (C) An example Burrows-Wheeler Transform (BWT), matrix (BWM), and Longest Common Prefix (LCP) array, with sequence IDs for each suffix shown (ID). A non-maximal unique match (UM) is shown, and the uniqueness threshold for this match is found using the flanking LCP values. (D) A partial multi-MUM (in blue) is found in all-but-one sequence (excluded in red). Using two anchor sequences (red and orange), all-but-one partial MUMs can be computed using an augmented anchor-based merging method.
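
To make the term concrete, here is a brute-force sketch of what a multi-MUM is: a substring that occurs exactly once in every sequence and cannot be extended in either direction while still matching in all of them. This is not how Mumemto works (the panels above describe BWT/LCP-based computation and partition merging); the function names and the `min_len` parameter are hypothetical, and the approach is only practical for toy inputs.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def multi_mums(seqs, min_len=2):
    """Naive multi-MUM finder over a small list of strings."""
    ref = seqs[0]
    unique = set()
    # Every multi-MUM is a substring of the first sequence, so enumerate those.
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            sub = ref[i:j]
            if all(count_occurrences(s, sub) == 1 for s in seqs):
                unique.add(sub)
    # A unique match is maximal exactly when it is not contained
    # in a longer match that is also unique in every sequence.
    return sorted(m for m in unique if not any(m != o and m in o for o in unique))


seqs = ["ACGTAGG", "TTACGTC", "CACGTTT"]
mums = multi_mums(seqs)  # ["ACGT"]

# Adding a sequence in which ACGT appears twice removes the only multi-MUM.
mums_after_adding = multi_mums(seqs + ["ACGTACGT"])  # []
```

The second call shows why a match must stay unique across the whole collection: one new sequence with a repeated copy is enough to lose it.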

(A) Phylogeny of geographically diverse A. thaliana accessions (Lian et al. 2024), with broad geographical regions colored. Internal nodes are labeled with the coverage of partial multi-MUMs across the leaves of each node. Internal node partial MUMs are computed by merging subtree-based partitions progressively up the phylogeny. (B) Global multi-MUM synteny across the full dataset shown in blue (with inversions in green). Global MUMs are computed by merging all partitions together (representing the root node). Additionally, three geographically distinct subgroups are highlighted and partition-specific multi-MUMs (in purple, with inversions in pink) reveal local structural variation in centromeric regions.

Reposted by Vikram Shivakumar

Rob Patro @robp.bsky.social · Aug 20

Vikram Shivakumar telling us about "Partitioned Multi-MUM finding for scalable pangenomics" #WABI25! So many kinds of matches!

Reposted by Vikram Shivakumar

Liana Lareau @lianafaye.bsky.social · Aug 7

This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...

Protein language models reveal evolutionary constraints on synonymous codon choice

Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...

www.biorxiv.org

Vikram Shivakumar @vikramshivakumar.bsky.social · Aug 6

Not saying I agree either way, but one pro of text-based file formats is that fewer dependencies are needed for viewing files

Vikram Shivakumar @vikramshivakumar.bsky.social · Jul 31

This is so amazing, thank you!

Reposted by Vikram Shivakumar

Petra Korlević @petrathepostdoc.bsky.social · Jul 31

#SciArt doodle of @vikramshivakumar.bsky.social's talk yesterday at the @sangerinstitute.bsky.social on MUMs*

*maximal unique matches in pangenomes, now if you did that on sequenced moms you could do mummoms

comic doodle of Vikram Shivakumar in a sweater and checkered shirt on a pink gradient background, with various elements of the talk to the left: two old moms pointing at MUMs, below an explanation of what those are (large chunks of the same DNA sequence through the genome), at the bottom a few of the organisms worked on: a tomato, a potato, an arabidopsis weed.

Reposted by Vikram Shivakumar

jakobheinz.bsky.social @jakobheinz.bsky.social · Jul 21

Excited to share our new preprint on detecting foldback artifacts in long reads with my advisors Matthew Meyerson and @lh3lh3.bsky.social ! Stop by poster C-180 on Wednesday at ISMB/ECCB2025 to learn more and chat!

bioRxiv Bioinfo @biorxiv-bioinfo.bsky.social · Jul 19

Detecting Foldback Artifacts in Long Reads https://www.biorxiv.org/content/10.1101/2025.07.15.664946v1

Vikram Shivakumar @vikramshivakumar.bsky.social · Jul 21

And of course, the poster itself:

Vikram Shivakumar @vikramshivakumar.bsky.social · Jul 21

If you’re in Liverpool, stop by my poster A217 at ISMB/ECCB 2025, and chat about all things pangenomes, MUMs, and alignment (and the Beatles or Oasis-mania)

Vikram Shivakumar @vikramshivakumar.bsky.social · Jun 17

Really excited to see this published! To more mum-finding 🍻

Ben Langmead @benlangmead.bsky.social · Jun 17

Now published! Note that since Vikram's original post (quoted here), he's made it easy to dynamically update a set of multi-MUMs (e.g. when more genomes are added to a pangenome) and to find multi-MUMs for huge collections like HPRCv2 genomebiology.biomedcentral.com/articles/10....

Reposted by Vikram Shivakumar

Rob Patro @robp.bsky.social · Jun 16

🖥️🧬We're thrilled to announce that one of our keynote speakers at #WABI2025 will be the inimitable @benlangmead.bsky.social! wabiconf.github.io/2025/talks/t... Ben's keynote is titled "We are what we index; a primer for the Wheeler Graph era", & it's sure to be a whirlwind tour of full-text indexing!

We are what we index; a primer for the Wheeler Graph era

Talk by Ben Langmead - WABI 2025

wabiconf.github.io

Reposted by Vikram Shivakumar

Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

1/5 We introduce Movi Color, led by Steven Tan (a brilliant undergrad member of Langmead lab) for taxonomic and multi-class classification. It uses a full-text index based on the move structure and does not rely on predefined values (like k-mer length) for index building.
github.com/mohsenzakeri...

Vikram Shivakumar @vikramshivakumar.bsky.social · May 27

We've released a new version (v1.3) of Mumemto (github.com/vikshiv/mume...) that implements merging. Running Mumemto in merge-mode makes the output set of multi-MUMs dynamic, so adding new assemblies is as easy as computing a new set of MUMs and merging them in.

GitHub - vikshiv/mumemto: Mumemto: multi-MUM and MEM finding across pangenomes

github.com

Vikram Shivakumar @vikramshivakumar.bsky.social · May 27

We can also merge along the shape of a phylogenetic tree, finding clade-specific variation and conserved elements. Previously, adding new assemblies could lose MUMs, which must be present across the whole collection. Now we can find MUMs that reveal local variation distinct to specific subgroups. 3/n

Vikram Shivakumar @vikramshivakumar.bsky.social · May 27

We implement two partition/merge algorithms that can merge multi-MUMs between datasets. This makes Mumemto highly parallelizable, but also very memory efficient if partitions are computed in serial. 2/n

Vikram Shivakumar @vikramshivakumar.bsky.social · May 27

Excited to share a new update to Mumemto, scaling MUM and conserved element finding to any size pangenome! Preprint out now w/ @benlangmead.bsky.social.
Mumemto scales to the new HPRC v2 release and beyond, and can merge in future assemblies without any recomputation! 1/n

Partitioned Multi-MUM finding for scalable pangenomics

Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly-sequenced assemblies. We prev...

www.biorxiv.org

Reposted by Vikram Shivakumar

Arun Das @arun-das.bsky.social · May 15

Our pre-print on investigating variation in South Asian genomes is now out!

Thank you to @mikeschatz.bsky.social, @rajivmccoy.bsky.social and @aabiddanda.bsky.social for all their work on this.

🧵 A thread on the key results and takeaways from our work:

bioRxiv Genomics @biorxiv-genomic.bsky.social · May 15

Assembling unmapped reads reveals hidden variation in South Asian genomes https://www.biorxiv.org/content/10.1101/2025.05.14.653340v1

Reposted by Vikram Shivakumar

Sara Carioscia @saracarioscia.bsky.social · May 7

If you are here at #bog25 please check out my poster (number 87) tonight! 😁 Showing our work on common variation associated with aneuploidy in human embryos

Vikram Shivakumar @vikramshivakumar.bsky.social · May 9

Excited to share our latest work on comparing and visualizing multiple genome assemblies to identify conservation and structural variation in pangenomes with Mumemto! Check out poster 250 at #bog25 if you are here. New preprint coming very soon 👀

Reposted by Vikram Shivakumar

Igor Martayan @imartayan.bsky.social · Apr 27

Next up is Nathaniel Brown from @benlangmead.bsky.social's group presenting col-bwt, a new algorithm for computing chain statistics using multi-maximal unique matches.

www.biorxiv.org/content/10.1...

Reposted by Vikram Shivakumar

Nature Methods @natmethods.nature.com · Mar 28

Uncalled4: a toolkit for nanopore signal alignment, analysis and visualization of DNA and RNA modifications.

www.nature.com/articles/s41...

Reposted by Vikram Shivakumar

Bohan Ni @bohanni.bsky.social · Mar 26

Happy to share our work characterizing functional rare SVs in rare diseases with long-read genome sequencing and transcriptomic outlier data: genome.cshlp.org/content/earl...

Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease

An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms

genome.cshlp.org
