Lightnews — Scholar-powered news

Reposted by Mohsen Zakeri

Sina Majidian @sinamajidian.bsky.social · 17d

EvANI benchmarking workflow for evolutionary distance estimation academic.oup.com/bib/article/...

Great teamwork by @mohsenzakeri.bsky.social, @stephenhwang.bsky.social and me, with the excellent mentorship of @benlangmead.bsky.social

EvANI benchmarking workflow for evolutionary distance estimation

Abstract. Advances in long-read sequencing technology have led to a rapid increase in high-quality genome assemblies. These make it possible to compare gen

academic.oup.com


Reposted by Mohsen Zakeri

Sina Majidian @sinamajidian.bsky.social · Aug 20

Great talk by Vikram @vikramshivakumar.bsky.social on studying pangenomes and synteny visualization at #WABI25
Github: github.com/vikshiv/mume...
First paper: genomebiology.biomedcentral.com/articles/10....
Second: www.biorxiv.org/content/10.1... #WABI2025

(A) Anchor-based merging requires a common sequence (red) present in each partition. Multi-MUMs are merged by identifying overlaps between partition-specific matches in the anchor coordinate space, and a uniqueness threshold determines if a MUM is still unique in each partition after truncation. (B) String-based merging enables computation of multi-MUMs between partitions without a common sequence. An example tree (left) is shown, highlighting the use case where partial multi-MUMs specific to internal nodes (starred) can be computed by merging subclade-based partitions up a tree. (right) MUM overlaps are computed by running Mumemto on the MUM sequences, and the uniqueness threshold array ensures overlaps remain unique across the merged dataset. (C) An example Burrows-Wheeler Transform (BWT), matrix (BWM), and Longest Common Prefix (LCP) array, with sequence IDs for each suffix shown (ID). A non-maximal unique match (UM) is shown, and the uniqueness threshold for this match is found using the flanking LCP values. (D) A partial multi-MUM (in blue) is found in all-but-one sequence (excluded in red). Using two anchor sequences (red and orange), all-but-one partial MUMs can be computed using an augmented anchor-based merging method.
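To make panel (C) concrete, here is a minimal Python sketch on a toy string. It builds a suffix array, BWT and LCP array naively, then derives a uniqueness threshold for one suffix-array row. The rule used, one more than the larger of the two flanking LCP values, is an assumption based on the caption's wording. The function names are hypothetical and this is not Mumemto's code.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[i] = longest common prefix length of suffixes sa[i-1] and sa[i]; lcp[0] = 0.
    def common(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n

    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def bwt(text, sa):
    # Character preceding each sorted suffix (wraps around for suffix 0).
    return "".join(text[i - 1] for i in sa)


def uniqueness_threshold(lcp, row):
    # Assumed rule: the shortest prefix of the suffix in this row that occurs
    # only once is one character longer than the larger flanking LCP value.
    above = lcp[row]
    below = lcp[row + 1] if row + 1 < len(lcp) else 0
    return max(above, below) + 1


text = "banana$"  # "$" is a unique sentinel that sorts before the letters
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(sa, bwt(text, sa), lcp)
print(uniqueness_threshold(lcp, 3))  # row 3 is the suffix "anana$"
```

For row 3 the flanking LCP values are 3 and 0, so any prefix of "anana$" of length 4 or more ("anan...") occurs exactly once, while "ana" occurs twice.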

(A) Phylogeny of geographically diverse A. thaliana accessions (Lian et al. 2024), with broad geographical regions colored. Internal nodes are labeled with the coverage of partial multi-MUMs across the leaves of each node. Internal node partial MUMs are computed by merging subtree-based partitions progressively up the phylogeny. (B) Global multi-MUM synteny across the full dataset shown in blue (with inversions in green). Global MUMs are computed by merging all partitions together (representing the root node). Additionally, three geographically distinct subgroups are highlighted and partition-specific multi-MUMs (in purple, with inversions in pink) reveal local structural variation in centromeric regions.


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · Aug 20

The 25th iteration of the excellent Conference for Algorithms in Bioinformatics (WABI) starts tomorrow at UMD @umdscience.bsky.social at the Brendan Iribe Center. You can find details at the website wabiconf.github.io/2025/. We'll use the tag #WABI25 for the meeting!

WABI 2025

WABI Conference on Algorithms in Bioinformatics

wabiconf.github.io


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · Jun 30

🧬🖥️ In addition to an update to oarfish, a new version (0.14.0) of piscem (zenodo.org/records/1509...) has just been released. This version pulls in some of the latest improvements to sshash by @jermp.bsky.social! 1/2

The piscem index

This manuscript provides a brief overview of the piscem index — a fast and compact index for the compacted, colored, reference (i.e., storing positional information about the input references) de Brui...

zenodo.org


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · Jun 23

The second keynote address at WABI '25 will be by Christina Boucher. She will talk about "Recursive Parsing and Grammar Compression in the Era of Pangenomics". PFP (& RPFP) has enabled tremendous advances in representation & indexing; this will be an exciting talk!
wabiconf.github.io/2025/talks/t...

Recursive Parsing and Grammar Compression in the Era of Pangenomics

Talk by Christina Boucher - WABI 2025

wabiconf.github.io


Reposted by Mohsen Zakeri

Mia Farrow @miafarrow.bsky.social · Jun 20

No war with Iran


Reposted by Mohsen Zakeri

Kuan-Hao Chao @kuanhaochao.bsky.social · Jun 17

Excited to introduce LiftOn – an open-source tool for accurate, scalable liftover of genome annotations (GFF) across assemblies. 🚀

👉 Code & community: github.com/Kuanhao-Chao...

It’s been incredibly rewarding building this for the genomics community. Can’t wait for your feedback and contributions!


Mohsen Zakeri @mohsenzakeri.bsky.social · Jun 7

Huge congrats! 🎉


Reposted by Mohsen Zakeri

Ben Langmead @benlangmead.bsky.social · May 29

Excellent work, Steven & Mohsen! See thread below

Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

1/5 We introduce Movi Color, led by Steven Tan (a brilliant undergrad member of Langmead lab) for taxonomic and multi-class classification. It uses a full-text index based on the move structure and does not rely on predefined values (like k-mer length) for index building.
github.com/mohsenzakeri...


Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

5/5 Processing the reads with Movi Color is as fast as Kraken 2, and 20x faster than Metabuli’s total query time. Movi Color is able to index sets of complete genomes from many species, but uses significantly more memory. The memory footprint can be reduced by using minimizer-digestion approaches.


Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

4/5 Movi Color is 2x more accurate than Kraken 2 and Metabuli for taxonomic classification of ONT reads at the species level.


Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

3/5 Movi Color classifies a read based on the colors observed during the pseudo matching lengths (PML) computation procedure.


Mohsen Zakeri @mohsenzakeri.bsky.social · May 29

2/5 Movi Color adds colors to BWT runs. As in colored de Bruijn graphs, colors are sets of documents, defined based on the origin of the suffixes in each BWT run. Each distinct color is stored once in the color table.
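A toy Python sketch of the idea in this post: split the BWT of a small document collection into runs, take each run's color to be the set of documents its suffixes come from, and store each distinct color once. The function name, the "$" separator convention and the naive suffix sorting are illustrative assumptions, not Movi Color's implementation.

```python
def build_run_colors(docs):
    # Concatenate the documents, ending each with "$" (assumed separator).
    text = "".join(d + "$" for d in docs)
    # doc_of[p] = index of the document that text position p belongs to.
    doc_of = [idx for idx, d in enumerate(docs) for _ in range(len(d) + 1)]
    # Naive suffix array and BWT; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]

    color_table = []  # each distinct color (a frozenset of document ids), stored once
    color_index = {}  # color -> its position in color_table
    runs = []         # (BWT character, run length, color id) per run
    start = 0
    for end in range(1, len(bwt) + 1):
        if end == len(bwt) or bwt[end] != bwt[start]:
            # Color of the run = documents of origin of the suffixes in its rows.
            color = frozenset(doc_of[sa[row]] for row in range(start, end))
            if color not in color_index:
                color_index[color] = len(color_table)
                color_table.append(color)
            runs.append((bwt[start], end - start, color_index[color]))
            start = end
    return runs, color_table


runs, table = build_run_colors(["AA", "CC"])
print(runs)
print(table)
```

With two documents that share no characters, the five runs map to only two distinct colors. With two identical documents, every run gets the single color {0, 1}.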



Reposted by Mohsen Zakeri

Vikram Shivakumar @vikramshivakumar.bsky.social · May 27

Excited to share a new update to Mumemto, scaling MUM and conserved element finding to any size pangenome! Preprint out now w/ @benlangmead.bsky.social.
Mumemto scales to the new HPRC v2 release and beyond, and can merge in future assemblies without any recomputation! 1/n

Partitioned Multi-MUM finding for scalable pangenomics

Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly-sequenced assemblies. We prev...

www.biorxiv.org


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · May 7

The deadline for WABI 2025 has been extended (but is still rapidly approaching) wabiconf.github.io/2025/

* abstract deadline: May 12 (AoE)
* paper deadline: May 15 (AoE)

Consider submitting your exciting algorithmic bioinformatics work to the WABI conference!


Reposted by Mohsen Zakeri

Arun Das @arun-das.bsky.social · Apr 21

I'll also be on the job market this summer, so please reach out if you're interested!

You can find out more about me at these links:
LinkedIn: www.linkedin.com/in/arun96/
Personal Website: arundas.org

Arun Das

arundas.org


Reposted by Mohsen Zakeri

Igor Martayan @imartayan.bsky.social · Apr 27

Next up is Nathaniel Brown from @benlangmead.bsky.social's group presenting col-bwt, a new algorithm for computing chain statistics using multi-maximal unique matches.

www.biorxiv.org/content/10.1...


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · Apr 5

Hey #genomics, #bioinformatics & #algorithms peeps 💻🧬. If you haven't seen the CfP for WABI '25 yet, check out the website wabiconf.github.io/2025/. It will be held at UMD @umdscience.bsky.social with Broňa Brejová & myself as co-chairs! Submit your exciting & late-breaking algorithmic work to WABI


Reposted by Mohsen Zakeri

Rob Patro @robp.bsky.social · Mar 12

On Thurs, March 13 at 9AM (ET), @noorpratap.bsky.social will be defending his dissertation!

If you want to learn more about tree-based quantification & differential testing, or scATAC-seq preprocessing, tune in!

Talk link: umd.zoom.us/j/9873133564...

Abstract: talks.cs.umd.edu/talks/4137

Talks

talks.cs.umd.edu


Reposted by Mohsen Zakeri

Vikram Shivakumar @vikramshivakumar.bsky.social · Feb 26

We ran Mumemto on 474 human assemblies from @humanpangenome.bsky.social to find syntenic regions using MUMs. Mumemto scales remarkably well to large pangenomes thanks to compressed-space algos! It took under 2 days across 7 nodes (each using ~500 GB memory).


Reposted by Mohsen Zakeri

recombseq.bsky.social @recombseq.bsky.social · Jan 24

🚨 Keynotes at RECOMB-seq 2025! 🚨

🌟 Alicia Oshlack – computational transcriptomics
@aliciao.bsky.social

🌟 Rayan Chikhi – sequencing data structures
@rayanchikhi.bsky.social

🗓️ Dates: April 24–25, 2025
📍 Seoul, South Korea

recomb-seq.github.io/speakers/


Reposted by Mohsen Zakeri

Ben Langmead @benlangmead.bsky.social · Dec 11

Very excited to see Movi (by @mohsenzakeri.bsky.social) now out in iScience: www.cell.com/iscience/ful.... Movi builds on the "move structure" pangenome index, a compressed full-text index and close cousin to r-index. Compared to r-index, the move structure is simpler and more cache-efficient.

Movi: A fast and cache-efficient full-text pangenome index

Biocomputational method; Classification of bioinformatical subject; Genomic analysis

www.cell.com


Mohsen Zakeri @mohsenzakeri.bsky.social · Feb 19

4/4 Movi is now capable of performing count queries with the backward search procedure, which is now implemented for the move structure. Movi is 16 times faster than r-index while using about 3 times more memory to perform the count query.
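For readers unfamiliar with backward search, here is a minimal Python sketch of a count query. It uses a plain FM-index with a linear-scan rank, not the move structure that Movi implements. The function names are hypothetical and the text is assumed to end in a unique "$" sentinel.

```python
def build_fm_index(text):
    # Naive suffix sorting; the text must end with a unique, smallest sentinel.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in the text that are strictly smaller than c.
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    return bwt, C


def rank(bwt, ch, i):
    # Occurrences of ch in bwt[:i]; a real index answers this in constant time.
    return bwt.count(ch, 0, i)


def count(bwt, C, pattern):
    # Backward search: process the pattern right to left, narrowing the
    # half-open interval [lo, hi) of suffix-array rows prefixed by the suffix read so far.
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + rank(bwt, ch, lo)
        hi = C[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt, C = build_fm_index("banana$")
print(count(bwt, C, "ana"))  # number of occurrences of "ana" in "banana"
```

Each pattern character costs two rank operations here. The move structure replaces those rank queries with its own per-run lookups, which this sketch does not model.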


Mohsen Zakeri @mohsenzakeri.bsky.social · Feb 19

3/4 Prefetching uses a single thread while processing many reads concurrently. Using prefetching, the median latency observed for Movi’s inner loop is 91 ns.
