Wei Shen 沈 伟
@shenwei356.bsky.social
Associate professor of Bioinformatics at Chongqing Medical University, China. Lab: https://mbio.info Personal: http://shenwei.me/ http://shenwei356.bsky.social
Pinned
shenwei356.bsky.social
I sincerely appreciate the opportunity to visit @ebi.embl.org (thanks to the @embl.org Sabbatical fellowship). The guidance and support I received from Zam (@zaminiqbal.bsky.social), John (@bacpop.org) and other colleagues have been immensely valuable! You changed my career!❤️
zaminiqbal.bsky.social
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
Reposted by Wei Shen 沈 伟
ebi.embl.org
There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠
How to rapidly search the world’s microbial DNA
By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.
www.ebi.ac.uk
Reposted by Wei Shen 沈 伟
Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.
Reposted by Wei Shen 沈 伟
zaminiqbal.bsky.social
Delighted to see our paper studying the evolution of plasmids over the last 100 years, now out! Years of work by Adrian Cazares, also Nick Thomson @sangerinstitute.bsky.social - this version much improved over the preprint. Final version should be open access, apols.
Thread 1/n
shenwei356.bsky.social
I think it's because there are only a few bioinformatics packages to use. Most people don't want to reinvent wheels like me 😅
Reposted by Wei Shen 沈 伟
robp.bsky.social
Hashing vs. sorting; interesting! reiner.org/hashed-sorting. Also I wonder if, depending on your use case, semi-sorting provides an even greater benefit? 🧬🖥️
Hashed sorting is typically faster than hash tables
Benchmarks and theoretical explanation of why and when hashed radix sort beats hash tables.
reiner.org
shenwei356.bsky.social
Amazing Jim!
jimshaw.bsky.social
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
biorxiv-bioinfo.bsky.social
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
Reposted by Wei Shen 沈 伟
rayanchikhi.bsky.social
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of🧬DNA sequencing data🧬 from the far reaches of our planet.🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
shenwei356.bsky.social
So amazed at how productive you were in 4 or 5 years, with so much great work!
Reposted by Wei Shen 沈 伟
curiouscoding.nl
Finally printed and submitted my thesis :)

You may call me Dr. now 🎓

curiouscoding.nl/thesis.pdf
Picture of my printed thesis: Optimal Throughput Bioinformatics

With a cover image depicting increasingly more efficient pairwise alignment algorithms.
Reposted by Wei Shen 沈 伟
biocs.bsky.social
We're very happy to release our new database Metalog metalog.embl.de ! It offers manually curated and harmonised contextual data for 110k metagenomics samples across the globe, incl. precomputed taxonomic profiles, for interactive browsing and for download 🧵 1/7

#microsky
Metalog
Metalog is a repository of manually annotated metadata (or contextual data) for metagenomic sequencing data from across the globe.
metalog.embl.de
Reposted by Wei Shen 沈 伟
zaminiqbal.bsky.social
Delighted to see this paper from danderson123.bsky.social 's PhD out. We have been building tools for AMR gene detection for over a decade now, but multicopy genes remain challenging. Dan shows that with a gene-space de Bruijn graph and long reads, you can do well
www.biorxiv.org/content/10.1...
shenwei356.bsky.social
Ah! Do you mean that there would be a large number of seed matches, which would bring too many candidates to check?

LexicMap returns anchors of 15-31 bp, so it can achieve high specificity with fewer candidates to match when choosing the top chains. But by default, it checks all chains for sensitivity.
shenwei356.bsky.social
Besides, Minimap2 is mainly designed for a few ref genomes with higher throughput, while LexicMap emphasizes scalability (million-scale ref genomes) and is faster with a few queries. We've added Minimap2 and other tools to our benchmarks, which will be public in a few weeks.
shenwei356.bsky.social
Sorry, I didn't get the point of "avoid having too many 19-mers".

The biggest difference is that LexicMap's seeds support variable-length (prefix or suffix) matching, rather than the exact matching used in tools such as Minimap2. But Minimap2 is more tolerant of mutations than LexicMap as it has denser seeds.
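
To make the contrast concrete, here is a minimal sketch of exact seed matching versus variable-length prefix matching. It is not LexicMap's or Minimap2's actual code: the function names, the 31-bp toy seeds, and the 15-bp minimum (taken from the 15-31 bp anchor range mentioned above) are assumptions for illustration only.

```python
# Illustrative sketch only; not the implementation of LexicMap or Minimap2.
# All names and the min_len default are hypothetical.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def exact_match(query_seed: str, ref_seed: str) -> bool:
    """Exact matching: the whole seed must be identical."""
    return query_seed == ref_seed


def prefix_match(query_seed: str, ref_seed: str, min_len: int = 15) -> bool:
    """Variable-length matching: accept any shared prefix of at least min_len bases."""
    return common_prefix_len(query_seed, ref_seed) >= min_len


# A 31-bp reference seed and a query seed with one substitution at position 20.
REF = "ACGT" * 7 + "ACG"
QUERY = REF[:20] + "T" + REF[21:]

print(exact_match(QUERY, REF))        # the single mismatch breaks the exact match
print(common_prefix_len(QUERY, REF))  # bases shared before the mismatch
print(prefix_match(QUERY, REF))       # still usable as an anchor under prefix matching
```

Suffix matching would work the same way on the reversed strings. The sketch only shows why a single mutation inside a seed need not discard it when partial matches are allowed.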
shenwei356.bsky.social
🚀 LexicMap v0.6.0 is released!
✅ More accurate alignments!
🎯 Higher sensitivity for short queries (>100bp)!
💡 Denser seeds, same index size!

🔬 Function: Efficient sequence alignment against millions of prokaryotic genomes!
📖 Docs: bioinf.shenwei.me/LexicMap/
Release LexicMap v0.6.0 · shenwei356/LexicMap
v0.6.0 - 2025-03-25 This version is compatible with indexes created by previous versions (requires a one-time, automatic preprocessing), but rebuilding the index is recommended for more accurate re...
github.com
shenwei356.bsky.social
Thrilled to finally debut my lab site🎉: mbio.info, which stands for Microbial Bioinformatics. I purchased the domain 9 years ago, and it has been waiting for this moment ever since!