Lightnews — Scholar-powered news

Roland Faure @rfaure.bsky.social · 4d

🧵6/6
Since MSR sketches are sequences, they are super easy to use. I think they could be useful for many other problems, e.g. SNP calling, pangenome graphs, indexing, etc.

Roland Faure @rfaure.bsky.social · 4d

🧵5/6
The sketching makes assembly extremely fast: a gut metagenome sample of 138Gbp of sequencing data was assembled in less than 2h and 10G of RAM on 8 threads ⚡. And thanks to MSRs, *highly similar strains are not collapsed*

Roland Faure @rfaure.bsky.social · 4d

🧵4/6
Two key properties that make MSR sketches really cool:
👉 They are alignable sequences: you can just feed them into existing assemblers
👉 MSR sketches can *keep all the SNPs*, i.e. two highly similar sequences are (almost) always reduced to different sketches -> useful to separate similar strains

Roland Faure @rfaure.bsky.social · 4d

🧵3/6
MSRs have been defined by @lblassel.bsky.social @rayanchikhi.bsky.social and @pashadag.bsky.social in pmc.ncbi.nlm.nih.gov/articles/PMC....
Take a sequence, a value of k, and stream all k-mers through a function that outputs either a base or the empty character, and you've got your sketch
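
A minimal Python sketch of the recipe described in this post, for illustration only. The names `msr_sketch` and `ends_match` are hypothetical, and the toy reduction function is an assumption picked for readability; it is not the function used by Alice or defined in the MSR paper.

```python
from typing import Callable


def msr_sketch(seq: str, k: int, reduce_kmer: Callable[[str], str]) -> str:
    """Stream every k-mer of `seq` through `reduce_kmer` and concatenate the outputs.

    `reduce_kmer` must return either a single base or "" (the empty character).
    """
    return "".join(reduce_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))


def ends_match(kmer: str) -> str:
    """Toy reduction (an assumption): emit the first base if it equals the last base."""
    return kmer[0] if kmer[0] == kmer[-1] else ""


# The 3-mers of ACAGT are ACA, CAG, AGT; only ACA emits a base.
print(msr_sketch("ACAGT", 3, ends_match))  # A
```

Because the output is itself a string over A/C/G/T, it can be handed to tools that expect ordinary sequences, which is the property the thread highlights.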

Roland Faure @rfaure.bsky.social · 4d

🧵2/6
Conceptually, the assembler works along the same lines as metaMDBG:
1. sketching reads
2. assembly procedure on the sketches
3. reversing to base-space to obtain the final assembly
The main difference is the sketching scheme: we introduce *Mapping-friendly Sequence Reductions (MSR) sketching*

Roland Faure @rfaure.bsky.social · 4d

Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...

Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching

We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...

Reposted by Roland Faure

Rayan Chikhi @rayanchikhi.bsky.social · Sep 3

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of🧬DNA sequencing data🧬 from the far reaches of our planet.🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

Reposted by Roland Faure

Jim Shaw @jimshaw.bsky.social · Sep 7

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

bioRxiv Bioinfo @biorxiv-bioinfo.bsky.social · Sep 7

High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1

Roland Faure @rfaure.bsky.social · May 16

Congrats! Nice results 🎉

Reposted by Roland Faure

Josipa Lipovac @jlipovac.bsky.social · May 16

I am happy to share our new preprint introducing MADRe - a pipeline for Metagenomic Assembly-Driven Database Reduction, enabling accurate and computationally efficient strain-level metagenomic classification.

🔗https://www.biorxiv.org/content/10.1101/2025.05.12.653324v1
1/9

Reposted by Roland Faure

Camille Marchet ⚡ @camillemrcht.bsky.social · Apr 24

Starting #RECOMBseq with @rayanchikhi.bsky.social's keynote. Here stressing our responsibility as scientists to enable access to a common good: genomic data

Reposted by Roland Faure

Michael Baym @baym.lol · Apr 9

Side note: you could, speaking purely theoretically, also fit every microbe onto an SD card, which is within the weight limit for a carrier pigeon. For some distances, it would be faster than the internet for transmitting sequence libraries
7/

Reposted by Roland Faure

Zamin Iqbal @zaminiqbal.bsky.social · Apr 9

So glad this is finally out. The method has been instrumental in allowing us to compress the AllTheBacteria data - ~2 million bacterial genomes shrink from 3Terabytes (gzipped) to 100Gb using phylogenetic compression. Great work by @brinda.eu

Michael Baym @baym.lol · Apr 9

Our latest paper, in which @brinda.eu (along with @zaminiqbal.bsky.social and others) introduces phylogenetic compression for storage and search of enormous microbial genome libraries, was published today in @naturemethods.bsky.social:

rdcu.be/eg4OA

1/

Efficient and robust search of microbial genomes via phylogenetic compression

Nature Methods - Phylogenetic compression achieves performant and lossless compression of massive collections of microbial genomes, facilitating fast BLAST-like search and versatile alignment tasks.

Reposted by Roland Faure

Ryan Wick @rrwick.bsky.social · Mar 27

Do you (like me) create a bunch of conda environments, then later forget what they're for, when they were last updated, or which tools are in them?

If so, you might like this little project: github.com/rrwick/conda...

GitHub - rrwick/condaenvlist: a simple tool for listing conda environments with descriptions

Roland Faure @rfaure.bsky.social · Mar 7

So glad to have participated in #DSB2025, what a great workshop! For some mysterious reason it was the first time I attended after 3 years of sequence research. Thanks to all participants & organizers 😃

Reposted by Roland Faure

Igor Martayan @imartayan.bsky.social · Dec 13

Ragnar's made some incredible optimizations on the computation of minimizers, can't wait to see how these improvements will benefit bioinfo tools!

Ragnar {Groot Koerkamp} @curiouscoding.nl · Dec 13

Nice result to end the day (night*):
After discussions with @imartayan.bsky.social, the SIMD minimizer code now also does proper canonical (revcomp) minimizers:

~1ns/bp for fwd minis
+0.4ns/bp with collect and dedup
+0.6ns/bp with canonical hashes.

Super happy how it's only 2x slower in the end!
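
For readers unfamiliar with the terms in this post, here is a plain (non-SIMD) Python sketch of canonical minimizers with the "collect and dedup" step. It is an illustration under stated assumptions, not the code being benchmarked: the function names are hypothetical, and k-mers are ordered lexicographically for simplicity, whereas real tools typically order them by a hash.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Canonical form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def canonical_minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """For each window of w consecutive k-mers, pick the leftmost smallest
    canonical k-mer, then drop consecutive duplicate picks."""
    kmers = [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picks: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        j = min(range(w), key=lambda i: window[i])  # min() keeps the first of equal minima
        pick = (start + j, window[j])
        if not picks or picks[-1] != pick:  # dedup: adjacent windows often share a minimizer
            picks.append(pick)
    return picks
```

With canonical k-mers, a sequence and its reverse complement select the same set of minimizer k-mers, which is the point of the "revcomp" variant mentioned above.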

Roland Faure @rfaure.bsky.social · Dec 13

Really cool work!

Reposted by Roland Faure

Pierre Peterlongo @pierrepeterlongo.bsky.social · Dec 4

Amazing ideas here www.biorxiv.org/content/bior... from @yoann.bsky.social and collaborators.

Reorganize minimizers to allow dichotomic search of k-mers. That's brilliant.

#bioinformatics 🧬🖥️

Toy example of the AAAAAAA bucket associated with four super-k-mers turned into their interleaved representation.

Roland Faure @rfaure.bsky.social · Dec 4

So glad to have successfully defended my Ph.D. last week 😀 Work on producing haplotype-resolved metagenomic assemblies using noisy long reads (HairSplitter) and high-fidelity long reads (Alice assembler, not yet published).
Thanks to my advisors Dominique Lavenier and Jean-François Flot ❤️

Roland Faure @rfaure.bsky.social · Dec 2

Congrats @firtinac.bsky.social! I thoroughly enjoyed reading the BLEND paper 😄
