Roland Faure
@rfaure.bsky.social
54 followers 130 following 11 posts
Sequence bioinformatician, algorithms, methods. Postdoc at Institut Pasteur in Rayan Chikhi's lab
rfaure.bsky.social
🧵6/6
Since MSR sketches are sequences, they are super easy to use. I think they could be useful for many other problems, e.g. SNP calling, pangenome graphs, indexing, etc.
rfaure.bsky.social
🧵5/6
The sketching makes assembly extremely fast: a gut metagenome sample with 138 Gbp of sequencing data was assembled in less than 2 h using 10 GB of RAM on 8 threads ⚡. And thanks to MSRs, *highly similar strains are not collapsed*
rfaure.bsky.social
🧵4/6
Two key properties that make MSR sketches really cool:
👉 They are alignable sequences: you can just feed them into existing assemblers
👉 MSR sketches can *keep all the SNPs*, i.e. two highly similar sequences are (almost) always reduced to different sketches -> useful to separate similar strains
rfaure.bsky.social
🧵3/6
MSRs have been defined by @lblassel.bsky.social @rayanchikhi.bsky.social and @pashadag.bsky.social in pmc.ncbi.nlm.nih.gov/articles/PMC....
Take a sequence and a value of k, stream all k-mers through a function that outputs either a base or the empty character, and you've got your sketch
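A minimal Python sketch of that definition, assuming the reduction function is any callable that maps a k-mer to one base or to `None` (standing in for the empty character). `msr_sketch` and `toy_rule` are hypothetical names; `toy_rule` is a homopolymer-compression-like order-2 rule chosen only for illustration, not one of the MSRs used in the paper or the assembler.

```python
from typing import Callable, Optional

# A reduction maps one k-mer to a single base, or None for the empty character.
Reduction = Callable[[str], Optional[str]]


def msr_sketch(seq: str, k: int, f: Reduction) -> str:
    """Stream every k-mer of seq through f and concatenate the non-empty outputs."""
    out = []
    for i in range(len(seq) - k + 1):
        base = f(seq[i:i + k])
        if base is not None:
            out.append(base)
    return "".join(out)


def toy_rule(kmer: str) -> Optional[str]:
    """Hypothetical order-2 rule: emit the second base only if it differs from the first."""
    return kmer[1] if kmer[0] != kmer[1] else None


print(msr_sketch("AAACCGTTTA", 2, toy_rule))  # CGTA
print(msr_sketch("AAACTGTTTA", 2, toy_rule))  # CTGTA (the C->T substitution is kept)
```

Even with this toy rule, the single substitution between the two inputs survives in the sketch; real MSRs are selected far more carefully than this.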
rfaure.bsky.social
🧵2/6
Conceptually, the assembler follows the same lines as metaMDBG:
1. sketching reads
2. assembly procedure on the sketches
3. converting back to base space to obtain the final assembly
The main difference is the sketching scheme: we introduce *Mapping-friendly Sequence Reductions (MSR) sketching*
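One simple way to make step 3 possible, assuming (hypothetically) that each sketch base remembers the start of the k-mer that emitted it: an interval of the sketch then maps back to a base-space substring of the read. This is an illustration only, not necessarily how the assembler performs the conversion; `sketch_with_origins`, `to_base_space` and `toy_rule` are hypothetical names, and `toy_rule` is an arbitrary order-2 rule.

```python
from typing import Callable, List, Optional, Tuple

Reduction = Callable[[str], Optional[str]]


def sketch_with_origins(seq: str, k: int, f: Reduction) -> Tuple[str, List[int]]:
    """Sketch seq and record, for each sketch base, the start of the k-mer that produced it."""
    bases: List[str] = []
    origins: List[int] = []
    for i in range(len(seq) - k + 1):
        b = f(seq[i:i + k])
        if b is not None:
            bases.append(b)
            origins.append(i)
    return "".join(bases), origins


def to_base_space(seq: str, k: int, origins: List[int], start: int, end: int) -> str:
    """Map the sketch interval [start, end) back to the read substring its k-mers span."""
    first = origins[start]
    last = origins[end - 1] + k  # include the whole k-mer behind the last sketch base
    return seq[first:last]


def toy_rule(kmer: str) -> Optional[str]:
    """Hypothetical order-2 rule: emit the second base only if it differs from the first."""
    return kmer[1] if kmer[0] != kmer[1] else None


read = "AAACCGTTTA"
sketch, origins = sketch_with_origins(read, 2, toy_rule)
print(sketch, origins)                        # CGTA [2, 4, 5, 8]
print(to_base_space(read, 2, origins, 1, 3))  # CGT
```

In this toy setup, step 2 would work only on strings like `sketch`, and the `origins` list is what allows a result in sketch space to be turned back into bases.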
Reposted by Roland Faure
rayanchikhi.bsky.social
🌎👩‍🔬 For 15+ years, biology has accumulated petabytes (millions of gigabytes) of 🧬DNA sequencing data🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
Reposted by Roland Faure
jimshaw.bsky.social
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
biorxiv-bioinfo.bsky.social
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
rfaure.bsky.social
Congrats! Nice results 🎉
Reposted by Roland Faure
jlipovac.bsky.social
I am happy to share our new preprint introducing MADRe - a pipeline for Metagenomic Assembly-Driven Database Reduction, enabling accurate and computationally efficient strain-level metagenomic classification.

🔗https://www.biorxiv.org/content/10.1101/2025.05.12.653324v1
1/9
Reposted by Roland Faure
camillemrcht.bsky.social
Starting #RECOMBseq with @rayanchikhi.bsky.social's keynote, here stressing our responsibility as scientists to enable access to a common good: genomic data
Reposted by Roland Faure
baym.lol
Side note: you could, speaking purely theoretically, also fit every microbe onto an SD card, which is within the weight limit for a carrier pigeon. For some distances, it would be faster than the internet for transmitting sequence libraries
7/
Reposted by Roland Faure
zaminiqbal.bsky.social
So glad this is finally out. The method has been instrumental in allowing us to compress the AllTheBacteria data: ~2 million bacterial genomes shrink from 3 terabytes (gzipped) to 100 GB using phylogenetic compression. Great work by @brinda.eu
Reposted by Roland Faure
rrwick.bsky.social
Do you (like me) create a bunch of conda environments, then later forget what they're for, when they were last updated, or which tools are in them?

If so, you might like this little project: github.com/rrwick/conda...
GitHub - rrwick/condaenvlist: a simple tool for listing conda environments with descriptions
rfaure.bsky.social
So glad to have participated in #DSB2025, what a great workshop! For some mysterious reason, it was my first time attending after 3 years of sequence research. Thanks to all participants & organizers 😃
Reposted by Roland Faure
imartayan.bsky.social
Ragnar's made some incredible optimizations on the computation of minimizers, can't wait to see how these improvements will benefit bioinfo tools!
curiouscoding.nl
Nice result to end the day (night*):
After discussions with @imartayan.bsky.social, the SIMD minimizer code now also does proper canonical (revcomp) minimizers:

~1ns/bp for fwd minis
+0.4ns/bp with collect and dedup
+0.6ns/bp with canonical hashes.

Super happy how it's only 2x slower in the end!
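For readers unfamiliar with the terms, here is a scalar, unoptimized Python sketch of canonical minimizers with deduplication; it is not the SIMD implementation discussed above. Assumptions: k ≤ 32, only ACGT bases, a multiplicative hash with an arbitrary odd 64-bit constant, the canonical hash taken as the minimum of the forward and reverse-complement hashes, and leftmost tie-breaking inside each window of `w` consecutive k-mers. All function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_MASK64 = (1 << 64) - 1
_MULT = 0x9E3779B97F4A7C15  # arbitrary odd 64-bit constant (assumption)


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_hash(kmer: str) -> int:
    """2-bit encode the k-mer, then scramble it with a multiplicative hash mod 2^64."""
    x = 0
    for c in kmer:
        x = (x << 2) | "ACGT".index(c)  # raises ValueError on non-ACGT bases
    return (x * _MULT) & _MASK64


def canonical_hash(kmer: str) -> int:
    """Strand-independent hash: the smaller of the forward and reverse-complement hashes."""
    return min(kmer_hash(kmer), kmer_hash(revcomp(kmer)))


def minimizer_positions(seq: str, k: int, w: int, canonical: bool = True) -> List[int]:
    """Start positions of the minimizer of every window of w consecutive k-mers, deduplicated."""
    h = canonical_hash if canonical else kmer_hash
    hashes = [h(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    positions: List[int] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        best = start + min(range(w), key=lambda j: window[j])  # leftmost minimum wins ties
        if not positions or positions[-1] != best:  # consecutive windows often share a minimizer
            positions.append(best)
    return positions


seq = "ACGTTGCAAGGCTTACGATCGGA"
print(minimizer_positions(seq, k=5, w=4))
```

Because the canonical hash is strand-symmetric, a sequence and its reverse complement select the same set of minimizer hash values; computing that extra reverse-complement hash is the kind of additional cost the post measures.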
rfaure.bsky.social
Really cool work!
Reposted by Roland Faure
pierrepeterlongo.bsky.social
Amazing ideas here www.biorxiv.org/content/bior... from
@yoann.bsky.social
and collaborators.

Reorganizing minimizers to allow dichotomic search of k-mers. That's brilliant.

#bioinformatics 🧬🖥️
Toy example of the AAAAAAA bucket associated with four super-k-mers, turned into their interleaved representation.
rfaure.bsky.social
So glad to have successfully defended my Ph.D. last week 😀 My work was on producing haplotype-resolved metagenomic assemblies using noisy long reads (HairSplitter) and high-fidelity long reads (Alice assembler, not yet published).
Thanks to my advisors Dominique Lavenier and Jean-François Flot ❤️
rfaure.bsky.social
Congrats @firtinac.bsky.social! I thoroughly enjoyed reading the BLEND paper 😄