Roland Faure
rfaure.bsky.social
Sequence bioinformatician, algorithms, methods.
Postdoc at Institut Pasteur in Rayan Chikhi's lab
Sorry about the first figure, it had a background problem. Here it is:
December 4, 2025 at 2:07 PM
Coming up with a name was pretty hard, we had a lot of good candidates. We settled on SNooPy, which is a reference to SNPs and to the fact that the tool is 100% Python. We considered metaSNooPy, but that went too far 😅. The GitHub repository: github.com/RolandFaure/SNooPy
December 4, 2025 at 1:18 PM
Tests show that:
1/ SNooPy has the best recall in our tests
2/ Using genomic long-read SNP callers does not (always) work well: most tools have very low recall, but DeepVariant performs much better than the other tested methods
3/ The recall of all tools is still far from 100%
December 4, 2025 at 1:18 PM
We propose a new statistical framework. To distinguish artefacts from SNPs, the idea is to look at several loci simultaneously: artefacts will occur on random reads, while SNPs will occur systematically on the reads that come from the same strain.
December 4, 2025 at 1:18 PM
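The multi-locus idea above can be illustrated with a toy test. This is only a minimal sketch and not SNooPy's actual statistical framework: it assumes every read covers both loci, and the function name `overlap_pvalue` and its arguments are hypothetical. It asks how likely it is that the reads carrying the alternative allele at one locus overlap this much with those at another locus if the second set were drawn at random (a hypergeometric tail).

```python
from math import comb

def overlap_pvalue(n_reads: int, alt_a: set, alt_b: set) -> float:
    """Hypothetical helper: P(overlap >= observed) under random placement.

    n_reads: number of reads covering both loci (assumed identical for both).
    alt_a, alt_b: ids of the reads carrying the alternative allele at each locus.
    """
    a, b = len(alt_a), len(alt_b)
    observed = len(alt_a & alt_b)
    # Number of ways to choose which b reads carry the alt allele at locus B.
    total = comb(n_reads, b)
    # Hypergeometric upper tail: at least `observed` of them fall inside alt_a.
    tail = sum(
        comb(a, x) * comb(n_reads - a, b - x)
        for x in range(observed, min(a, b) + 1)
    )
    return tail / total

# Same five reads carry the variant at both loci: consistent with a strain.
same_strain = overlap_pvalue(20, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4})
# Variants fall on unrelated reads: consistent with random artefacts.
artefacts = overlap_pvalue(20, {0, 1, 2, 3, 4}, {5, 9, 12, 15, 19})
```

A small p-value suggests the two variants travel together on the same reads, which is what one would expect from a real strain-level SNP rather than from sequencing errors.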
Existing long-read SNP callers (DeepVariant, longshot...) have been developed for diploid genomes. Deep-learning methods are trained on [human] genomic data. Statistical methods contain assumptions that do not hold for metagenomics.
December 4, 2025 at 1:18 PM
🧵6/6
Since MSR sketches are sequences, they are super easy to use. I think they could be useful for many other problems, e.g. SNP calling, pangenome graphs, indexing, etc.
October 3, 2025 at 2:51 PM
🧵5/6
The sketching makes assembly extremely fast: a gut metagenome sample with 138 Gbp of sequencing data was assembled in less than 2 h with 10 GB of RAM on 8 threads ⚡. And thanks to MSRs, *highly similar strains are not collapsed*
October 3, 2025 at 2:51 PM
🧵4/6
Two key properties that make MSR sketches really cool:
👉 They are alignable sequences: you can just feed them into an existing assembler
👉 MSR sketches can *keep all the SNPs*, i.e. two highly similar sequences are (almost) always reduced to different sketches -> useful to separate similar strains
October 3, 2025 at 2:51 PM
🧵3/6
MSRs have been defined by @lblassel.bsky.social @rayanchikhi.bsky.social and @pashadag.bsky.social in pmc.ncbi.nlm.nih.gov/articles/PMC....
Take a sequence, a value of k, and stream all k-mers through a function that outputs either a base or the empty character, and you've got your sketch
October 3, 2025 at 2:51 PM
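The recipe above (stream all k-mers through a function that emits a base or nothing) can be sketched in a few lines. This is a minimal illustration, not the sketching function used by the assembler: `msr_sketch` and `hpc_rule` are hypothetical names, and the toy rule with k=2 is plain homopolymer compression, chosen only because it is easy to check by hand.

```python
from typing import Callable

def msr_sketch(seq: str, k: int, reduce_kmer: Callable[[str], str]) -> str:
    """Stream every k-mer of seq through reduce_kmer and concatenate the outputs.

    reduce_kmer must return either a single base or the empty string.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return "".join(reduce_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))

def hpc_rule(kmer: str) -> str:
    # Toy k=2 rule (homopolymer compression): emit the second base
    # only when it differs from the first, otherwise emit nothing.
    return kmer[1] if kmer[0] != kmer[1] else ""

sketch_1 = msr_sketch("AAACCGT", 2, hpc_rule)  # "CGT"
sketch_2 = msr_sketch("AAACTGT", 2, hpc_rule)  # "CTGT"
```

The output is itself a DNA string, so it can be handed to tools that expect sequences. In this toy case the two inputs differ by a single base and still reduce to different sketches; a rule that preserves SNPs in general would need to be chosen more carefully than this one.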
🧵2/6
Conceptually, the assembler follows the same lines as metaMDBG:
1. sketching reads
2. assembly procedure on the sketches
3. reversing to base-space to obtain the final assembly
The main difference is the sketching scheme: we introduce *Mapping-friendly Sequence Reductions (MSR) sketching*
October 3, 2025 at 2:51 PM
Congrats! Nice results 🎉
May 16, 2025 at 2:08 PM
Reposted by Roland Faure
Side note: you could, speaking purely theoretically, also fit every microbe onto an SD card, which is within the weight limit for a carrier pigeon. For some distances, it would be faster than the internet for transmitting sequence libraries
7/
April 9, 2025 at 9:10 PM