Pierre Peterlongo
@pierrepeterlongo.bsky.social
Senior researcher at Inria. Head of the GenScale team (https://team.inria.fr/genscale/) at Inria and Irisa. Algorithmics for sequencing data analysis, genomics, and metagenomics.
Pinned
pierrepeterlongo.bsky.social
❗ I clearly consider this THE most important result of the last decade for exploiting and democratizing genomic data.
I think there will be a "before" and an "after" Logan and logan-search.
github.com/IndexThePlan...
logan-search.org
Have a look at this thread
rayanchikhi.bsky.social
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
Reposted by Pierre Peterlongo
jimshaw.bsky.social
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
biorxiv-bioinfo.bsky.social
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
pierrepeterlongo.bsky.social
🤝 Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, with help from @yoann.bsky.social. 8/8
pierrepeterlongo.bsky.social
🌊 On metagenomic data, other tools such as kmindex are good alternatives. Still, Kaminari consistently ranks among the fastest tools across all data types, and it generates the smallest indexes (or the lowest FPR). 7/8
pierrepeterlongo.bsky.social
💾 For fixed false-positive rates, it uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8
pierrepeterlongo.bsky.social
📊 Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8
pierrepeterlongo.bsky.social
🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, as inspired by the techniques proposed in Fulgor. 4/8
pierrepeterlongo.bsky.social
🔍 Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, using the Rank-Biased Overlap (RBO) metric.
2/8
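For readers unfamiliar with the Rank-Biased Overlap metric mentioned above, here is a minimal sketch of its truncated form: the weighted average of the overlap between the top-d prefixes of two rankings, with weights decaying geometrically by a persistence parameter p. This is not the paper's evaluation code; the function name, the default p = 0.9, and the truncation depth are assumptions made for illustration.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated RBO: (1 - p) * sum over d of p^(d-1) * |A[:d] & B[:d]| / d.

    Assumes each ranking lists distinct items, best first. Truncating the
    sum at `depth` gives a lower bound on the full (infinite) RBO score.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if depth is None:
        depth = min(len(ranking_a), len(ranking_b))
    score = 0.0
    for d in range(1, depth + 1):
        # Number of items shared by the two top-d prefixes.
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# Hypothetical document rankings, with and without false positives.
exact = ["doc3", "doc1", "doc7", "doc2"]
approx = ["doc3", "doc7", "doc1", "doc2"]
print(rank_biased_overlap(exact, approx))
```

A small p puts almost all the weight on the very top of the rankings; a p close to 1 spreads it over deeper ranks.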
pierrepeterlongo.bsky.social
📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8
pierrepeterlongo.bsky.social
Thanks, everyone, for your valuable feedback. I modified the code accordingly.
pierrepeterlongo.bsky.social
That's correct.
I just created this: github.com/pierrepeterl... It is yet another HLL k-mer counter, but a very simple one. I did not find a way to accumulate the k-mer counts across several input datasets.
GitHub - pierrepeterlongo/hyperloglog_kmer_counter
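As background for the HLL k-mer counter mentioned above, here is a minimal HyperLogLog sketch for estimating the number of distinct k-mers. It is not the code of the linked repository: the class name, the BLAKE2b hash, and the default precision are assumptions made for illustration. Two sketches built with the same precision can be combined by taking the register-wise maximum, which yields the sketch of the union of their inputs.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**precision registers and a 64-bit hash."""

    def __init__(self, precision=12):
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.p = precision
        self.m = 1 << precision
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        index = h >> (64 - self.p)               # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit among the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def add_kmers(self, sequence, k):
        for i in range(len(sequence) - k + 1):
            self.add(sequence[i:i + k])

    def merge(self, other):
        """Fold `other` into this sketch (register-wise maximum)."""
        if other.p != self.p:
            raise ValueError("sketches must share the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)       # linear counting for small sets
        return raw


sketch = HyperLogLog()
sketch.add_kmers("ACGTACGTTGCAACGT", k=5)
print(round(sketch.estimate()))
```

With precision 12 the sketch holds 4096 small registers regardless of input size, and the expected relative error is around 1.6%.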
pierrepeterlongo.bsky.social
I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.
pierrepeterlongo.bsky.social
Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...
pierrepeterlongo.bsky.social
Thanks again for this pointer, @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without knowing their final abundance.
The "conservative update" technique suits the case where items are added together with their abundance.
pierrepeterlongo.bsky.social
Oh! Amazing results. That's the difference between you and a Rust beginner.
I'll try to understand your code.
pierrepeterlongo.bsky.social
Results: slightly longer insertion time, but 2 to 3 times lower abundance overestimations.
pierrepeterlongo.bsky.social
In short: when adding an element to a counting Bloom filter (cBF), increase only the counters that hold the minimal value among those assigned to this element.
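The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the blog post: the class name, the `minimal_increment` switch, and the salted BLAKE2b hashing are all hypothetical choices.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter with an optional minimal-increment update."""

    def __init__(self, size, num_hashes, minimal_increment=True):
        self.size = size
        self.num_hashes = num_hashes
        self.minimal_increment = minimal_increment
        self.counters = [0] * size

    def _positions(self, item):
        # One salted hash per hash function; a set avoids touching the same
        # counter twice when two hash functions collide.
        positions = set()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            positions.add(int.from_bytes(digest, "little") % self.size)
        return positions

    def add(self, item):
        positions = self._positions(item)
        if self.minimal_increment:
            # Only the counters currently holding the minimum are incremented.
            lowest = min(self.counters[p] for p in positions)
            positions = {p for p in positions if self.counters[p] == lowest}
        for p in positions:
            self.counters[p] += 1

    def count(self, item):
        # The estimate is the minimum over the item's counters.
        return min(self.counters[p] for p in self._positions(item))


cbf = CountingBloomFilter(size=1024, num_hashes=3)
for kmer in ["ACGTA", "ACGTA", "CGTAC"]:
    cbf.add(kmer)
print(cbf.count("ACGTA"))
```

Because the minimum over an item's counters still grows by one at each insertion of that item, the estimate never falls below the true abundance; the other counters are simply spared an unnecessary increment.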
pierrepeterlongo.bsky.social
Maybe the simplest idea for decreasing the overestimations of a counting Bloom filter: a trivial observation + 10 lines of code.
I'm surprised it has not been described before. Please comment if it has.
Blog post here:
pierrepeterlongo.github.io/2025/03/17/m... 🧪🧬🖥️
pierrepeterlongo.bsky.social
Yes, ntCard helps a lot, and its precision on reads is impressive. But I wanted the exact number for a genome.
pierrepeterlongo.bsky.social
I wanted something that uses as little memory as possible. I don't need per-k-mer counts, only the number of unique k-mers, so Jellyfish, KMC, etc. are overkill for this simple task.
pierrepeterlongo.bsky.social
Today I wanted to know the number of unique 27-mers in the hg38 human genome (spoiler: there are 2.49 billion). I found no tool for doing this, so I wrote this one: github.com/pierrepeterl...

It may help.
Please use it / improve it.

🧬💻 #bioinformatics
GitHub - pierrepeterlongo/unique_kmer_counter: Count number of unique kmers from fasta or fasta.gz files
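To make the task concrete, here is a minimal exact counter of unique k-mers from a fasta or fasta.gz file. It is a sketch, not the code of the linked tool: the function names are hypothetical, and counting canonical k-mers (a k-mer and its reverse complement counted once) and skipping k-mers with non-ACGT characters are assumptions about what "unique" should mean.

```python
import gzip

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = frozenset("ACGT")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def read_fasta(path):
    """Yield each sequence of a fasta or fasta.gz file as an uppercase string."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        chunks = []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line.upper())
        if chunks:
            yield "".join(chunks)


def count_unique_kmers(sequences, k, use_canonical=True):
    """Count distinct k-mers, skipping any k-mer with a non-ACGT character."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if _DNA.issuperset(kmer):
                seen.add(canonical(kmer) if use_canonical else kmer)
    return len(seen)


print(count_unique_kmers(["ACGTACGT"], k=4, use_canonical=False))
```

A Python set of strings is only practical for small inputs; a genome-scale count such as the hg38 one above needs a far more compact representation.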