Igor Martayan
@imartayan.bsky.social
740 followers 310 following 59 posts
PhD student in algorithmic bioinformatics at @bonsaiseqbioinfo.bsky.social. Interested in randomized algorithms and space-efficient data structures https://igor.martayan.org
Pinned
imartayan.bsky.social
I'm glad to announce that the simd-minimizers library is out! 🧬🖥️
@curiouscoding.nl and I have been optimizing the computation of minimizers down to the smallest detail.
The result is an order of magnitude faster than existing methods; processing an entire human genome takes only 4s on my laptop! 🧵
Reposted by Igor Martayan
bedec.bsky.social
"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...
Reposted by Igor Martayan
signal.org
We are alarmed by reports that Germany is on the verge of a catastrophic about-face, reversing its longstanding and principled opposition to the EU’s Chat Control proposal which, if passed, could spell the end of the right to privacy in Europe. signal.org/blog/pdfs/ge...
Reposted by Igor Martayan
recombconf.bsky.social
#RECOMB2026 is now accepting submissions and we'd love to see your best work!

📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025

📜 More info: recomb.org/recomb2026/call_for_papers.html
Reposted by Igor Martayan
xian-chang.bsky.social
🦒Long read giraffe is out!🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2
biorxiv-bioinfo.bsky.social
Rapid, accurate long- and short-read mapping to large pangenome graphs with vg Giraffe https://www.biorxiv.org/content/10.1101/2025.09.29.678807v1
Reposted by Igor Martayan
biorxiv-bioinfo.bsky.social
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching https://www.biorxiv.org/content/10.1101/2025.09.29.679204v1
Reposted by Igor Martayan
curiouscoding.nl
Looking for people to test the latest version of simd-sketch.

It's now 2x as fast at sketching, and supports skipping over kmers containing N and other ambiguous bases (which is only ~35% slower).

'cargo install simd-sketch' is right there under your fingertips ;)

github.com/RagnarGrootK...
GitHub - RagnarGrootKoerkamp/simd-sketch: Compute bottom-s sketches and s-buckets sketches, using simd-minimizers crate.
github.com
Reposted by Igor Martayan
ebi.embl.org
There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠
How to rapidly search the world’s microbial DNA
By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.
www.ebi.ac.uk
Reposted by Igor Martayan
recombconf.bsky.social
#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!

The #RECOMB2026 conference will take place in Thessaloniki on May 26-29, 2026. The satellite events will be held on May 24-25, 2026. Save the date!
imartayan.bsky.social
Depends on how full it is I guess, negative queries are fast when you don't need probing
Reposted by Igor Martayan
justinwolfers.bsky.social
Critical part of the President's new $100,000 charge for H-1B visas: The Administration can also offer a $100,000 discount to any person, company, or industry that it wants. Replacing rules with arbitrary discretion.

Want visas? You know who to call and who to flatter.
Reposted by Igor Martayan
curiouscoding.nl
Minimap2 is very much the hammer in
"When all you have is a hammer, everything looks like a nail."
Reposted by Igor Martayan
bedec.bsky.social
Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...
Reposted by Igor Martayan
zaminiqbal.bsky.social
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
Reposted by Igor Martayan
jimshaw.bsky.social
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
biorxiv-bioinfo.bsky.social
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
Reposted by Igor Martayan
mikael-salson.univ-lille.fr
A wonderful day for @bonsaiseqbioinfo.bsky.social, with @camillemrcht.bsky.social's and @npmalfoy.bsky.social's HDR defenses.
Congrats on the amazing work!

The team is very lucky to have you both!
Reposted by Igor Martayan
jermp.bsky.social
We are glad to announce that the next workshop “Data Structures in Bioinformatics” (DSB 2026) will take place in Venice, Italy, on *February 18-19*, 2026. dsb-meeting.github.io/DSB2026/ Book the dates! #DSB26
DSB 2026 Venice - February 18-19
Workshop Data Structures in Bioinformatics
dsb-meeting.github.io
Reposted by Igor Martayan
bedec.bsky.social
📣 Deacon 0.8.0 available on Bioconda
- Much faster search and depletion through improved work distribution on multicore systems. My fastq.gz benchmark now runs at 400Mbp/s on Apple M1.
- Dual default match thresholds for greater accuracy

Details: github.com/bede/deacon/...

1k downloads! 🐥
Release 0.8.0 · bede/deacon
Faster filtering on multicore systems through improved work allocation using the Paraseq library (@noamteyssier). Filtering at >1Gbp/s is possible with uncompressed long sequences, and >500Mbp/s is...
github.com
Reposted by Igor Martayan
curiouscoding.nl
So, the heap I invented over the weekend was introduced as the quickheap by Navarro and Paredes around 2006!
Basically: on each pop, do just enough quicksort to find the smallest element.

My implementation (the 1st??) is 2x to 4x faster than d-ary and binary heaps.

curiouscoding.nl/posts/quickh...
A grid of plots comparing the performance of a binary heap, 8-ary heap, 4-ary heap, quickheap, and radix heap, all taken from various Rust crates.
The top 4 plots are for u32 values, the bottom 4 for u64 values.
Each column is a different type of input. The left is heapsort: insert n times then pop n times. The second column does groups of push followed by 4x (pop, push), and then the reverse. The last two columns have linearly/randomly increasing data.

On the first two data sets, the quickheap is by far the best, showing much less degradation from cache misses than the d-ary heaps.
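A minimal Python sketch of the "just enough quicksort on each pop" idea described in the post above. It is pop-only (push is left out to keep it short), and the class name QuickHeapSketch, the random pivot choice, and the array layout are illustrative assumptions, not the implementation from the linked blog post.

```python
import random


class QuickHeapSketch:
    """Pop-only sketch: incremental quicksort driven by a stack of pivot positions."""

    def __init__(self, items):
        self.a = list(items)
        self.start = 0               # a[:start] has already been popped
        self.pivots = [len(self.a)]  # sentinel: a fictitious pivot past the end

    def __len__(self):
        return len(self.a) - self.start

    def pop(self):
        if self.start == len(self.a):
            raise IndexError("pop from empty heap")
        a = self.a
        # Keep partitioning the front segment until a pivot lands at `start`.
        while self.pivots[-1] != self.start:
            lo, hi = self.start, self.pivots[-1]  # unsorted segment a[lo:hi]
            p = random.randrange(lo, hi)
            a[p], a[hi - 1] = a[hi - 1], a[p]     # park the pivot at the segment end
            pivot = a[hi - 1]
            store = lo
            for i in range(lo, hi - 1):
                if a[i] < pivot:
                    a[i], a[store] = a[store], a[i]
                    store += 1
            a[store], a[hi - 1] = a[hi - 1], a[store]
            self.pivots.append(store)             # pivot is now in its final place
        # A pivot sitting at `start` is <= everything after it: it is the minimum.
        self.pivots.pop()
        self.start += 1
        return a[self.start - 1]
```

Popping every element yields the values in sorted order, but each pop only partitions the part of the array it needs. The pivots left on the stack are reused by later pops.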
Reposted by Igor Martayan
jltsiren.bsky.social
There was a workshop on 25 years of the FM-index and the CSA after SEA. I would have liked to attend, but I had other commitments. The invited speakers were Giovanni Manzini and Roberto Grossi, as the other purpose of the workshop was to present them with Festschrifts for their 60th birthdays. 1/6
SEA 2025
regindex.github.io
Reposted by Igor Martayan
curiouscoding.nl
Little writeup on the speed of fasta parsers, at last.

Basically: both needletail and paraseq process input linearly, and thus have a limit around 4 GB/s.

By giving each thread its own slice of the input file, we're limited by RAM bandwidth instead :)

curiouscoding.nl/posts/fasta-...
With 3 threads, the middle thread processes the reads starting in the middle third of the fasta file.
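A rough Python sketch of the slicing rule described above: each worker gets a byte range and handles exactly the records whose header starts inside it, reading past the end of its range to finish the last one. The function names next_header, parse_slice and parallel_parse are hypothetical, the input is assumed to be an in-memory FASTA buffer, and this only illustrates the partitioning (Python threads will not reproduce the throughput discussed in the post).

```python
from concurrent.futures import ThreadPoolExecutor


def next_header(data: bytes, pos: int) -> int:
    """Offset of the first record header ('>' at line start) at or after pos."""
    if pos == 0 and data.startswith(b">"):
        return 0
    # Search from pos - 1 so that a header starting exactly at pos is found.
    i = data.find(b"\n>", max(pos - 1, 0))
    return len(data) if i == -1 else i + 1


def parse_slice(data: bytes, lo: int, hi: int):
    """Return (name, sequence_length) for records whose header starts in [lo, hi)."""
    out = []
    start = next_header(data, lo)
    while start < hi:
        end = next_header(data, start + 1)  # may lie beyond hi
        header, _, body = data[start:end].partition(b"\n")
        out.append((header[1:].decode(), len(body.replace(b"\n", b""))))
        start = end
    return out


def parallel_parse(data: bytes, threads: int):
    """Split data into `threads` equal byte ranges and parse them concurrently."""
    bounds = [len(data) * i // threads for i in range(threads + 1)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = pool.map(lambda r: parse_slice(data, *r), zip(bounds, bounds[1:]))
    return [rec for part in parts for rec in part]
```

Because a record belongs to the range where its header starts, no record is parsed twice or skipped, whatever the number of threads.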
Reposted by Igor Martayan
robp.bsky.social
🧬🖥️ I am strongly of the opinion that bioinformatics needs to move away entirely from text-based and "loosely" structured file formats for essentially any type of data. File formats should be binary-first, and designed for *correct* and *efficient* machine parsing 1/3
imartayan.bsky.social
Congrats! Is it publicly available?
Reposted by Igor Martayan
robp.bsky.social
I talk a lot about Rust for building high-perf (& even non-perf critical) software, & scientific software in particular. I often discuss what's interesting to me, but wanted to offer the chance to those interested for me to answer their questions about Rust in science. Fire away with questions!🧬🖥️