Ragnar {Groot Koerkamp}
@curiouscoding.nl
910 followers 100 following 960 posts
PhD on high throughput bioinformatics @ ETH Zurich; IMO, ICPC, Xoogler, Rust, road-cycling, hiking, wild camping, photography
Reposted by Ragnar {Groot Koerkamp}
robp.bsky.social
Have you recently completed (or finishing soon) a PhD in CS or a related discipline? Do you want to do research advancing the theory & practice of algorithmic genomics & build tools that people love to use? I'll be looking to hire a postdoc! Official ad coming soon:
docs.google.com/document/d/1...
Postdoc Description.docx
Title: Postdoctoral Associate Summary statement: The postdoctoral research associate is responsible for developing novel computational methodology for high-throughput sequence genomics tasks, as well ...
docs.google.com
Reposted by Ragnar {Groot Koerkamp}
wheelerlab.org
Dream postdoc - Rob's science is excellent (and he's pretty great, too)
robp.bsky.social
And it's posted! If you're interested and eligible, please consider applying through the UMD portal: umd.wd1.myworkdayjobs.com/en-US/UMCP/j....

If you're a PI working in algorithmic genomics (& you can recommend my lab to your top graduating students ;P), please let them know!
curiouscoding.nl
Bulk operations / batching is the future of high throughput libraries!

Nice to see boost doing this :)
boost.org
Learn about the implementation of high-performance bulk operations in Boost.Bloom:

bannalia.blogspot.com/2025/10/bulk-operations-in-boostbloom.html
boost.org/libs/bloom
Reposted by Ragnar {Groot Koerkamp}
manoelhortaribeiro.bsky.social
Computer Science is no longer just about building systems or proving theorems--it's about observation and experiments.

In my latest blog post, I argue it’s time we had our own "Econometrics," a discipline devoted to empirical rigor.

doomscrollingbabel.manoel.xyz/p/the-missin...
Reposted by Ragnar {Groot Koerkamp}
bedec.bsky.social
Deacon 0.11.0:
- Local server mode
- Ultra-careful handling of non-ACGT
- Faster indexing & index loading
- Denser index now stores k-mers not hashes
- xxHash & FxHash replaced with rapidhash::fast
- Bug fixes

Thanks @curiouscoding.nl (and others!) for contributions
github.com/bede/deacon/...
Release 0.11.0 · bede/deacon
Major release incorporating new features, fixes and performance optimisations. Includes many PRs from @RagnarGrootKoerkamp, taking advantage of new features in simd-minimizers, packed-seq and parase...
github.com
Reposted by Ragnar {Groot Koerkamp}
benlangmead.bsky.social
I've added 7 videos to my Burrows-Wheeler indexing playlist (www.youtube.com/playlist?lis...), rounding out the r-index series and adding a 5-part series on the move structure. Now 27 videos in that playlist. I aim to add videos on prefix-free parsing, PBWT, Wheeler languages/automata in the future.
Burrows-Wheeler Indexing - YouTube
Videos on : (a) the Burrows-Wheeler Transform (BWT), (b) the FM Index, which uses the BWT to construct a full-text index, (c) Wheeler graphs, (d) r-index, an...
www.youtube.com
curiouscoding.nl
Any recommendations for a github CI plugin for performance monitoring?

Thinking of using github-action-benchmark [0], which makes decent plots, in combination with maybe iai [1] for measuring CPU cycles (wall-time is too flaky in GH actions).

0: github.com/benchmark-ac...
1: github.com/bheisler/iai
GitHub - benchmark-action/github-action-benchmark: GitHub Action for continuous benchmarking to keep performance
GitHub Action for continuous benchmarking to keep performance - benchmark-action/github-action-benchmark
github.com
Reposted by Ragnar {Groot Koerkamp}
bedec.bsky.social
"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...
Reposted by Ragnar {Groot Koerkamp}
holtjma.bsky.social
I'm excited to share our pre-print about a new variant benchmarking tool we've been working on for the past few months!

Aardvark: Sifting through differences in a mound of variants
GitHub: github.com/PacificBiosc...

Some highlights in this thread:
1/N
curiouscoding.nl
that's me :)

The only way unimportant tasks get done is by always working from non-important/non-urgent to urgent.
Then once the deadline of the most urgent thing passes, the list is empty!

news.ycombinator.com/item?id=4548...
Structured Procrastination (1995) | Hacker News
news.ycombinator.com
curiouscoding.nl
Splitting into crates is usually better for compile times, but if you need code to be inlined across crates and therefore run full-lto on *all* crates instead of just the 3 that were split, then yes, it's going to be worse.
Reposted by Ragnar {Groot Koerkamp}
camillemrcht.bsky.social
French govt did cargo +nightly build --release -Z unstable-options
Reposted by Ragnar {Groot Koerkamp}
shubhendu.bsky.social
¹'²'³'⁴ contributed equally, †‡§¶ equal advising, ¤▪︎☆♡♧ joint second authors. Each order determined by coin flip with ⌈log⁡2(n!)⌉ repetitions for fairness.
curiouscoding.nl
I was considering having a look at optimizing full-lto, but then I realized this is part of LLVM, not wild 🫤
curiouscoding.nl
Thin-lto is actually very good!
But I just want max perf in my benchmarks and it needs a lot of inlining now that pieces of code are split over 3 crates.
curiouscoding.nl
So thin-lto is like the multithreaded version of full lto. But why can't we have multithreaded full-lto?

Or like, multithreaded codegen units?

If the code is large, surely you can just lock&optimize a small part of it?

(Anyway, the #[inline(always)] will continue until thin-lto improves.)
Reposted by Ragnar {Groot Koerkamp}
sunshowers.io
New blog post: Cancelling async #rustlang!

This is a written version of my talk at #RustConf 2025, where I talk about the joys and sorrows of future cancellations in Rust, with lessons from our work at @oxide.computer. Includes a video of the talk as well. Check it out!
Cancelling async Rust ꞏ sunshowers
Correctness in the face of cancellations: a written version of my talk at RustConf 2025.
sunshowers.io
curiouscoding.nl
Not sure if 'the small square also goes in the big square hole' joke, or deep mathematical result.
lucretiel.me
TIL that it’s possible to fill a square with a bunch of smaller squares of all unique sizes, something I would have bet solid money was impossible.
curiouscoding.nl
8.2GiB RAM is ~2.05GiB/thread = 2.2GB/thread.
2.9Gbp genome takes 2.9e9 * 3bit = 1.09 GB, or 2.18GB diploid, so surely that's what you're using here :)

Love it when the math works out so exactly :)
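
For anyone who wants to re-check the numbers, here is a minimal sketch of the same arithmetic. The thread count of 4 is an assumption (it is what 8.2 / 2.05 implies), and the variable names are just for illustration.

```python
GIB = 2**30   # bytes in a gibibyte
GB = 10**9    # bytes in a gigabyte

ram_gib = 8.2
threads = 4                      # assumption: implied by 8.2 GiB / 2.05 GiB
per_thread_gib = ram_gib / threads
per_thread_gb = per_thread_gib * GIB / GB

genome_bp = 2.9e9                # haploid human genome, in base pairs
bits_per_base = 3                # encoding width quoted in the post
haploid_gb = genome_bp * bits_per_base / 8 / GB
diploid_gb = 2 * haploid_gb

print(f"per thread: {per_thread_gib:.2f} GiB = {per_thread_gb:.2f} GB")
print(f"haploid: {haploid_gb:.2f} GB, diploid: {diploid_gb:.2f} GB")
```

The per-thread budget and the diploid 3-bit encoding agree to within about 0.03 GB.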
curiouscoding.nl
Yeah; using a `.fa` human genome gives me roughly 2s reading the input and 2s sketching it.
Using `.fa.gz` instead adds another 5s on top of that :/
curiouscoding.nl
I wonder how much time is spent on decompression here; that might well be the bottleneck.

I get nearly 1GB/s sketching locally, so that's 3s/genome or 200*3s=600s=10min total on a single thread, or only 2.5min on 4 threads.
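
The estimate above as a small helper, in case someone wants to plug in their own numbers. This is only a back-of-the-envelope sketch: the ~3 GB input size per genome is an assumption (it is what 3s/genome at 1GB/s implies), the function name is made up, and it assumes perfect scaling over threads with no time spent on decompression.

```python
def sketch_time_seconds(genomes, genome_gb, throughput_gb_per_s, threads=1):
    """Estimated total sketching time, assuming perfect scaling over threads."""
    per_genome = genome_gb / throughput_gb_per_s
    return genomes * per_genome / threads

single = sketch_time_seconds(200, 3.0, 1.0)           # one thread
quad = sketch_time_seconds(200, 3.0, 1.0, threads=4)  # four threads

print(f"1 thread: {single:.0f}s = {single / 60:.1f}min")
print(f"4 threads: {quad:.0f}s = {quad / 60:.1f}min")
```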
curiouscoding.nl
Simd-sketch perf should be independent of k.
10k 8-bit buckets is mostly equivalent to a 10k bottom-sketch, so you're getting some more accuracy here.

But since you only have 208 genomes, the 200^2 comparisons will take less than a second anyway.
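
To put a number on the comparison cost: a rough sketch that counts the pairs and the bytes touched for 208 sketches of 10k 8-bit buckets. The `matching_fraction` helper is a toy stand-in, not simd-sketch's API; it just reports the fraction of equal buckets, and a real estimator may also correct for buckets that match by chance.

```python
def num_pairs(n):
    """Number of unordered pairs among n sketches."""
    return n * (n - 1) // 2

def matching_fraction(a: bytes, b: bytes) -> float:
    """Toy similarity: fraction of positions where two 8-bit bucket sketches agree."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of buckets")
    return sum(x == y for x, y in zip(a, b)) / len(a)

genomes = 208
buckets = 10_000                      # one byte per bucket
pairs = num_pairs(genomes)
bytes_compared = pairs * buckets      # bytes read from one side of each pair

print(f"{pairs} pairs, {bytes_compared / 1e9:.2f} GB of bucket comparisons")
print(matching_fraction(bytes([1, 2, 3, 4]), bytes([1, 2, 0, 4])))
```

That is about 0.2 GB of byte comparisons in total, which is tiny next to the sketching itself.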