Andre Kahles
@akkah21.bsky.social
Reposted by Andre Kahles
robp.bsky.social
Hi bioinformatics, genomics and CS friends! Please help me spread the word. I'm hiring a postdoc! Come work on cutting edge method development in algorithmic genomics with me and my group at @umdscience.bsky.social! 🖥️🧬
robp.bsky.social
And it's posted! If you're interested and eligible, please consider applying through the UMD portal: umd.wd1.myworkdayjobs.com/en-US/UMCP/j....

If you're a PI working in algorithmic genomics (& you can recommend my lab to your top graduating students ;P), please let them know!
Thanks Rob! Much appreciated.
We invite you to try out MetaGraph at metagraph.ethz.ch, learn more about our framework in the paper (nature.com/articles/s41...), or start building indexes from your own data (github.com/ratschlab/me...).
MetaGraph - Biological Sequence Search
Petabase-Scale Search for DNA, RNA & Amino acids
metagraph.ethz.ch
We would like to thank the bioinformatics community for years of support and openness. A special thanks to the Logan effort, whose contig set we use as input for one of our largest indexes.
While MetaGraph provides a lossless representation of the input k-mer set, it is not a lossless compression of the raw reads. To reach petabase scale, we remove noisy k-mers prior to indexing — a step that we show has only minimal impact on search sensitivity.
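To illustrate the cleaning idea: the post does not say how noisy k-mers are identified, so the sketch below assumes a plain abundance threshold, a common way to discard k-mers that likely stem from sequencing errors. The function names and the `min_count` parameter are hypothetical and not part of MetaGraph.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times (assumed noise filter)."""
    return {kmer for kmer, count in counts.items() if count >= min_count}


# Toy input: the third read differs by one base, as a sequencing error might.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
solid_kmers = drop_rare_kmers(count_kmers(reads, k=4), min_count=2)
```

Here the two k-mers introduced by the single-base difference (`CGTT`, `GTTC`) appear only once and are dropped, while the k-mers supported by several reads are kept.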
We show that MetaGraph indexes are both scalable and cost-efficient to query. Searching 1 Mbp of sequence against the entire SRA costs less than $1 on standard cloud infrastructure, making petabase-scale biological data truly searchable and accessible.
Our indexes support fast exact matching as well as alignment with edits. Labels can represent sample metadata, coordinates or quantification values. We can store 10’000 human transcriptome samples in < 160 GB and return position-wise expression for any queried sequence.
We have already processed more than 10 Petabases of raw sequence data from the SRA and make the compressed indexes publicly available for search (metagraph.ethz.ch), download and cloud-based access.
At its core, MetaGraph represents all input sequences as labeled, succinct de Bruijn graphs — a highly compressed yet fully searchable structure. Each k-mer carries metadata labels that remain interactively queryable through a flexible API.
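As a rough picture of what "labeled k-mers" means, here is a minimal sketch that maps each k-mer to the set of samples containing it and answers exact-match queries. It uses an ordinary Python dictionary rather than a succinct de Bruijn graph, and it is not MetaGraph's API or implementation. All names (`build_labeled_kmer_index`, `query_labels`, `min_fraction`) are hypothetical.

```python
from collections import defaultdict


def build_labeled_kmer_index(samples, k):
    """Map each k-mer to the set of sample labels in which it occurs."""
    index = defaultdict(set)
    for label, sequences in samples.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(label)
    return index


def query_labels(index, query, k, min_fraction=1.0):
    """Return labels matching at least min_fraction of the query's k-mers."""
    kmers = [query[i:i + k] for i in range(len(query) - k + 1)]
    if not kmers:
        return set()
    hits = defaultdict(int)
    for kmer in kmers:
        # .get avoids inserting empty entries into the defaultdict
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label for label, n in hits.items() if n / len(kmers) >= min_fraction}


# Toy data: two "samples", each with a single short sequence.
samples = {
    "sample_A": ["ACGTACGTGA"],
    "sample_B": ["TTACGTACCA"],
}
index = build_labeled_kmer_index(samples, k=4)
shared = query_labels(index, "ACGTAC", k=4)
only_a = query_labels(index, "CGTGA", k=4)
```

In this toy setup `shared` contains both samples and `only_a` contains just `sample_A`. A real index at this scale would need compressed graph and annotation structures, which the dictionary here does not attempt to model.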
Modern biology produces vast amounts of raw sequencing data — genomes, transcriptomes, and protein sequences. MetaGraph provides a unified computational framework to index, query, and reason across this landscape of biological information.
The following thread describes the main ideas and results of this joint work with @gxxxr.bsky.social @karasikov.bsky.social @adamant-pwn.bsky.social @HarunMustafa416
After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (www.nature.com/articles/s41...).
Efficient and accurate search in petabase-scale sequence repositories - Nature
MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.
www.nature.com