Kristoffer Sahlin
@ksahlin.bsky.social
Assistant Professor at the Department of Mathematics, Stockholm University, and a Scilifelab Fellow. Algorithms, Modeling, Transcriptomics, Genomics. Amateur runner 5000m 18:48 | 10k 37:40 | HM 1:28:34 | M 3:39:06
Reposted by Kristoffer Sahlin
Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.
Reposted by Kristoffer Sahlin
jimshaw.bsky.social
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
biorxiv-bioinfo.bsky.social
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
Reposted by Kristoffer Sahlin
pasteur.fr
Congratulations to Rayan Chiki (Institut Pasteur), head of the “Sequence Bioinformatics” unit, for securing the ERC Proof of Concept 2025 for his project ENZYMINER! 👏

@rayan.chiki.bsky.social

#Bioinformatics
Reposted by Kristoffer Sahlin
hitseq.bsky.social
We have officially started the #HitSeq track @hitseq.bsky.social at #ISMBECCB2025. Francisco de la Vega introduces our first #keynote speaker Valentina Boeva @valboeva.bsky.social with her talk: "Learning variant effects on chromatin accessibility and 3D structure without matched Hi-C data"
Reposted by Kristoffer Sahlin
hitseq.bsky.social
Meet our amazing sponsor PacBio @pacbio.bsky.social for the @hitseq.bsky.social track at #ISMBECCB2025, represented by Elizabeth Tseng with her talk "Bioinformatics analysis for long-read RNA sequencing: challenges and promises" #hitseq #iscb #sequencing #application #liverpool #uk
Reposted by Kristoffer Sahlin
longtrec.bsky.social
Don't miss any of our #LongTREC communications at #ISMBECCB2025. Download this flyer to make catching all the latest & hottest long-read transcriptomics research simple.

@anaconesa.bsky.social
Reposted by Kristoffer Sahlin
anaconesa.bsky.social
@hitseq.bsky.social is kicking off with our first keynote @valboeva.bsky.social talking about "Learning variant effects on chromatin accessibility and 3D structure without matched Hi-C data". #ISMBECCB2025
Reposted by Kristoffer Sahlin
longtrec.bsky.social
📽️ Next in the LongTREC Series: Mahmud Sami Aydin!
Sami is a Doctoral Candidate at @stockholm-uni.bsky.social, working under the supervision of @ksahlin.bsky.social. In this video, Sami shares his research and his role in the broader LongTREC collaboration across Europe.
#AlgorithmDevelopment
Reposted by Kristoffer Sahlin
npmalfoy.bsky.social
Paper alert!
We present OReO, a tool that reorders long-read datasets so that they compress efficiently with ANY universal compressor like gz, zstd, xz ...
TLDR: You can get state of the art compression WITHOUT a dedicated compressor/decompressor!
academic.oup.com/bioinformati...
A thread!
OReO: optimizing read order for practical compression
Abstract. Motivation. Recent advances in high-throughput and third-generation sequencing technologies have created significant challenges in storing and mana
academic.oup.com
ksahlin.bsky.social
I worked with Thomas during a three-month research visit during his PhD, and it resulted in a paper in NAR. I highly recommend him. doi.org/10.1093/nar/...
Reposted by Kristoffer Sahlin
camillemrcht.bsky.social
Thomas Baudeau defended his thesis on "Studying the properties of viral long reads mapping methods". Congrats, docteur Baudeau, you'll be deeply missed in the team. I'm very glad I got the chance to work with you. Thomas is also on the lookout for a postdoc 👀
Reposted by Kristoffer Sahlin
🧵1/n
Estimating mutation rates using k-mers is fast—but what happens when repeats dominate the genome?

In a new preprint, Haonan Wu, Antonio Blanca, and I propose a *repeat-aware* estimator that's accurate even in centromeres.
biorxiv-bioinfo.bsky.social
A k-mer-based estimator of the substitution rate between repetitive sequences https://www.biorxiv.org/content/10.1101/2025.06.19.660607v1
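For context, here is a minimal sketch of the simple k-mer estimator that the thread contrasts with. This is not the preprint's repeat-aware method. It assumes independent substitutions at rate r, so that a k-mer survives unmutated with probability (1 - r)^k, and it inverts the observed fraction of shared k-mers. The function names and the use of distinct k-mer sets are illustrative assumptions.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def naive_substitution_rate(a, b, k):
    """Estimate the substitution rate between a and b from shared k-mers.

    Assumption: each base mutates independently with rate r, so a k-mer of
    `a` is expected to survive in `b` with probability (1 - r)^k.
    Inverting the observed shared fraction q gives r ~ 1 - q^(1/k).
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka:
        raise ValueError("sequence a is shorter than k")
    q = len(ka & kb) / len(ka)  # fraction of a's k-mers also found in b
    return 1.0 - q ** (1.0 / k)
```

In repetitive sequence, a mutated k-mer can still match another copy elsewhere, which inflates the shared fraction and biases this simple estimate downward. That is the situation the thread says a repeat-aware estimator is meant to handle.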
Reposted by Kristoffer Sahlin
camillemrcht.bsky.social
Hey yeast lovers. Do you like pangenomes?
O'Donnell et al. 2023 produced T2T assemblies of different strains, including phased haplotypes for yeast.

Here I selected 10 phased haplotypes and the S288C reference,
and looked for the MST28 / YAR033W gene reported to contain SVs such as indels.

👇🏻👇🏻
ksahlin.bsky.social
IMO it matters a lot as a 'first impression'
ksahlin.bsky.social
I did only very minor impl. contributions, but from my (non-expert) view, I like that (1) it installs easily (also on a MacBook) and (2) no header files. Felt much easier to get started with than, e.g., C++. I never truly learned good .h/.cpp practices, and I could never get OpenMP/g++ working well
ksahlin.bsky.social
As for results, isONclust3 handles a 37M-read PacBio dataset from a Revio machine in under 10h, while other algorithms fail (>256 GB memory or >120h runtime). On the other datasets, isONclust3 has comparable or better accuracy than the other benchmarked tools.
ksahlin.bsky.social
The algorithm follows isONclust's algorithm in the general structure (greedy minimizer matching) but adds three key concepts: high confidence minimizers, on-the-fly cluster information update, and iterative (post-)cluster merging.
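To make "greedy minimizer matching" concrete, here is a toy sketch of the general idea only. It is not the isONclust3 implementation: lexicographic minimizers, the parameters `k`, `w`, and `min_shared`, and the plain set union used as an "on-the-fly update" are all illustrative assumptions. High-confidence minimizers and iterative cluster merging are not modelled.

```python
def minimizers(seq, k, w):
    """Return the set of (k, w)-minimizers of seq.

    A minimizer is the lexicographically smallest k-mer in each window of
    w consecutive k-mers (real tools typically use a hash ordering instead).
    """
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kms:
        return set()
    if len(kms) <= w:
        return {min(kms)}
    return {min(kms[i:i + w]) for i in range(len(kms) - w + 1)}


def greedy_cluster(reads, k=5, w=3, min_shared=0.5):
    """Greedily assign each read to the existing cluster that shares the
    largest fraction of its minimizers, or open a new cluster."""
    clusters = []  # each cluster: {"minimizers": set, "reads": [read indices]}
    for idx, read in enumerate(reads):
        m = minimizers(read, k, w)
        best, best_frac = None, 0.0
        for c in clusters:
            frac = len(m & c["minimizers"]) / len(m) if m else 0.0
            if frac > best_frac:
                best, best_frac = c, frac
        if best is not None and best_frac >= min_shared:
            best["reads"].append(idx)
            best["minimizers"] |= m  # naive stand-in for updating cluster info
        else:
            clusters.append({"minimizers": set(m), "reads": [idx]})
    return [c["reads"] for c in clusters]
```

For example, `greedy_cluster(["ACGTACGTAGCTAGCTAGGA", "ACGTACGTAGCTAGCTAGGA", "TTTTTGGGGGCCCCCAAAAA"])` returns `[[0, 1], [2]]`: the two identical reads share all minimizers, while the third shares none and starts its own cluster.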
ksahlin.bsky.social
The motivation to develop this algorithm came from the inability of other algorithms to process recent large datasets (10-100M reads) from Revio or PromethION machines.
Reposted by Kristoffer Sahlin
camillemrcht.bsky.social
@tolyan.bsky.social is our very last speaker, on randstrobes (high-sensitivity seeds) and their evolution, the multi-context seeds
ksahlin.bsky.social
Oh the good ol’ carnac/isonclust(1) times :)
Reposted by Kristoffer Sahlin
camillemrcht.bsky.social
2 in a row for @ksahlin.bsky.social (👋🏻👏🏻), first is @alexanderjpetri.bsky.social on de novo clustering of long-read RNA, a problem that brings back memories...