Stephan Köstlbacher
@stephkoe.bsky.social
Former postdoc at Wageningen University with Thijs Ettema. Studying ancient evolutionary transitions in prokaryotes using phylogenomics and structural modeling. Looking for the next step in academic or translational research. he/him
stephkoe.bsky.social
Never thought I'd do real E. coli research; so far I had used it only for cloning & protist snacks in my MSc 😅

But here I am simulating chromosomes & shuffling motifs.

Watching @loreoliv.bsky.social and the lab turn those predictions into real data was pure magic.✨

Grateful to be part of this team!
Reposted by Stephan Köstlbacher
martinsteinegger.bsky.social
Folddisco finds similar (dis)continuous 3D motifs in large protein structure databases. Its efficient index enables fast uncharacterized active site annotation, protein conformational state analysis and PPI interface comparison. 1/9🧶🧬
📄 www.biorxiv.org/content/10.1...
🌐 search.foldseek.com/folddisco
stephkoe.bsky.social
It was a pleasure to work with you! 😊
stephkoe.bsky.social
Hey Felix, good question! Yeah it is different. In short: trimAl/ClipKit aim to remove uninformative sites. WitChi is a second step to remove misleading sites: those that can group unrelated taxa just because their sequence composition looks similar. Hope that helps!
stephkoe.bsky.social
And of course the great work!
stephkoe.bsky.social
Thanks for that beautiful summary, kassi :)
Reposted by Stephan Köstlbacher
kassipan.bsky.social
Excited to share our work on WitChi! 🛠️🖥️
We tested it on the GTDB r220 archaeal supermatrix (5,869 taxa & 10,101 cols) removing 55% of sites in <2h.

The phylogeny showed several interesting groupings with overall improved branch support:
#phylogenetics #ArchaeaSky #MSA #opensource #MEvoSky #MicroSky
Reposted by Stephan Köstlbacher
daanspeth.bsky.social
I'm happy to announce the latest release of the GlobDB, available at globdb.org.

The GlobDB is a database of "species dereplicated" microbial genomes, and as of release 226 it contains twice as many species-representative genomes (306,260) as the latest GTDB release.
stephkoe.bsky.social
It was a fun project :) Thanks for the support!
stephkoe.bsky.social
8.
GTDB r220 case study (led by @kassipan.bsky.social)
Applied WitChi to the archaeal GTDB r220 supermatrix:
• 5,869 taxa
• 55% of columns pruned
• Biased taxa: 95.1% → 2.3%
• Runtime: <2h on 4 cores
→ Known clades recovered — without using very complex C60 or CAT models
stephkoe.bsky.social
7.
Use witchi test to quantify bias per taxon:
• χ² scores
• Empirical p-values (via permutations)
• Z-scores to see how far taxa deviate from expectation
→ Great for screening MSAs or comparing compositional distortion across datasets.
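
To make the three numbers above concrete, here is a minimal Python sketch of a permutation-based per-taxon test. It is not WitChi's code or its command-line interface: the function names, the default of 200 permutations, and the choice to shuffle residues within each column are all assumptions made for illustration. Gaps are shuffled along with residues here, which a real tool may handle differently.

```python
import random
import statistics
from collections import Counter


def taxon_chi2(rows):
    """Chi-square score per taxon against the pooled alignment composition."""
    pooled = Counter(ch for row in rows for ch in row if ch != "-")
    total = sum(pooled.values())
    scores = []
    for row in rows:
        observed = Counter(ch for ch in row if ch != "-")
        n = sum(observed.values())
        # Sum (observed - expected)^2 / expected over all states in the alignment.
        scores.append(sum(
            (observed[s] - n * c / total) ** 2 / (n * c / total)
            for s, c in pooled.items()
        ) if n else 0.0)
    return scores


def shuffle_within_columns(rows, rng):
    """Assumed null: permute residues across taxa inside every column."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(chars) for chars in zip(*columns)]


def permutation_test(rows, n_perm=200, seed=0):
    """Return chi2, z-score and empirical p-value for each taxon."""
    rng = random.Random(seed)
    observed = taxon_chi2(rows)
    null = [taxon_chi2(shuffle_within_columns(rows, rng)) for _ in range(n_perm)]
    results = []
    for i, obs in enumerate(observed):
        draws = [scores[i] for scores in null]
        mean = statistics.fmean(draws)
        sd = statistics.pstdev(draws)
        z = (obs - mean) / sd if sd > 0 else 0.0
        # Add-one correction keeps the empirical p-value above zero.
        p = (1 + sum(d >= obs for d in draws)) / (1 + n_perm)
        results.append({"chi2": obs, "z": z, "p": p})
    return results


# Toy alignment: the last taxon is pure G and should stand out.
msa = ["ACGTACGTACGT", "ACGTACGAACGT", "ACGTTCGTACGT", "GGGGGGGGGGGG"]
results = permutation_test(msa)
```

Shuffling within columns keeps each column's composition fixed while breaking any link between a taxon and its overall composition, which is why no substitution model or tree is needed for this kind of null.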
stephkoe.bsky.social
6.
WitChi solves both problems:
🔹 Builds a null distribution using column permutations — no model, no tree
🔹 Recursively removes columns that distort the taxon-wise χ² profile
🎁 Bonus: 3 scoring strategies, including one capturing distribution-wide effects (Wasserstein)
⚡ Scales linearly with taxa
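
As a rough illustration of iterative pruning with updated scores, the sketch below greedily drops the column whose removal most reduces the summed taxon χ², recomputing every score after each removal. This is an assumed, deliberately naive stand-in and not WitChi's algorithm: the stopping rule (a hypothetical `max_removed` cap, or no further improvement) replaces the permutation-based null, and the brute-force rescoring does not reproduce the linear scaling mentioned above.

```python
from collections import Counter


def total_chi2(rows):
    """Sum of per-taxon chi-square scores against the pooled composition."""
    pooled = Counter(ch for row in rows for ch in row if ch != "-")
    total = sum(pooled.values())
    score = 0.0
    for row in rows:
        observed = Counter(ch for ch in row if ch != "-")
        n = sum(observed.values())
        if n == 0:
            continue  # an all-gap row contributes nothing
        for state, count in pooled.items():
            expected = n * count / total
            score += (observed[state] - expected) ** 2 / expected
    return score


def prune_columns(rows, max_removed):
    """Greedy pruning: remove one column per round, rescoring every time."""
    kept = list(range(len(rows[0])))
    removed = []

    def score(cols):
        return total_chi2(["".join(row[c] for c in cols) for row in rows])

    current = score(kept)
    while len(removed) < max_removed and len(kept) > 1:
        best_col, best_score = None, current
        for col in kept:
            trial = score([k for k in kept if k != col])
            if trial < best_score:
                best_col, best_score = col, trial
        if best_col is None:
            break  # no single removal lowers the total any further
        kept.remove(best_col)
        removed.append(best_col)
        current = best_score
    return kept, removed


# Toy alignment: only the last two columns separate the middle taxon.
msa = ["ACGTAC", "ACGTGG", "ACGTAC"]
kept, removed = prune_columns(msa, max_removed=3)
```

In the toy alignment the two trailing columns carry all of the compositional difference, so they are the ones removed and the four identical columns are kept.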
stephkoe.bsky.social
5.
Classical χ² pruning trims biased columns once — fast, but naive.
→ As alignment composition shifts, Δχ² must be updated — few tools do this.
BMGE's stationary-based algorithm prunes iteratively and works well, but it scales quadratically with the number of taxa, so it is not feasible for medium-sized or large datasets.
stephkoe.bsky.social
4.
The problem:
χ² assumes taxa are independent and identically distributed samples.

In MSAs, they share history → correlated data.
So parametric χ² nulls are invalid.
Simulations help, but they need known models and trees — which bias distorts.
→ Slow, circular, rarely used.
stephkoe.bsky.social
3.
We often use χ² stats to detect bias — how much a taxon’s sequence composition deviates from expectation.
χ² pruning removes columns with strong bias signal.
But both steps rely on assumptions that don’t hold in real MSAs.
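
For readers who want the statistic spelled out, here is a minimal Python sketch of a per-taxon χ² score. It assumes the expected composition is simply the pooled composition of the whole alignment and that `-` marks a gap; the function name `taxon_chi2` is made up for this example and is not taken from any tool.

```python
from collections import Counter


def taxon_chi2(alignment):
    """Return one chi-square score per sequence (taxon).

    alignment: list of equal-length strings; '-' is treated as a gap.
    Expected counts are the taxon's length times the pooled state frequencies.
    """
    pooled = Counter(ch for seq in alignment for ch in seq if ch != "-")
    total = sum(pooled.values())
    scores = []
    for seq in alignment:
        observed = Counter(ch for ch in seq if ch != "-")
        n = sum(observed.values())
        if n == 0:
            scores.append(0.0)  # all-gap sequence: nothing to score
            continue
        score = 0.0
        for state, pooled_count in pooled.items():
            expected = n * pooled_count / total
            score += (observed[state] - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy alignment: two A-only taxa and one G-only taxon.
msa = ["AAAA", "AAAA", "GGGG"]
scores = taxon_chi2(msa)
print(scores)  # the G-only taxon scores 8.0, the A-only taxa 2.0 each
```

The higher the score, the further that taxon's composition sits from the alignment average. Turning the score into a p-value is where the assumptions questioned in this thread come in.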
stephkoe.bsky.social
2.
What’s compositional bias?
When unrelated taxa convergently evolve similar sequence compositions (e.g. GC-rich, AT-rich), tree algorithms may group them by chemistry, not ancestry — a well-known artefact in deep phylogenies.
Fig modified from: doi.org/10.1007/978-...
Reposted by Stephan Köstlbacher
helm-linger.bsky.social
Excited to share our first paper on the symbiosis between chlamydiae and social amoebae showing in detail the adaptations of endosymbionts to the social life style of their host! 🎉
Reposted by Stephan Köstlbacher
stcmicrobeblog.bsky.social
a midsummer night's dream
or
understanding chromosome organization's deep evolutionary roots
#MicroSky #Archaea #ArchaeaSky #ProtistsOnSky et al.
borrowed from https://www.rsc.org.uk/a-midsummer-nights-dream/production-photos