Dmitry Kobak
@hippopedoid.bsky.social
Researcher at Tübingen University. Manifold learning, contrastive learning, scRNAseq data. Excess mortality. Born but to die and reas'ning but to err.
hippopedoid.bsky.social
We spent a year writing this review of low-dim embeddings and arguing about things like epistemic roles and best practices :-) The 20+ authors were all participants in the Dagstuhl seminar we held last year: www.dagstuhl.de/24122. Led by @alexandr.bsky.social and Cyril de Bodt.

arxiv.org/abs/2508.15929
Reposted by Dmitry Kobak
alexandr.bsky.social
One of the more fun topics stemmed from a discussion with @hippopedoid.bsky.social about the oldest 2D PCA visualization we could find. After some scouring, we settled on a 1960 paper by researchers at the Université de Montréal about turtle carapaces, which we recreated.
Recreated PCA figure from: P. Jolicoeur and J. E. Mosimann. Size and shape variation in the painted turtle. A principal component analysis. Growth, 24:339–354, 1960.

The figures show PC1 vs PC2 and PC2 vs PC3, with colours and symbols reflecting the sex of the turtle.
Reposted by Dmitry Kobak
alexandr.bsky.social
Last year I met a bunch of great researchers who work with high-dimensional data at a Dagstuhl seminar. This week we put out a preprint about the history and philosophy of low-dimensional embedding methods, their applications, their challenges, and their possible future: arxiv.org/abs/2508.15929
The participants of Dagstuhl Seminar 24122 standing on steps outside (from https://www.dagstuhl.de/24122).
Multiple types of embeddings (UMAP, t-SNE, Laplacian Eigenmaps, PHATE, PCA, MDS) of Wikipedia text data, labelled by text summaries generated by an LLM. Methods like UMAP and t-SNE show cluster structure that reflects shared subject matter in the text, while other methods show more continuous structure.
Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of primate brain organoids at different time periods. Different methods highlight different aspects of development, such as clusters of similar cell types or time courses of cell development.
Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of 1000 Genomes Project genotypes. Different methods reflect different aspects of the demographic history of populations.
hippopedoid.bsky.social
Hi Ben, I've been trying to contact you by email but keep failing. Wanted to DM you here but your DMs are closed. If you either follow me or open your DMs, I'll message you! Cheers.