@rbalestr.bsky.social
rbalestr.bsky.social
We validate our pipeline across many noise levels/types, datasets, architectures, and SSL objectives, demonstrating that **data curriculum remains a largely under-explored axis of improvement for SSL pretraining**! Huge congrats to
Wenquan Lu, Jiaqi Zhang, and Hugues Van Hassel
rbalestr.bsky.social
Our solution is to train an SSL denoiser solely to create a data curriculum for the SSL method you are interested in. By first training on denoised samples and gradually transitioning back to the original noisy samples, the final SSL model outperforms the baseline!
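For intuition, here is a minimal sketch of such a curriculum, assuming a simple linear schedule that blends denoised and original samples; `denoiser`, `step`, and `total_steps` are placeholders, and the blending form is an assumption, not the paper's exact recipe:

```python
# Minimal sketch of a denoised-to-noisy data curriculum (assumed linear
# schedule). `denoiser` is any pretrained SSL denoiser; `x_noisy` is a
# batch of noisy inputs.
import torch

def curriculum_batch(x_noisy, denoiser, step, total_steps):
    """Start training on denoised samples, gradually return to the
    original noisy samples as training progresses."""
    alpha = min(step / total_steps, 1.0)  # 0 -> fully denoised, 1 -> original
    with torch.no_grad():
        x_denoised = denoiser(x_noisy)
    return (1.0 - alpha) * x_denoised + alpha * x_noisy
```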
rbalestr.bsky.social
With high levels of noise, it is standard to have a denoiser as part of the train/test preprocessing pipeline... but this has drawbacks, e.g. biasing your pipeline, requiring extra cross-validation of the denoiser, sensitivity to distribution shifts... AI/SSL should strive for denoiser-free pipelines!
rbalestr.bsky.social
Want to use SOTA Self Supervised Learning (SSL) methods on noisy data? We provide a novel training curriculum that significantly improves test performance on both clean and noisy samples! The approach is fully SSL and works with any method (DINOv2, MoCo, ...)
arxiv.org/abs/2505.12191
Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality datasets. As a...
arxiv.org
rbalestr.bsky.social
None of this would have been possible without the incredible work of co-authors Jeremy Budd, Javier Ideami, Benjamin Macdowall Rynne, and Keith Duggar! And behind it all, MLST and the open Discord channel where we all met!
rbalestr.bsky.social
The spline connection offers closed-form solutions to many questions we have been wondering about for SAEs, and provides clear, actionable tools such as our PAM-SGD training algo. PAM-SGD is EM-like, relying on the partition and region assignment, and outperforms typical Adam/SGD
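To make the EM-like structure concrete, here is a hedged sketch for a TopK SAE: freeze the region assignment (which latents are active for each sample), then take a weight update under that fixed assignment. This only illustrates the alternation; the actual PAM-SGD updates are in the paper, and the TopK encoder and single SGD step below are assumptions:

```python
# Hedged sketch of an EM-like SAE update: (E) fix the region assignment,
# (M) update weights with the assignment held fixed. Not the exact
# PAM-SGD algorithm. Shapes: x (B, d), W_enc (d, m), b_enc (m,), W_dec (m, d).
import torch
import torch.nn.functional as F

def em_like_step(x, W_enc, b_enc, W_dec, k, opt):
    # E-like step: pick each sample's active latent set (its "region")
    # with a TopK encoder, frozen as a hard mask.
    with torch.no_grad():
        pre = x @ W_enc + b_enc
        mask = torch.zeros_like(pre)
        mask.scatter_(1, pre.topk(k, dim=1).indices, 1.0)
    # M-like step: one gradient step on the reconstruction loss; within a
    # fixed region the loss is smooth, which is what makes this EM-like.
    z = (x @ W_enc + b_enc) * mask
    loss = F.mse_loss(z @ W_dec, x)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```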
rbalestr.bsky.social
The findings stem from expressing SAEs as splines (arxiv.org/abs/2408.04809) and doing a deep dive into their partition, constraints, and underlying geometry! We not only characterize their input-space partition and geometry, but also tie SAEs to common methods such as k-means and PCA
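As a toy illustration of one such connection (my reading, hedged, not the paper's exact statement): with k=1, a TopK-style encoder hard-assigns each input to its best-matching dictionary atom, mirroring the assignment step of k-means:

```python
# Toy illustration: a Top-1 encoder partitions input space by
# best-matching atom, akin to the k-means assignment step.
import torch

def top1_assignment(x, atoms):
    # x: (B, d) inputs, atoms: (m, d) dictionary; returns a hard region
    # index per sample, i.e., a partition of input space into m cells.
    return (x @ atoms.T).argmax(dim=1)
```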
rbalestr.bsky.social
Want better training and geometric insights for Sparse AutoEncoders (SAEs)? Search no more... We leverage spline theory to provide a new "EM-like" training algo (PAM-SGD) and to delve into SAE geometry with connections to PCA, k-means, and more...

arxiv.org/abs/2505.11836
rbalestr.bsky.social
Our work also raises a deeper question: which attention or MLP blocks have to be adapted to steer a model toward your specific downstream application? Tons of open questions to explore!

This amazing work was led by @pszwnzl.bsky.social, Wojciech Jasiński, Marek Śmieja, and Bartosz Zielinski
rbalestr.bsky.social
That bias towards capturing details manifests as different attention behavior within ViTs. From those findings, we propose a new token aggregator that counters such attention bias without having to finetune the backbone -> gains in linear probe performance!
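For a sense of what such an aggregator can look like, here is a hedged sketch: a generic attention-pooling head trained on top of frozen ViT tokens together with a linear probe. The paper's actual aggregator may differ; `dim`, `n_classes`, and the pooling form are assumptions:

```python
# Hedged sketch: a learned token aggregator on top of a *frozen* ViT,
# trained jointly with a linear probe. Not necessarily the paper's design.
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim) / dim**0.5)
        self.probe = nn.Linear(dim, n_classes)

    def forward(self, tokens):                      # tokens: (B, N, dim), frozen
        w = (tokens @ self.query).softmax(dim=1)    # (B, N) attention weights
        pooled = (w.unsqueeze(-1) * tokens).sum(1)  # weighted token average
        return self.probe(pooled)
```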
rbalestr.bsky.social
That finding comes from our previous study (arxiv.org/abs/2402.11337) proving why methods like MAE do not perform well without finetuning compared to joint-embedding methods: they learn too many details about the data that are not useful for coarse-scale (semantic) classification
Learning by Reconstruction Produces Uninformative Features For Perception
Input space reconstruction is an attractive representation learning paradigm. Despite interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstructi...
arxiv.org
rbalestr.bsky.social
Learning by reconstruction captures uninformative details in your data. This "attention to details" biases the ViT's attention. Our solution: a new token aggregator -> improves (significantly) MAE linear-probe perf. and (slightly) JEPAs like I-JEPA
arxiv.org/abs/2412.03215
rbalestr.bsky.social
This great work was led by collaborators Xue Xia, Tao Zhang, and Lorenz Hurni! And we will be presenting a poster at the SSL Workshop at NeurIPS2024!
rbalestr.bsky.social
We propose an approach that combines segmentation and association of geographic entities in historical maps using video instance segmentation (VIS). Combined with a novel method for generating synthetic videos from unlabeled historical maps, we produce SSL models with high accuracy.
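To convey the synthetic-video idea, here is a hedged sketch that turns one unlabeled map image into a short clip of perturbed views, so a VIS model can learn to associate entities across frames. The paper's generation method is more involved; the augmentation choices below are assumptions:

```python
# Hedged sketch: synthesize a short "video" from a single map image via
# random augmentations, mimicking scans of the same region across editions.
import torchvision.transforms as T

jitter = T.Compose([
    T.RandomAffine(degrees=2, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    T.ColorJitter(brightness=0.2, contrast=0.2),
])

def synthetic_video(map_image, n_frames=8):
    # Each frame is a slightly perturbed view of the same map.
    return [jitter(map_image) for _ in range(n_frames)]
```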
rbalestr.bsky.social
Understanding the evolution of historical maps is key to tracking the development of civilizations (urbanization, environmental changes, ...). We show how to use Self Supervised Learning to do that without supervision!
arxiv.org/abs/2411.17425
(SSL workshop NeurIPS24)