Tom Aarsen
tomaarsen.com
@tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer
Machine Learning Engineer at 🤗 Hugging Face
Instead of having to store fp32 embeddings, you only store a binary index (32x smaller) and int8 embeddings (4x smaller). Beyond that, you only keep the binary index in memory, so you're also saving 32x on memory compared to an fp32 search index.
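As a rough sketch (not the exact demo code), both quantized representations can be produced with `quantize_embeddings`; the model name here is just an example, any dense embedding model works:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Any dense embedding model works; this one is just an example
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

corpus = ["The first document", "The second document"]
fp32_embeddings = model.encode(corpus)  # float32: 4 bytes per dimension

# 1 bit per dimension (32x smaller): goes into the in-memory binary index
binary_embeddings = quantize_embeddings(fp32_embeddings, precision="ubinary")

# 1 byte per dimension (4x smaller): stored on disk, only loaded for rescoring
int8_embeddings = quantize_embeddings(fp32_embeddings, precision="int8")
np.save("int8_embeddings.npy", int8_embeddings)
```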

🧵
January 6, 2026 at 7:56 PM
4. Load the int8 embeddings for the top 40 documents from the binary search from disk.
5. Rescore those 40 documents using the fp32 query embedding and the 40 int8 embeddings.
6. Sort the 40 documents based on the new scores and grab the top 10.
7. Load the titles/texts of the top 10 documents.
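A minimal sketch of steps 4-7, assuming the int8 embeddings were saved to disk as a numpy file and the binary index returned 40 candidate document ids:

```python
import numpy as np

# Memory-mapped: the int8 embeddings stay on disk until rows are requested
int8_embeddings = np.load("int8_embeddings.npy", mmap_mode="r")

def rescore(query_fp32: np.ndarray, candidate_ids: np.ndarray, top_k: int = 10):
    # Step 4: load only the 40 candidate rows from disk
    candidates = np.asarray(int8_embeddings[candidate_ids], dtype=np.float32)
    # Step 5: rescore with the fp32 query against the (upcast) int8 embeddings
    scores = candidates @ query_fp32
    # Step 6: sort by the new scores and keep the top 10
    order = np.argsort(-scores)[:top_k]
    # Step 7: the returned ids can then be used to load the titles/texts of the top 10
    return candidate_ids[order], scores[order]
```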

🧵
January 6, 2026 at 7:56 PM
This is the inference strategy:
1. Embed your query into a 'standard' fp32 embedding using a dense embedding model.
2. Quantize the fp32 query embedding to binary: 32x smaller.
3. Use an approximate (or exact) binary index to retrieve e.g. 40 documents (~20x faster than an fp32 index).
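A sketch of these three steps with an exact binary faiss index (`IndexBinaryFlat`, which searches by Hamming distance); an approximate binary index such as `IndexBinaryHNSW` could be swapped in. `binary_embeddings` is the packed `ubinary` corpus matrix from the quantization sketch above:

```python
import faiss
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
dim = model.get_sentence_embedding_dimension()  # in bits for the binary index

index = faiss.IndexBinaryFlat(dim)
index.add(binary_embeddings)  # packed uint8 corpus embeddings (see sketch above)

# Step 1: embed the query into a regular fp32 embedding
query_fp32 = model.encode("How does binary quantization work?")
# Step 2: quantize the query embedding to binary (32x smaller)
query_binary = quantize_embeddings(query_fp32.reshape(1, -1), precision="ubinary")
# Step 3: retrieve e.g. 40 candidate documents from the binary index
hamming_distances, candidate_ids = index.search(query_binary, 40)
candidate_ids = candidate_ids[0]  # the 40 retrieved document ids
```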

🧵
January 6, 2026 at 7:56 PM
🏎️ You can run a 200ms search over 40 million texts using just a CPU server, 8GB of RAM, and 45GB of disk space.

The trick: Binary search with int8 rescoring.

I'll show you a demo & how it works in the 🧵:
January 6, 2026 at 7:56 PM
Python 3.9 deprecation:
Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

🧵
December 11, 2025 at 2:46 PM
Transformers v5:
This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

Even my tests run on both Transformers v4 and v5.

🧵
December 11, 2025 at 2:46 PM
Similarity scores in Hard Negatives Mining:
When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to also get the similarity scores returned. This can be useful for some distillation losses!
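A hedged sketch of what that could look like; the dataset and the other arguments are just illustrative, `output_scores=True` is the new part:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# A (query, answer) pair dataset; any anchor/positive dataset works
dataset = load_dataset("sentence-transformers/natural-questions", split="train")

mined = mine_hard_negatives(
    dataset,
    model,
    num_negatives=5,
    output_scores=True,  # new: also return the similarity scores
)
```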

🧵
December 11, 2025 at 2:46 PM
Multilingual NanoBEIR Support:
You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark.
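Roughly like this (a sketch; the multilingual model is illustrative, `dataset_id` is the new parameter):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Use the community German translation instead of the default English datasets
evaluator = NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")
results = evaluator(model)
```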

🧵
December 11, 2025 at 2:46 PM
CrossEncoder multi-processing:
Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure:
just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `predict`/`rank` calls.
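For example (a sketch based on the snippet above; the reranker model is just an example):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "How many people live in Berlin?"
passages = ["Berlin has roughly 3.7 million inhabitants.", "Munich lies in Bavaria."] * 5000

# Spread the reranking over two GPUs; use device=["cpu"] * 4 on CPU-only machines
ranking = model.rank(query, passages, device=["cuda:0", "cuda:1"])
```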

🧵
December 11, 2025 at 2:46 PM
🔥I've just published Sentence Transformers v5.2.0!

It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in `mine_hard_negatives`, Transformers v5 support, and more.

Details in 🧵
December 11, 2025 at 2:46 PM
We see an increasing desire from companies to move from large LLM APIs to local models for better control and privacy, a trend reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

🧵
October 22, 2025 at 1:04 PM
🤗 Sentence Transformers is joining @hf.co! 🤗

This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer!

Details in 🧵
October 22, 2025 at 1:04 PM
The MTEB team has just released MTEB v2, an upgrade to their evaluation suite for embedding models!

Their blogpost covers all changes, including easier evaluation, multimodal support, rerankers, new interfaces, documentation, dataset statistics, a migration guide, etc.

🧵
October 20, 2025 at 2:36 PM
The benchmark is multilingual (20 languages), covers various domains (general, legal, healthcare, code, etc.), and is already available on MTEB right now.

There's also an English-only version available.

🧵
October 1, 2025 at 3:52 PM
With RTEB, we can compare how models score on the public versus the private benchmarks, as displayed in this figure.

The gap between the two is an indication of whether a model truly generalizes or has (accidentally) overfit on the public data.

🧵
October 1, 2025 at 3:52 PM
We're announcing a new update to MTEB: RTEB

It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting.

Details in our blogpost below 🧵
October 1, 2025 at 3:52 PM
- Add FLOPS calculation to SparseEncoder evaluators for determining a performance/speed tradeoff
- Add support for Knowledgeable Passage Retriever (KPR) models
- Multi-GPU processing with `model.encode()` now works with `convert_to_tensor`
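A sketch of that last fix, assuming multi-GPU encoding is configured via a device list:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sentences = ["first sentence", "second sentence"] * 10_000

embeddings = model.encode(
    sentences,
    device=["cuda:0", "cuda:1"],  # multi-process encoding over two GPUs
    convert_to_tensor=True,       # now also works in the multi-GPU code path
)
print(embeddings.shape)
```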

🧵
September 22, 2025 at 11:42 AM
- `model.encode()` now throws an error if an unused keyword argument is passed
- A new `model.get_model_kwargs()` method for checking which custom, model-specific keyword arguments are supported by the model
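A small illustration of both behaviours (`not_a_real_argument` is obviously made up):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Which custom, model-specific keyword arguments does encode() accept for this model?
print(model.get_model_kwargs())

# Unused keyword arguments now raise an error instead of being silently ignored
model.encode(["Hello!"], not_a_real_argument=True)  # -> raises an error
```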

🧵
September 22, 2025 at 11:42 AM
🐛 I've just released Sentence Transformers v5.1.1!

It's a small patch release that makes the project more explicit about incorrect arguments and introduces some fixes for multi-GPU processing, evaluators, and hard negatives mining.

Details in 🧵
September 22, 2025 at 11:42 AM
I'm very much looking forward to seeing embedding models based on mmBERT!
I already trained a basic Sentence Transformer model myself as I was too curious 👀

🧵
September 9, 2025 at 2:54 PM
Additionally, the ModernBERT-based mmBERT is much faster than the alternatives thanks to its architectural benefits: easily up to 2x the throughput in common scenarios.

🧵
September 9, 2025 at 2:54 PM
- Consistently outperforms equivalently sized models on all multilingual tasks (XTREME, classification, MTEB v2 Multilingual after finetuning)

E.g. see the picture for MTEB v2 Multilingual performance.

🧵
September 9, 2025 at 2:54 PM
Evaluation details:
- Very competitive with ModernBERT at equivalent sizes on English (GLUE, MTEB v2 English after finetuning)

E.g. see the picture for MTEB v2 English performance.

🧵
September 9, 2025 at 2:54 PM
Training Details:
- Trained on 1833 languages, using data from DCLM, FineWeb2, etc.
- 3 training phases: 2.3T tokens on 60 languages, 600B tokens on 110 languages, and 100B tokens on all 1833 languages.
- Also uses model merging and clever transitions between the three training phases.

🧵
September 9, 2025 at 2:54 PM
Model details:
- 2 model sizes: 42M non-embedding parameters (140M total) and 110M non-embedding parameters (307M total)
- Uses the ModernBERT architecture + Gemma2 multilingual tokenizer (so: flash attention, alternating global/local attention, sequence packing, etc.)
- Max. seq. length of 8192 tokens

🧵
September 9, 2025 at 2:54 PM