Tom Aarsen
tomaarsen.com
@tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer
Machine Learning Engineer at 🤗 Hugging Face
Open to recommendations!
January 11, 2026 at 11:03 AM
It would be simple to add a sparse component here as well: e.g. bm25s for a BM25 variant, or an inference-free SparseEncoder with 'splade-index'.

In short: your retrieval doesn't need to be so expensive!
January 6, 2026 at 7:56 PM
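For intuition on the sparse side, here's a minimal sketch of Okapi BM25 scoring in plain Python. This is illustrative only: it is not the bm25s API, just the scoring formula that such libraries implement efficiently.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        score = 0.0
        for t in query_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [doc.lower().split() for doc in [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "quantized embeddings make retrieval cheap",
]]
print(bm25_scores(["retrieval", "cheap"], docs))
```

Only the third document contains the query terms, so it is the only one with a nonzero score.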
By loading e.g. 4x as many documents with the binary index and rescoring those with int8, you restore ~99% of the performance of the fp32 search, compared to ~97% when using purely the binary index: huggingface.co/blog/embeddi...

🧵
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
January 6, 2026 at 7:56 PM
For context: common fp32 retrieval on this problem would cost 180GB of RAM, 180GB of disk space for embeddings, and would likely be 20-25x slower.

Binary retrieval with int8 rescoring costs just ~6GB of RAM and ~45GB of disk space for embeddings.

🧵
January 6, 2026 at 7:56 PM
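The savings follow directly from the quantization ratios. A quick back-of-the-envelope check, assuming the 180GB of fp32 embeddings stated above:

```python
fp32_gb = 180.0            # fp32 embeddings for all documents, as stated above

# Binary quantization: 1 bit per dimension instead of 32 -> 32x smaller.
# This is the only part that must be kept in RAM.
binary_gb = fp32_gb / 32   # ~5.6 GB

# int8 quantization: 1 byte per dimension instead of 4 -> 4x smaller.
# Kept on disk and only read for rescoring.
int8_gb = fp32_gb / 4      # 45 GB

print(f"RAM (binary index): ~{binary_gb:.1f} GB")
print(f"Disk (int8 embeddings): ~{int8_gb:.0f} GB")
```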
Instead of having to store fp32 embeddings, you only store the binary index (32x smaller) and int8 embeddings (4x smaller). Beyond that, you only keep the binary index in memory, so you're also saving 32x on memory compared to an fp32 search index.

🧵
January 6, 2026 at 7:56 PM
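Where does the 32x come from? Binary quantization keeps only the sign of each fp32 dimension: 1 bit instead of 4 bytes. A pure-Python sketch of the packing (illustrative; real pipelines use e.g. `quantize_embeddings` from Sentence Transformers over numpy arrays):

```python
def quantize_binary(embedding):
    """Pack an fp32 embedding into bytes: 1 bit per dimension (its sign),
    a 32x size reduction compared to 4-byte floats."""
    assert len(embedding) % 8 == 0
    packed = bytearray()
    for i in range(0, len(embedding), 8):
        byte = 0
        for bit, value in enumerate(embedding[i:i + 8]):
            if value > 0:          # positive dimension -> bit set to 1
                byte |= 1 << (7 - bit)
        packed.append(byte)
    return bytes(packed)

# A fake 1024-dimensional embedding (repeating pattern, for illustration).
embedding = [0.12, -0.5, 0.33, 0.9, -0.1, -0.7, 0.05, -0.2] * 128
packed = quantize_binary(embedding)
print(len(embedding) * 4, "bytes as fp32 ->", len(packed), "bytes as binary")
```

1024 dimensions go from 4096 bytes (fp32) to 128 bytes (binary), exactly 32x.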
This requires embedding all of your documents once, and using those embeddings for:
- A binary index; I used an IndexBinaryFlat for exact search and an IndexBinaryIVF for approximate search
- An int8 "view", i.e. a way to efficiently load the int8 embeddings from disk given a document ID

🧵
January 6, 2026 at 7:56 PM
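The int8 "view" can be as simple as seeking into a flat binary file: each document owns a fixed-size slice, so one embedding can be read without loading the rest. A minimal stdlib stand-in (a real setup would more likely use a numpy memmap; the file layout and dimensionality here are illustrative):

```python
import os
import tempfile

DIM = 1024  # embedding dimensionality (illustrative)

def load_int8(path, doc_id, dim=DIM):
    """Load a single int8 embedding from a flat binary file by document ID,
    reading only `dim` bytes instead of the whole file."""
    with open(path, "rb") as f:
        f.seek(doc_id * dim)
        raw = f.read(dim)
    # Reinterpret each unsigned byte as a signed int8.
    return [b - 256 if b >= 128 else b for b in raw]

# Write three fake int8 embeddings to disk (values cycle through -2..2).
path = os.path.join(tempfile.mkdtemp(), "int8_embeddings.bin")
with open(path, "wb") as f:
    for doc_id in range(3):
        vector = [(doc_id + d) % 5 - 2 for d in range(DIM)]
        f.write(bytes(v & 0xFF for v in vector))

print(load_int8(path, 1)[:5])
```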
4. Load the int8 embeddings for the 40 top binary documents from disk.
5. Rescore those 40 documents using the fp32 query embedding and the 40 int8 embeddings.
6. Sort the 40 documents by the new scores and grab the top 10.
7. Load the titles/texts of the top 10 documents.

🧵
January 6, 2026 at 7:56 PM
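Steps 4-6 boil down to a dot product and a sort. A pure-Python sketch of the rescoring step (the doc IDs and 3-dimensional embeddings are toy values):

```python
def rescore(query_fp32, candidates):
    """Rescore candidates via the dot product between the fp32 query
    embedding and each candidate's int8 embedding, highest score first."""
    scored = []
    for doc_id, int8_embedding in candidates:
        score = sum(q * v for q, v in zip(query_fp32, int8_embedding))
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

query = [0.5, -0.25, 1.0]
candidates = [            # (doc_id, int8 embedding loaded from disk)
    (7,  [1, -1, 2]),     # score: 0.5 + 0.25 + 2.0 = 2.75
    (3,  [-2, 4, 0]),     # score: -1.0 - 1.0 + 0.0 = -2.0
    (12, [0, 0, 3]),      # score: 3.0
]
top = rescore(query, candidates)
print(top)  # [12, 7, 3]
```

With the real 40 candidates, the final answer is simply `rescore(...)[:10]`.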
This is the inference strategy:
1. Embed your query using a dense embedding model into a 'standard' fp32 embedding.
2. Quantize the fp32 embedding to binary: 32x smaller.
3. Use an approximate (or exact) binary index to retrieve e.g. 40 documents (~20x faster than an fp32 index).

🧵
January 6, 2026 at 7:56 PM
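Step 3's binary search ranks documents by Hamming distance, which is why it's so fast: just XOR and popcount over packed bytes. A tiny exact-search stand-in in plain Python (the real index is faiss's IndexBinaryFlat/IndexBinaryIVF; doc IDs and bit patterns here are made up):

```python
def hamming(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def binary_search(query_bits, index, k=2):
    """Exact binary search: rank every document by Hamming distance to
    the binary query embedding and return the k closest doc IDs."""
    ranked = sorted(index, key=lambda item: hamming(query_bits, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

index = [  # (doc_id, packed binary embedding)
    (0, bytes([0b10110010, 0b00001111])),
    (1, bytes([0b10110011, 0b00001111])),  # 1 bit away from doc 0
    (2, bytes([0b01001101, 0b11110000])),  # far from docs 0 and 1
]
query = bytes([0b10110010, 0b00001111])
print(binary_search(query, index))  # nearest doc IDs first: [0, 1]
```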
Feel free to try the demo over 40 million texts from Wikipedia if you want to get hands-on first (no login or anything like that): huggingface.co/spaces/sente...

🧵
Quantized Retrieval - a Hugging Face Space by sentence-transformers
Find relevant Wikipedia articles by typing questions or topics in plain English. Enter your search query and get back the most related articles with their titles and content snippets. The system us...
huggingface.co
January 6, 2026 at 7:56 PM
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting new capabilities: better multimodality, rerankers, and perhaps some late interaction in the future!
December 11, 2025 at 2:46 PM
Python 3.9 deprecation:
Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

🧵
December 11, 2025 at 2:46 PM
Transformers v5:
This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

Even my tests run on both Transformers v4 and v5.

🧵
December 11, 2025 at 2:46 PM
Similarity scores in Hard Negatives Mining:
When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get similarity scores returned. This can be useful for some distillation losses!

🧵
December 11, 2025 at 2:46 PM
Multilingual NanoBEIR Support:
You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark.

🧵
December 11, 2025 at 2:46 PM
CrossEncoder multi-processing:
Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure:
just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `predict`/`rank` calls.

🧵
December 11, 2025 at 2:46 PM
Ooh, I'd love a blogpost on what embedding model/search system they're using. My guess: LLM summarization followed with hybrid search with EmbeddingGemma or gemini-embeddings and BM25.
November 18, 2025 at 6:20 PM
That choice ended up being very valuable for the embedding & information retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
October 22, 2025 at 1:04 PM
I would like to thank the @ukplab.bsky.social, and especially Nils Reimers and @igurevych.bsky.social, both for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to lead the project.

🧵
October 22, 2025 at 1:04 PM
We see an increasing desire from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

🧵
October 22, 2025 at 1:04 PM
Today, the @ukplab.bsky.social is transferring the project to @hf.co.

Sentence Transformers will remain a community-driven, open-source project, with the same Apache 2.0 license as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged!

🧵
October 22, 2025 at 1:04 PM
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: huggingface.co/blog/sentenc...

🧵
Sentence Transformers is joining Hugging Face!
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
October 22, 2025 at 1:04 PM