Tom Aarsen
tomaarsen.com
@tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer
Machine Learning Engineer at 🤗 Hugging Face
Open to recommendations!
January 11, 2026 at 11:03 AM
It would be simple to add a sparse component here as well: e.g. bm25s for a BM25 variant, or an inference-free SparseEncoder with 'splade-index'.

In short: your retrieval doesn't need to be so expensive!
January 6, 2026 at 7:56 PM
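For intuition on the sparse side, here's a minimal sketch of Okapi BM25 scoring in plain Python. This is illustrative only: it is not the bm25s API, just the scoring formula that such libraries implement efficiently.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        score = 0.0
        for t in query_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [doc.lower().split() for doc in [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "quantized embeddings make retrieval cheap",
]]
print(bm25_scores(["retrieval", "cheap"], docs))
```

Only the third document contains the query terms, so it is the only one with a nonzero score.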
By loading e.g. 4x as many documents with the binary index and rescoring those with int8, you restore ~99% of the performance of the fp32 search, compared to ~97% when using purely the binary index: huggingface.co/blog/embeddi...

🧵
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
January 6, 2026 at 7:56 PM
For context: common fp32 retrieval on this problem would cost 180GB of RAM, 180GB of disk space for embeddings, and would likely be 20-25x slower.

Binary retrieval with int8 rescoring costs just ~6GB of RAM and ~45GB of disk space for embeddings.

🧵
January 6, 2026 at 7:56 PM
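The savings follow directly from the quantization ratios. A quick back-of-the-envelope check, assuming the 180GB of fp32 embeddings stated above:

```python
fp32_gb = 180.0            # fp32 embeddings for all documents, as stated above

# Binary quantization: 1 bit per dimension instead of 32 -> 32x smaller.
# This is the only part that must be kept in RAM.
binary_gb = fp32_gb / 32   # ~5.6 GB

# int8 quantization: 1 byte per dimension instead of 4 -> 4x smaller.
# Kept on disk and only read for rescoring.
int8_gb = fp32_gb / 4      # 45 GB

print(f"RAM (binary index): ~{binary_gb:.1f} GB")
print(f"Disk (int8 embeddings): ~{int8_gb:.0f} GB")
```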
Instead of having to store fp32 embeddings, you only store the binary index (32x smaller) and int8 embeddings (4x smaller). Beyond that, you only keep the binary index in memory, so you're also saving 32x on memory compared to an fp32 search index.

🧵
January 6, 2026 at 7:56 PM
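Where does the 32x come from? Binary quantization keeps only the sign of each fp32 dimension: 1 bit instead of 4 bytes. A pure-Python sketch of the packing (illustrative; real pipelines use e.g. `quantize_embeddings` from Sentence Transformers over numpy arrays):

```python
def quantize_binary(embedding):
    """Pack an fp32 embedding into bytes: 1 bit per dimension (its sign),
    a 32x size reduction compared to 4-byte floats."""
    assert len(embedding) % 8 == 0
    packed = bytearray()
    for i in range(0, len(embedding), 8):
        byte = 0
        for bit, value in enumerate(embedding[i:i + 8]):
            if value > 0:          # positive dimension -> bit set to 1
                byte |= 1 << (7 - bit)
        packed.append(byte)
    return bytes(packed)

# A fake 1024-dimensional embedding (repeating pattern, for illustration).
embedding = [0.12, -0.5, 0.33, 0.9, -0.1, -0.7, 0.05, -0.2] * 128
packed = quantize_binary(embedding)
print(len(embedding) * 4, "bytes as fp32 ->", len(packed), "bytes as binary")
```

1024 dimensions go from 4096 bytes (fp32) to 128 bytes (binary), exactly 32x.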
This requires embedding all of your documents once, and using those embeddings for:
- A binary index; I used an IndexBinaryFlat for exact search and an IndexBinaryIVF for approximate search
- An int8 "view", i.e. a way to efficiently load the int8 embeddings from disk given a document ID

🧵
January 6, 2026 at 7:56 PM
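The int8 "view" can be as simple as seeking into a flat binary file: each document owns a fixed-size slice, so one embedding can be read without loading the rest. A minimal stdlib stand-in (a real setup would more likely use a numpy memmap; the file layout and dimensionality here are illustrative):

```python
import os
import tempfile

DIM = 1024  # embedding dimensionality (illustrative)

def load_int8(path, doc_id, dim=DIM):
    """Load a single int8 embedding from a flat binary file by document ID,
    reading only `dim` bytes instead of the whole file."""
    with open(path, "rb") as f:
        f.seek(doc_id * dim)
        raw = f.read(dim)
    # Reinterpret each unsigned byte as a signed int8.
    return [b - 256 if b >= 128 else b for b in raw]

# Write three fake int8 embeddings to disk (values cycle through -2..2).
path = os.path.join(tempfile.mkdtemp(), "int8_embeddings.bin")
with open(path, "wb") as f:
    for doc_id in range(3):
        vector = [(doc_id + d) % 5 - 2 for d in range(DIM)]
        f.write(bytes(v & 0xFF for v in vector))

print(load_int8(path, 1)[:5])
```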
4. Load the int8 embeddings for the 40 top binary documents from disk.
5. Rescore those 40 documents using the fp32 query embedding and the 40 int8 embeddings.
6. Sort the 40 documents by the new scores and grab the top 10.
7. Load the titles/texts of the top 10 documents.

🧵
January 6, 2026 at 7:56 PM
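Steps 4-6 boil down to a dot product and a sort. A pure-Python sketch of the rescoring step (the doc IDs and 3-dimensional embeddings are toy values):

```python
def rescore(query_fp32, candidates):
    """Rescore candidates via the dot product between the fp32 query
    embedding and each candidate's int8 embedding, highest score first."""
    scored = []
    for doc_id, int8_embedding in candidates:
        score = sum(q * v for q, v in zip(query_fp32, int8_embedding))
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

query = [0.5, -0.25, 1.0]
candidates = [            # (doc_id, int8 embedding loaded from disk)
    (7,  [1, -1, 2]),     # score: 0.5 + 0.25 + 2.0 = 2.75
    (3,  [-2, 4, 0]),     # score: -1.0 - 1.0 + 0.0 = -2.0
    (12, [0, 0, 3]),      # score: 3.0
]
top = rescore(query, candidates)
print(top)  # [12, 7, 3]
```

With the real 40 candidates, the final answer is simply `rescore(...)[:10]`.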
This is the inference strategy:
1. Embed your query using a dense embedding model into a 'standard' fp32 embedding.
2. Quantize the fp32 embedding to binary: 32x smaller.
3. Use an approximate (or exact) binary index to retrieve e.g. 40 documents (~20x faster than an fp32 index).

🧵
January 6, 2026 at 7:56 PM
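Step 3's binary search ranks documents by Hamming distance, which is why it's so fast: just XOR and popcount over packed bytes. A tiny exact-search stand-in in plain Python (the real index is faiss's IndexBinaryFlat/IndexBinaryIVF; doc IDs and bit patterns here are made up):

```python
def hamming(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def binary_search(query_bits, index, k=2):
    """Exact binary search: rank every document by Hamming distance to
    the binary query embedding and return the k closest doc IDs."""
    ranked = sorted(index, key=lambda item: hamming(query_bits, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

index = [  # (doc_id, packed binary embedding)
    (0, bytes([0b10110010, 0b00001111])),
    (1, bytes([0b10110011, 0b00001111])),  # 1 bit away from doc 0
    (2, bytes([0b01001101, 0b11110000])),  # far from docs 0 and 1
]
query = bytes([0b10110010, 0b00001111])
print(binary_search(query, index))  # nearest doc IDs first: [0, 1]
```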
Feel free to try the demo over 40 million texts from Wikipedia if you want to get hands-on first (no login or anything like that): huggingface.co/spaces/sente...

🧵
Quantized Retrieval - a Hugging Face Space by sentence-transformers
Find relevant Wikipedia articles by typing questions or topics in plain English. Enter your search query and get back the most related articles with their titles and content snippets. The system us...
huggingface.co
January 6, 2026 at 7:56 PM
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting new capabilities: better multimodality, rerankers, and perhaps some late interaction in the future!
December 11, 2025 at 2:46 PM
Python 3.9 deprecation:
Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

🧵
December 11, 2025 at 2:46 PM
Transformers v5:
This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

Even my tests run on both Transformers v4 and v5.

🧵
December 11, 2025 at 2:46 PM
Similarity scores in Hard Negatives Mining:
When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get similarity scores returned. This can be useful for some distillation losses!

🧵
December 11, 2025 at 2:46 PM
Multilingual NanoBEIR Support:
You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark.

🧵
December 11, 2025 at 2:46 PM
CrossEncoder multi-processing:
Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure:
just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `predict`/`rank` calls.

🧵
December 11, 2025 at 2:46 PM
Ooh, I'd love a blogpost on what embedding model/search system they're using. My guess: LLM summarization followed with hybrid search with EmbeddingGemma or gemini-embeddings and BM25.
November 18, 2025 at 6:20 PM
That choice ended up being very valuable for the embedding & information retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
October 22, 2025 at 1:04 PM
I would like to thank the @ukplab.bsky.social, and especially Nils Reimers and @igurevych.bsky.social, both for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to lead the project.

🧵
October 22, 2025 at 1:04 PM
We see an increasing desire from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

🧵
October 22, 2025 at 1:04 PM
Today, the @ukplab.bsky.social is transferring the project to @hf.co.

Sentence Transformers will remain a community-driven, open-source project, with the same Apache 2.0 license as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged!

🧵
October 22, 2025 at 1:04 PM
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: huggingface.co/blog/sentenc...

🧵
Sentence Transformers is joining Hugging Face!
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
October 22, 2025 at 1:04 PM