Antoine Chaffin
@nohtow.bsky.social
27, French CS Engineer 💻, PhD in ML 🎓🤖 — Guiding generative models for better synthetic data and building multimodal representations @LightOn
PyLate makes downstream usage easy, but it also facilitates training!
You can reproduce this SOTA training in under 80 lines of code and 2 hours of compute, and it will run NanoBEIR evaluations during training, report them to W&B, and create an informative model card!
Link to the gist: gist.github.com/NohTow/3030f...
April 30, 2025 at 2:42 PM
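For anyone who can't open the gist, here is a minimal sketch of what such a PyLate training script looks like, following PyLate's documented training flow. The base model, dataset, and hyperparameters below are placeholders rather than the exact ones from the gist, and the NanoBEIR evaluation wired up in the gist is left out here.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

from pylate import losses, models, utils

# Placeholder base model and dataset, not necessarily the ones used in the gist
model = models.ColBERT(model_name_or_path="answerdotai/ModernBERT-base")
train_dataset = load_dataset("sentence-transformers/msmarco-bm25", "triplet", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="output/modern-colbert",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=3e-6,
    bf16=True,
    logging_steps=100,
    report_to="wandb",  # training metrics are reported to Weights & Biases
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.Contrastive(model=model),
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()
```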
It is also the first model to outperform ColBERT-small on BEIR
While it is bigger, it is still a very lightweight model and benefits from the efficiency of ModernBERT!
Also, it has only been trained on MS MARCO (for late interaction) and should thus generalize pretty well!
April 30, 2025 at 2:42 PM
Among all those LLM releases, here is an important retrieval release:
To overcome the limitations of the awesome ModernBERT-based dense models, today @lightonai.bsky.social is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vector) model trained using PyLate 🚀
April 30, 2025 at 2:42 PM
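To give an idea of what downstream usage of a late-interaction model looks like, here is a rough sketch following PyLate's documented index-and-retrieve flow; the model id and the toy documents are assumptions for illustration, so check the official release for the exact repo name.

```python
from pylate import indexes, models, retrieve

# Assumed model id for illustration purposes
model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1")

# Each document is encoded into a matrix of token embeddings (multi-vector)
documents = [
    "PyLate is a library for training and using late-interaction models.",
    "ModernBERT is a fast, long-context encoder.",
]
documents_embeddings = model.encode(documents, is_query=False)

# Build a Voyager index and add the document embeddings to it
index = indexes.Voyager(index_folder="pylate-index", index_name="demo", override=True)
index.add_documents(documents_ids=["doc-0", "doc-1"], documents_embeddings=documents_embeddings)

# Queries are encoded the same way, then scored against the index
queries_embeddings = model.encode(["what is late interaction?"], is_query=True)
retriever = retrieve.ColBERT(index=index)
print(retriever.retrieve(queries_embeddings=queries_embeddings, k=2))
```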
Obviously, it comes at a slightly higher cost, but it is also trained with Matryoshka capabilities to reduce the footprint of embeddings
Notably, the performance at dimension 256 is only slightly below that of the base version at its full 768 dimensions
January 14, 2025 at 3:32 PM
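As a rough illustration of using the Matryoshka capability, sentence-transformers lets you truncate the output dimension at load time; the model id below is assumed and the query prefix follows the convention I recall from the model card.

```python
from sentence_transformers import SentenceTransformer

# Matryoshka-trained models are meant to degrade gracefully when truncated,
# so we keep only the first 256 dimensions of each embedding
model = SentenceTransformer("lightonai/modernbert-embed-large", truncate_dim=256)

embeddings = model.encode(["search_query: What is Matryoshka representation learning?"])
print(embeddings.shape)  # (1, 256)
```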
Model link: huggingface.co/lightonai/mo...
ModernBERT-embed-large is trained using the same (two-stage) training recipe as its smaller sibling and, as expected, improves performance, gaining +1.22 on the MTEB average
January 14, 2025 at 3:32 PM
ModernBERT-embed-base is awesome because it lets you use ModernBERT-base for various tasks out of the box
But the large variant of ModernBERT is also awesome...
So today, @lightonai.bsky.social is releasing ModernBERT-embed-large, the larger and more capable iteration of ModernBERT-embed!
January 14, 2025 at 3:32 PM
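Out-of-the-box usage looks roughly like this sketch; the model id is assumed (the exact repo name is in the truncated link above), and the query/document prefixes follow the nomic-style convention used by the modernbert-embed models as far as I recall.

```python
from sentence_transformers import SentenceTransformer

# Assumed model id for illustration
model = SentenceTransformer("lightonai/modernbert-embed-large")

# Queries and documents use different prefixes
query_embeddings = model.encode(["search_query: What is ModernBERT?"])
document_embeddings = model.encode([
    "search_document: ModernBERT is a modernized bidirectional encoder.",
    "search_document: Croissants are a French pastry.",
])

# Cosine similarity between the query and each document
print(model.similarity(query_embeddings, document_embeddings))
```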
Besides versatility, this also highlighted the potential of PyLate
The ModernBERT-base checkpoint achieves a 51.3 BEIR average
This means that we beat e5 in under 45 minutes of training on MS MARCO only (using only half the memory of our 8xH100 node)
December 19, 2024 at 5:36 PM
Starting with my beloved PyLate
We have a lot of experiments on ColBERT models in the paper, with tons of different base models
PyLate handled it all, even models using half-baked remote code
This was a really cool stress test and I am really happy it went so smoothly
December 19, 2024 at 5:36 PM
We also carefully designed the shapes of the models to optimize inference on common hardware
Coupled with unpadding throughout the whole processing pipeline and Flash Attention, this allows ModernBERT to be two to three times faster than most encoders on long context on an RTX 4090
December 19, 2024 at 5:36 PM
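For reference, a minimal sketch of loading ModernBERT with Flash Attention through transformers (assuming a recent transformers version with ModernBERT support and flash-attn installed); the unpadding is handled inside the model, so this is all it takes to get the fast path.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModel.from_pretrained(
    "answerdotai/ModernBERT-base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
).to("cuda")

inputs = tokenizer("ModernBERT handles long context efficiently.", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```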
There are a lot of details about the architecture in the blog post and the paper, but one key design choice is the use of alternating attention, i.e., attending to the full input only every few layers (and to a local sliding window in the other layers)
This enables fast processing of long sequences while maintaining accuracy
December 19, 2024 at 5:36 PM
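A tiny illustrative sketch of the alternating pattern (not the actual implementation), assuming the reported design of one global-attention layer every three layers and a 128-token local window elsewhere:

```python
def attended_tokens(layer_idx: int, seq_len: int, global_every: int = 3, window: int = 128) -> int:
    """How many tokens a given position can attend to in a given layer."""
    if layer_idx % global_every == 0:
        return seq_len               # global layer: full attention over the input
    return min(seq_len, window)      # local layer: sliding-window attention only

# With an 8192-token input, only every third layer pays the full quadratic cost
for layer_idx in range(6):
    print(layer_idx, attended_tokens(layer_idx, seq_len=8192))
```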
These various improvements, coupled with training on 2T tokens, result in fast and accurate models that handle long context (and code!)
ModernBERT yields state-of-the-art results on various tasks, including IR (short- and long-context text as well as code) and NLU
December 19, 2024 at 5:36 PM
I am thrilled to announce the release of ModernBERT, the long-awaited BERT replacement!

There might be a few LLM releases per week, but there is only one drop-in replacement that brings Pareto improvements over the 6-year-old BERT while going at lightspeed
December 19, 2024 at 5:36 PM
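To make "drop-in replacement" concrete, a minimal sketch: swapping the checkpoint name in a standard fill-mask pipeline is enough (assuming a transformers version that includes ModernBERT).

```python
from transformers import pipeline

# Any BERT-style masked-LM workflow works unchanged; only the model id differs
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
print(fill_mask("The capital of France is [MASK]."))
```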
Wild statement ngl
December 16, 2024 at 6:24 PM