Wissam Antoun
@wissamantoun.bsky.social
89 followers 7 following 14 posts
PhD at ALMAnaCH/Inria Paris, @aubmindlab Alumni Interested in AI, NLP, Video Games wissamantoun.com
wissamantoun.bsky.social
⚠️ Fine-tuning stability matters.

ModernBERT exhibits instabilities in downstream fine-tuning tasks.

DeBERTaV3, by contrast, offers more stable training dynamics.
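A hedged sketch of how this kind of stability can be quantified (my own illustration, not the paper's actual protocol): fine-tune the same checkpoint with several random seeds and compare the spread of dev scores. `finetune_and_eval` below is a hypothetical placeholder for a full training run.

```python
# Sketch: quantify fine-tuning stability as the spread of dev scores across random seeds.
# `finetune_and_eval` is a hypothetical placeholder for an actual fine-tuning + eval run.
import statistics

def finetune_and_eval(model_name: str, seed: int) -> float:
    """Hypothetical helper: fine-tune `model_name` with `seed`, return the dev metric."""
    raise NotImplementedError  # plug in your own Trainer / training loop here

def stability_report(model_name: str, seeds=(1, 2, 3, 4, 5)):
    scores = [finetune_and_eval(model_name, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

# A stable model should show a small standard deviation across seeds;
# an unstable one will occasionally collapse to a much lower score.
```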
wissamantoun.bsky.social
Data quality matters?

High-quality pretraining data accelerates convergence but offers minimal gains in final performance.

We suggest that current benchmarks may be saturated, limiting their ability to distinguish model improvements.
wissamantoun.bsky.social
Key takeaway:

When trained on identical data, DeBERTaV3 outperforms ModernBERT in benchmark tasks.

ModernBERT's strength is faster training and inference, but it doesn't surpass DeBERTaV3 in accuracy on NLU tasks.
wissamantoun.bsky.social
ModernBERT or DeBERTaV3?

What's driving performance: architecture or data?

To find out, we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.

Here are our findings:
wissamantoun.bsky.social
Finally, this was all made possible with the help of my colleagues and supervisors at ALMAnaCH @Inria: Francis Kulumba, Rian Touchent, Éric de la Clergerie, Benoît Sagot @zehavoc.bsky.social
Bon appétit!
[8/8]
wissamantoun.bsky.social
This work was partially funded by DINUM @Numerique_Gouv through the AllIAnce program (alliance.numerique.gouv.fr/les-produits...)

Access to compute resources was granted by Stéphane Requena and GENCI on the Jean Zay supercomputer.

For more details, please check our new paper (arxiv.org/abs/2411.08868)
[7/8]
wissamantoun.bsky.social
As part of our transparent release, we make all our pretrained models, training checkpoints, best-performing finetunes, pretraining data, and the codebase public on HuggingFace @huggingface
Model Link: huggingface.co/almanach?sea...
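A minimal sketch of loading one of the released checkpoints from the Hub; the model id used here is an assumption based on the almanach org page, so check the listing for the exact names.

```python
# Hedged sketch: loading one of the released checkpoints from the Hugging Face Hub.
# The model id below is an assumption; browse the almanach org page for the exact ids.
from transformers import AutoTokenizer, AutoModel

model_id = "almanach/camembertav2-base"  # assumed id for the DeBERTaV3-style model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# From here, the encoder can be fine-tuned like any other Hugging Face model,
# e.g. by loading it with AutoModelForSequenceClassification instead of AutoModel.
```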
[6/8]
wissamantoun.bsky.social
Check out the improved performance across a variety of general and domain-specific French NLP tasks.

The new models vastly outperform their predecessors and even match domain-specific finetunes🧑‍⚕️.

[5/8]
wissamantoun.bsky.social
CamemBERTa-v2 uses the Replaced Token Detection (RTD) objective from DeBERTaV3, while CamemBERT-v2 uses masked language modeling (MLM) with a 40% masking rate.

RTD’s efficiency allowed us to train for 1 epoch vs. 3 for MLM.

Pre-training had 2 phases: sequence length 512, then 1024 on long documents.
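To make the MLM side concrete, here is a minimal sketch of a 40% masking rate using the standard Hugging Face collator; the tokenizer id is an assumption, and this is an illustration of the objective rather than the actual training code.

```python
# Hedged sketch of the MLM objective with a 40% masking rate (vs. the usual 15%).
# The tokenizer id is an assumption; this only illustrates the masking setup.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("almanach/camembertv2-base")  # assumed id
collator = DataCollatorForLanguageModeling(
    tokenizer=tok,
    mlm=True,
    mlm_probability=0.4,  # 40% of tokens are selected for masking
)

batch = collator([tok("Le camembert est un fromage normand.")])
print(batch["input_ids"])  # some tokens replaced by tok.mask_token_id
print(batch["labels"])     # -100 everywhere except at the masked positions
```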

[4/8]
wissamantoun.bsky.social
A newly built tokenizer based on WordPiece (quick sketch below):
- 32,768 tokens
- addition of newline and tab characters
- support for emojis with zero-width joiners
- numbers split into two-digit tokens
- support for French elisions
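A quick, hedged way to poke at these properties; the checkpoint id is assumed, and the comments state expectations from the list above rather than verified output.

```python
# Hedged sketch: inspecting the new WordPiece tokenizer for the properties listed above.
# The checkpoint id is an assumption; comments are expectations, not verified output.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("almanach/camembertv2-base")  # assumed id

print(tok.vocab_size)                       # expected: 32768
print(tok.tokenize("1234567"))              # numbers should split into two-digit pieces
print(tok.tokenize("l'école où j'arrive"))  # French elisions (l', j') handled explicitly
print(tok.tokenize("ligne 1\nligne 2\t."))  # newline and tab are now real tokens
print(tok.tokenize("🧑‍⚕️"))                  # zero-width-joiner emoji sequences supported
```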

[3/8]
wissamantoun.bsky.social
The new update includes:

- Much larger pretraining dataset: 275B tokens (previously ~32B) from French CulturaX, scientific articles from HAL, and Wikipedia.

Only 1 epoch was needed for CamemBERTa-v2, while the CamemBERT-v2 model was trained for 3 epochs, or 825B tokens.

[2/8]
wissamantoun.bsky.social
CamemBERT 2.0: A Smarter French 🇫🇷 Language Model Aged to Perfection 👌

We release a much-needed update for the previous SOTA French encoder LM.

We introduce two new models, CamemBERTa-v2 and CamemBERT-v2, based on the DeBERTaV3 and RoBERTa recipes respectively.

So what's new?

[1/8]