D'où cette campagne organisée au collège 👇🏻
2/2
On vous explique tout, avec évidemment une infographie ⤵️ @leparisien.fr
1/13
www.leparisien.fr/so...
D'où cette campagne organisée au collège 👇🏻
2/2
We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data
(TLDR: we cheat and get good scores)
@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
- Our 24B base model stands out: it outperforms open counterparts in generic generation tasks in both French and English.
- However, benchmark scores initially lagged, prompting us to investigate why some datasets seem to boost benchmarks without improving real-world generation.
- Our 24B base model stands out: it outperforms open counterparts in generic generation tasks in both French and English.
- However, benchmark scores initially lagged, prompting us to investigate why some datasets seem to boost benchmarks without improving real-world generation.