Cohere Labs
@cohereforai.bsky.social
@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research
And Research Engineer, @shivalika.bsky.social: The Leaderboard Illusion. 😶‍🌫️

This paper reveals systematic biases and transparency gaps in the Chatbot Arena leaderboard.

www.youtube.com/watch?v=URho...
NeurIPS 2025 in San Diego. The Leaderboard Illusion: How LLM Rankings Are Gamed
YouTube video by Women in AI Research WiAIR
December 29, 2025 at 3:59 PM
Sr Research Scientist, @juliakreutzer.bsky.social: Treasure Hunt paper. 🗺️

This work improves model performance by adding training-time markers to tokens of the pretraining data, enabling real-time targeting of the long tail.

www.youtube.com/watch?v=K3BU...
NeurIPS 2025 in San Diego. Treasure Hunt
YouTube video by Women in AI Research WiAIR
December 29, 2025 at 3:59 PM
... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social
October 30, 2025 at 5:51 PM
You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.

Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...
October 30, 2025 at 5:51 PM
⚖️ LLM-as-a-judge: mixed reliability.

Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.
October 30, 2025 at 5:51 PM
🤖 Naturalness is still a significant challenge.

Across open-ended generation and cross-lingual summarization, the biggest weakness isn’t coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.
October 30, 2025 at 5:51 PM
🧠 English isn’t always easiest.

Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.
October 30, 2025 at 5:51 PM
🧩 Linguistic reasoning remains the toughest nut. 🥥

Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.
October 30, 2025 at 5:51 PM
🌐 Language coverage matters.

Models don’t support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️
October 30, 2025 at 5:51 PM
🧩 Linguistic reasoning on unseen languages
📝 Open-ended generation testing naturalness and usefulness
📘 Cross-lingual summarization
🔁 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models

All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...
October 30, 2025 at 5:51 PM
Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"

Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social

📜 arxiv.org/abs/2506.20544
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025: "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"

Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.

📜 arxiv.org/abs/2502.19158
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"

Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.

📜 arxiv.org/abs/2505.24119
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025: "Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts"

Congrats to authors Nikolas Gritsch, Qizhen Zhang, @acyrl.bsky.social, @sarahooker.bsky.social and Ahmet Üstün.

📜 arxiv.org/abs/2408.15901
October 29, 2025 at 6:31 PM
We're excited to hear from speakers including Ivan Zhang, Joelle Pineau, Marzieh Fadaee, Shayne Longpre and 20+ other presenters who will share insights on open science, collaborative research, and community-driven innovation.

Learn more and register now: https://tinyurl.com/CohereLabsConnect
October 24, 2025 at 10:00 AM
Join us for inspiring keynotes, lightning talks, and interactive sessions that bring together curious minds from around the world. Throughout the conference, we’ll:

🔬 Showcase cutting-edge research
💡 Highlight meaningful collaborations
🤝 Inspire new partnerships
October 24, 2025 at 10:00 AM