Lightnews — Scholar-powered news

Cohere Labs

@cohereforai.bsky.social

500 followers 12 following 210 posts

@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research

Cohere Labs

@cohereforai.bsky.social

And Research Engineer, @shivalika.bsky.social: The Leaderboard Illusion. 😶‍🌫️

This paper reveals systematic biases and transparency gaps in the Chatbot Arena leaderboard.

www.youtube.com/watch?v=URho...

NeurIPS 2025 in San Diego. The Leaderboard Illusion: How LLM Rankings Are Gamed

YouTube video by Women in AI Research WiAIR

December 29, 2025 at 3:59 PM

Cohere Labs

@cohereforai.bsky.social

Sr Research Scientist, @juliakreutzer.bsky.social: Treasure Hunt paper. 🗺️

This work introduces a method that improves model performance by adding markers to the pretraining data; these training-time markers enable real-time targeting of the long tail.

www.youtube.com/watch?v=K3BU...

NeurIPS 2025 in San Diego. Treasure Hunt

YouTube video by Women in AI Research WiAIR

December 29, 2025 at 3:59 PM

Cohere Labs

@cohereforai.bsky.social

... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.

Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

⚖️ LLM-as-a-judge: mixed reliability.

Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

🤖 Naturalness is still a significant challenge.

Across open-ended generation and cross-lingual summarization, the biggest weakness isn’t coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

🧠 English isn’t always easiest.

Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

🧩 Linguistic reasoning remains the toughest nut. 🥥

Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

🌐 Language coverage matters.

Models don’t support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

🧩 Linguistic reasoning on unseen languages
📝 Open-ended generation testing naturalness and usefulness
📘 Cross-lingual summarization
🔁 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models

All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...

October 30, 2025 at 5:51 PM

Cohere Labs

@cohereforai.bsky.social

Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"

Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social

📜 arxiv.org/abs/2506.20544

October 29, 2025 at 6:31 PM

Cohere Labs

@cohereforai.bsky.social

Cohere Labs x EMNLP 2025: "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"

Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.

📜 arxiv.org/abs/2502.19158

October 29, 2025 at 6:31 PM

Cohere Labs

@cohereforai.bsky.social

Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"

Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.

📜 arxiv.org/abs/2505.24119

October 29, 2025 at 6:31 PM

Cohere Labs

@cohereforai.bsky.social

Cohere Labs x EMNLP 2025: "Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts"

Congrats to authors Nikolas Gritsch, Qizhen Zhang, @acyrl.bsky.social, @sarahooker.bsky.social and Ahmet Üstün.

📜 arxiv.org/abs/2408.15901

October 29, 2025 at 6:31 PM

Cohere Labs

@cohereforai.bsky.social

We're excited to hear from speakers including Ivan Zhang, Joelle Pineau, Marzieh Fadaee, Shayne Longpre and 20+ other presenters who will share insights on open science, collaborative research, and community-driven innovation.

Learn more and register now: https://tinyurl.com/CohereLabsConnect

October 24, 2025 at 10:00 AM

Cohere Labs

@cohereforai.bsky.social

Join us for inspiring keynotes, lightning talks, and interactive sessions that bring together curious minds from around the world. Throughout the conference, we’ll:

🔬 Showcase cutting-edge research
💡 Highlight meaningful collaborations
🤝 Inspire new partnerships

October 24, 2025 at 10:00 AM

Cohere Labs

@cohereforai.bsky.social

📜 Paper link: arxiv.org/pdf/2510.19806

Led by: David Mora, Viraat Aryabumi, @weiyinko-ml.bsky.social, @sarahooker.bsky.social, @juliakreutzer.bsky.social, and @mziizm.bsky.social.

October 23, 2025 at 2:45 PM
