Cohere Labs
@cohereforai.bsky.social
460 followers 12 following 170 posts
@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research
Pinned
cohereforai.bsky.social
We are committed to making meaningful progress in machine learning research through open collaboration. Follow this 🧵to stay on top of our research contributions.
Reposted by Cohere Labs
juliakreutzer.bsky.social
Let's do the venue justice. Very excited for today's multilingual workshops at #COLM2025 💙
In Montreal 140 languages are spoken
cohereforai.bsky.social
In the afternoon, you can find Julia at the MELT workshop (Multilingual and Equitable Language Technologies), where she will talk about optimizing multilinguality in post-training.
cohereforai.bsky.social
Today at COLM, Cohere Labs Sr Research Scientist, @juliakreutzer.bsky.social will be presenting at 2 workshops.

First, the Multilingual Data Quality Signals workshop, bringing together researchers across disciplines to discuss & present research on data quality signals in multilingual data.
cohereforai.bsky.social
Today at COLM, we are excited to share our work Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation, during Poster Session 4, 4:30 - 6:30pm.

Come connect with paper authors @juliakreutzer.bsky.social and @kocmitom.bsky.social.
Reposted by Cohere Labs
juliakreutzer.bsky.social
💡A collaborative➕diverse team is key. In real life as in the LLM world 💪🦾
Check out our latest work that builds on this insight. 👇
cohereforai.bsky.social
Is Best-of-N really the best use of your inference compute?

Introducing Fusion-of-N: a simple and powerful way to advance inference and distillation beyond Best-of-N.
cohereforai.bsky.social
We are excited to present FusioN as a plug-and-play replacement for Best-of-N, shifting from a monolithic selection framework to a collaborative synthesis, one that embraces the diverse strengths of today’s leading open LLMs.
cohereforai.bsky.social
How does FusioN use the same sample pool more effectively than BoN?

🧩While BoN picks just one sample per problem, FusioN synthesises one output from all samples – treating them as collaborators whose strengths can be integrated, not competitors in a zero-sum game.
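The contrast above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `score` stands in for a reward model and `call_fusor_llm` for a call to the fusor LLM.

```python
def best_of_n(samples, score):
    # BoN: keep the single highest-scoring sample, discard the rest.
    return max(samples, key=score)

def fusion_of_n(samples, call_fusor_llm):
    # FusioN: hand every sample to a fusor LLM, which synthesises one
    # output that integrates strengths from all of them.
    prompt = "Synthesize one best answer from these candidates:\n"
    prompt += "\n---\n".join(samples)
    return call_fusor_llm(prompt)
```

The key difference: BoN's output is always one of the N inputs, while FusioN's output can combine content from all of them.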
cohereforai.bsky.social
Want the wisdom-of-the-crowd in 1 model?

🧑‍🎓🧑🏽‍🎓👨🏾‍🎓Fusion-of-N distills multiple teachers into richer synthetic data than BoN, training students that achieve bigger downstream gains, even surpassing teachers on multilingual factual reasoning 🌎
cohereforai.bsky.social
Test-time scaling doesn't need to waste samples: Fusion-of-N turns every sample into signal, outperforming BoN across tasks, languages and models. 🚀

Fusion-of-N boosts CommandA win-rates vs Gemini-2.5 Pro by +8.3% across 11 languages, a +4.0% improvement over BoN 🥇
cohereforai.bsky.social
Fusion-of-N uses an LLM (the fusor) to merge multiple candidate answers into one 💎

Instead of selecting only one response, Fusion-of-N creates an even better answer by integrating insights across all samples 🏅
cohereforai.bsky.social
We’re not your average lab. We’re a hybrid research environment dedicated to revolutionizing the ML space.

And we’re hiring a Senior Research Scientist to co-create with us.

If you believe in research as a shared, global effort — this is your chance.
cohereforai.bsky.social
Led by: Srishti Gureja, Elena Tommasone, Jingyi He, @sarahooker.bsky.social, Matthias Galle, and @mziizm.bsky.social

📄 Paper: https://arxiv.org/abs/2509.20837
cohereforai.bsky.social
🔹 The future of synthetic training hinges on rethinking verification. Calibrated verification, pairing complex, diverse test suites with flexible correctness signals, breaks the Verification Ceiling and improves code LLMs.
cohereforai.bsky.social
🔹 We also find that LLMs can serve as soft verifiers. Their judgments recover useful data and often match or surpass selection with formal unit tests.
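A minimal sketch of the soft-verifier idea, under assumptions: `judge` is a hypothetical stand-in for an LLM judgment call returning a confidence in [0, 1], and the threshold is illustrative, not the paper's setup.

```python
def soft_verify(problem, solution, judge, threshold=0.5):
    # Ask an LLM judge how confident it is that `solution` solves
    # `problem`, and keep the sample if the judgment clears a threshold.
    # This can recover useful data that strict unit tests would discard.
    confidence = judge(problem, solution)  # float in [0, 1]
    return confidence >= threshold
```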
cohereforai.bsky.social
🔹 Relaxing verification thresholds boosts performance, but only with sufficiently complex test suites. Correctness still matters; how we define it is the real issue.
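As a rough sketch of the two filtering regimes (hypothetical helper names, not the paper's code): strict "all tests must pass" filtering versus a relaxed pass-rate threshold over a test suite.

```python
def strict_filter(sample, tests):
    # "All tests must pass": a single failing test discards the sample.
    return all(t(sample) for t in tests)

def relaxed_filter(sample, tests, pass_rate=0.8):
    # Relaxed threshold: keep samples that pass most of a (hopefully
    # complex and diverse) test suite, trading strictness for recall.
    passed = sum(1 for t in tests if t(sample))
    return passed / len(tests) >= pass_rate
```

A sample failing one test out of five is discarded by the strict rule but kept at an 80% threshold; whether that sample is signal or noise depends on how good the test suite is, which is the ceiling the thread describes.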
cohereforai.bsky.social
We find:

🔹 Rigid verification risks biasing toward easy problems, while richer correctness signals preserve both quality and diversity.
cohereforai.bsky.social
What if the way we verify synthetic code is limiting model performance?

In our latest work we uncover the Verification Ceiling Problem: strict “all tests must pass” rules throw away useful data, while weak tests let errors through.
Reposted by Cohere Labs
mziizm.bsky.social
I'm excited to share that I'll be stepping into the role of Head of @cohereforai.bsky.social. It's an honor and a responsibility to lead such an extraordinary group of researchers pushing the boundaries of AI research.
Reposted by Cohere Labs
cjpberry.bsky.social
Papers In The Park 14. Last one of the season! Still great weather. Surprising. Anthony is presenting “Why Language Models Hallucinate”.

Thanks to @cohereforai.bsky.social for the copies and pizza.
cohereforai.bsky.social
🚨 Rare opportunity: Cohere Labs is hiring a Research Scientist!

If you’re passionate about studying fundamental AI problems and working in a globally collaborative, open-science environment, this is for you.

Apply here: jobs.ashbyhq.com/cohere/7ec9e...
Reposted by Cohere Labs
cjpberry.bsky.social
It’s papers in the park 7! Thanks to @cohereforai.bsky.social for the papers and the pizza, and to Alvin and Anthony for organizing.

It’s easily one of the funnest paper reads in the city!