Sara Hooker
sarahooker.bsky.social
I lead Cohere For AI. Formerly Research at Google Brain. ML Efficiency, LLMs, @trustworthy_ml.
Reposted by Sara Hooker
⚠️ Leaderboard Illusion: "We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release & retract scores if desired... the ability of these providers to choose the best score leads to biased Arena scores"

Paper out now!🔻
May 5, 2025 at 8:08 PM
It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
April 30, 2025 at 2:55 PM
Reposted by Sara Hooker
1/ Science is only as strong as the benchmarks it relies on.

So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark?

We took a deep dive into Chatbot Arena to find out. 🧵
April 30, 2025 at 12:53 PM
Reposted by Sara Hooker
This has been a topic close to my heart for a long time.

We have an awesome lineup of speakers who have made deep contributions to open-source in ML, e.g. @sarahooker.bsky.social, @chrisrackauckas.bsky.social, Matt Johnson, Tri Dao, @stellaathena.bsky.social, Evan Shelhamer.
Tired of your open-source ML work not getting the academic recognition it deserves? 🤔 Submit to the first-ever CodeML workshop at #ICML2025! It focuses on new libraries, improvements to established ones, best practices, retrospectives, and more.
codeml-workshop.github.io/codeml2025/
CODEML Workshop
Championing Open-source Development in Machine Learning.
April 16, 2025 at 8:42 PM
Reposted by Sara Hooker
Today we are releasing Kaleidoscope 🎉

A comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages.

🌍 20,911 questions and 18 languages
📚 14 subjects (STEM → Humanities)
📸 55% multimodal questions
April 10, 2025 at 10:31 AM
It is rare I get to completely disconnect. Very grateful for this week in Patagonia.
March 19, 2025 at 9:48 PM
Reposted by Sara Hooker
We're particularly proud to release Aya Vision 8B - it's compact 🐭 and efficient 🐎, outperforming models up to 11x its size 📈.

Releasing open weights helps to make breakthroughs in VLMs accessible to the research community.
March 5, 2025 at 5:56 PM
Reposted by Sara Hooker
Just 2 days after launch, Aya Vision is trending on @hf.co 🔥🔥

We launched open-weights with the goal of making VLM breakthroughs accessible to the research community - so exciting to see such a positive response.

huggingface.co/CohereForAI/...
March 6, 2025 at 5:10 PM
Reposted by Sara Hooker
Love this post by @sarahooker.bsky.social on that other platform: "The first step of any meaningful pursuit is to severely underestimate its difficulty."
March 3, 2025 at 7:51 PM
Reposted by Sara Hooker
Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision

Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿
March 4, 2025 at 2:01 PM
Reposted by Sara Hooker
👀
February 27, 2025 at 11:00 AM
Reposted by Sara Hooker
An important topic in AI is the climate impacts of the energy-intensive computing hardware needed to train and deploy AI models ⚡

Our policy primer explores ways to move towards more sustainable AI. 🌱

📜 cohere.com/research/pap...
February 25, 2025 at 5:42 PM
Reposted by Sara Hooker
Does more compute equate with greater risk? ⚡️ What is our track record predicting what risks emerge with scale? 📈

In this work led by Sara Hooker, we seek to understand the viability of compute thresholds ⚖️ as a way to mitigate risk. 🦺

arxiv.org/abs/2407.05694
February 11, 2025 at 3:11 PM
Reposted by Sara Hooker
In this work, we ask "How does model merging stack up when optimizing language models for diverse multitask learning?" 📚🧩

📜https://arxiv.org/abs/2410.10801
February 18, 2025 at 4:38 PM
Reposted by Sara Hooker
Aya Expanse, our open-weight 32B model, outperforms drastically larger models including Claude, Mistral Large 2, & Llama 405B on Scale's Private Multilingual Protocol.

We are proud to work on global AI that is efficient and accessible 🔥
January 22, 2025 at 2:22 PM
Reposted by Sara Hooker
Our paper is accepted to ICLR!
INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799)
A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances.
A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide.
/1
January 23, 2025 at 4:07 PM
Reposted by Sara Hooker
In this cross-institutional work, we introduce technical governance for AI and 100+ 🔢 open technical problems 🔧.

We provide a taxonomy of open problem areas in TAIG organized by governance capacities and governance targets.

📜https://arxiv.org/pdf/2407.14981
February 12, 2025 at 2:54 PM
Reposted by Sara Hooker
The C4AI Research Grant program is proud to have supported a project focused on building LLM tools for teachers 🧑‍🏫

This project focused on adapting educational materials to students’ skill levels, ensuring more effective and responsible AI integration in classrooms.
February 13, 2025 at 4:15 PM
Many people have asked me about the France Action Summit.

I think a summit is typically most valuable as a catalyst, not as a solution in itself.

But, will share some observations.
February 13, 2025 at 9:08 AM
Reposted by Sara Hooker
Boris Gamazaychikov, @salesforce.com Head of #AI #Sustainability, announced the AI Energy Score we launched at the AI Action Summit in Paris. 🌍 This offers a standardized way to measure & compare the energy efficiency of AI models. 🫶
www.linkedin.com/posts/bgamaz...
February 11, 2025 at 12:35 AM
Reposted by Sara Hooker
"Anyone who is serious about what the next generation of models is knows it can't be the current"

Thanks to @baratunde.com for hosting the Head of Cohere For AI, @sarahooker.bsky.social, on the latest episode of Life with Machines.

Check out their full conversation on YouTube:
youtu.be/-BsobAoOJvk
Is AI on the Verge of a Meltdown? | Sara Hooker (Ep. 8)
YouTube video by Baratunde Thurston
January 28, 2025 at 8:05 PM
Reposted by Sara Hooker
On Scale AI's private multilingual protocol, Aya Expanse is indexed as the best open-weights model.

Additionally, in some languages we're outperforming:
🔒proprietary models
🐘larger models
⛰️models built by more researchers with more infrastructure

Lots to be proud of today.
January 21, 2025 at 6:39 PM
As @cohereforai.bsky.social joins the Bluesky family, we will be sharing paper gems from when we first started as a lab.

This paper is part of a larger research agenda where we have focused on how to better represent the long tail = making AI work for almost all real world distributions.
How can we mitigate the disparate effect of compression 🗜️on model performance for low-resource languages 💬?

Check out our cross-institutional collaboration, which discusses intriguing & previously unknown generalisation properties of compression.

📜Learn more: arxiv.org/abs/2211.02738
January 18, 2025 at 4:57 AM
Last year we published a fantastic cross-institutional survey on efficiency techniques for language models.

Comprehensive and a good starting point for researchers working on efficiency.
How do we do more 🐘 with less 🐁?

In an era of ever larger models, work on efficiency is ever more important. This cross-institutional collaboration provides a survey of the field for practitioners and researchers alike ⚙️.

📜Learn more: arxiv.org/pdf/2209.000...
January 17, 2025 at 5:23 AM