Lightnews — Scholar-powered news

Reposted by Sara Hooker

Princeton Center for Information Technology Policy

@princetoncitp.bsky.social

⚠️ Leaderboard Illusion: "We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release & retract scores if desired... the ability of these providers to choose the best score leads to biased Arena scores"

Paper out now!🔻

May 5, 2025 at 8:08 PM

Sara Hooker

@sarahooker.bsky.social

It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.

April 30, 2025 at 2:55 PM

Reposted by Sara Hooker

Marzieh Fadaee

@mziizm.bsky.social

1/ Science is only as strong as the benchmarks it relies on.

So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark?

We took a deep dive into Chatbot Arena to find out. 🧵

April 30, 2025 at 12:53 PM

Reposted by Sara Hooker

Jonathan Wenger

@jwenger.bsky.social

This has been a topic close to my heart for a long time.

We have an awesome lineup of speakers who have made deep contributions to open-source in ML, e.g. @sarahooker.bsky.social , @chrisrackauckas.bsky.social, Matt Johnson, Tri Dao, @stellaathena.bsky.social, Evan Shelhamer.

Frank Schneider @fsschneider.bsky.social · Apr 16

Tired of your open-source ML work not getting the academic recognition it deserves? 🤔 Submit to the first-ever CodeML workshop at #ICML2025! It focuses on new libraries, improvements to established ones, best practices, retrospectives, and more.
codeml-workshop.github.io/codeml2025/

CODEML Workshop

Championing Open-source Development in Machine Learning.

April 16, 2025 at 8:42 PM

Reposted by Sara Hooker

Isra Salazar

@israsalazar.bsky.social

Today we are releasing Kaleidoscope 🎉

A comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages.

🌍 20,911 questions and 18 languages
📚 14 subjects (STEM → Humanities)
📸 55% multimodal questions

April 10, 2025 at 10:31 AM

Sara Hooker

@sarahooker.bsky.social

It is rare I get to completely disconnect. Very grateful for this week in Patagonia.

March 19, 2025 at 9:48 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

We're particularly proud to release Aya Vision 8B - it's compact 🐭 and efficient 🐎, outperforming models up to 11x its size 📈.

Releasing open weights helps to make breakthroughs in VLMs accessible to the research community.

March 5, 2025 at 5:56 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

Just 2 days after launch, Aya Vision is trending on @hf.co 🔥🔥

We launched open-weights with the goal of making VLM breakthroughs accessible to the research community - so exciting to see such a positive response.

huggingface.co/CohereForAI/...

March 6, 2025 at 5:10 PM

Reposted by Sara Hooker

(((Steve Chapman)))

@stevechapman.bsky.social

Love this post by @sarahooker.bsky.social on that other platform: "The first step of any meaningful pursuit is to severely underestimate its difficulty."

March 3, 2025 at 7:51 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision

Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿

March 4, 2025 at 2:01 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

👀

February 27, 2025 at 11:00 AM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

An important topic in AI is the climate impacts of the energy-intensive computing hardware needed to train and deploy AI models ⚡

Our policy primer explores ways to move towards more sustainable AI. 🌱

📜 cohere.com/research/pap...

February 25, 2025 at 5:42 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

Does more compute equate with greater risk?⚡️What is our track record predicting what risks emerge with scale? 📈

In this work led by Sara Hooker, we seek to understand the viability of compute thresholds ⚖️ as a way to mitigate risk. 🦺

arxiv.org/abs/2407.05694

February 11, 2025 at 3:11 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

In this work, we ask "How does model merging stack up when optimizing language models for diverse multitask learning?" 📚🧩

📜https://arxiv.org/abs/2410.10801

February 18, 2025 at 4:38 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

Aya Expanse, our open-weight 32B model, outperforms drastically larger models including Claude, Mistral Large 2, & Llama 405B on Scale's Private Multilingual Protocol.

We are proud to work on global AI that is efficient and accessible 🔥

January 22, 2025 at 2:22 PM

Reposted by Sara Hooker

Jekaterina Novikova

@j-novikova-nlp.bsky.social

Our paper is accepted to ICLR!
INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799)
A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances.
A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide.
/1

January 23, 2025 at 4:07 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

In this cross-institutional work, we introduce technical governance for AI and 100+ 🔢 open technical problems 🔧.

We provide a taxonomy of open problem areas in TAIG organized by governance capacities and governance targets.

📜https://arxiv.org/pdf/2407.14981

February 12, 2025 at 2:54 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

The C4AI Research Grant program is proud to have supported a project focused on building LLM tools for teachers 🧑‍🏫

This project focused on adapting educational materials to students’ skill levels, ensuring more effective and responsible AI integration in classrooms.

February 13, 2025 at 4:15 PM

Sara Hooker

@sarahooker.bsky.social

Many people have asked me about the France Action Summit.

I think a summit is typically most valuable as a catalyst, not as a solution in itself.

But, will share some observations.

February 13, 2025 at 9:08 AM

Reposted by Sara Hooker

Kathy Baxter

@baxterkb.bsky.social

Boris Gamazaychikov, @salesforce.com Head of #AI #Sustainability announced the AI Energy Score we launched at the AI Action Summit in Paris. 🌍 This offers a standardized way to measure & compare the energy efficiency of AI models. 🫶
www.linkedin.com/posts/bgamaz...

February 11, 2025 at 12:35 AM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

"Anyone who is serious about what the next generation of models is knows it can't be the current"

Thanks to @baratunde.com for hosting Head of Cohere For AI, @sarahooker.bsky.social on the latest episode of Life with Machines.

Check out their full conversation on YouTube:
youtu.be/-BsobAoOJvk

Is AI on the Verge of a Meltdown? | Sara Hooker (Ep. 8)

YouTube video by Baratunde Thurston

January 28, 2025 at 8:05 PM

Reposted by Sara Hooker

Cohere Labs

@cohereforai.bsky.social

On Scale AI's private multilingual protocol, Aya Expanse is indexed as the best open-weights model.

Additionally, in some languages we're outperforming:
🔒proprietary models
🐘larger models
⛰️models built by more researchers with more infrastructure

Lots to be proud of today.

January 21, 2025 at 6:39 PM

Sara Hooker

@sarahooker.bsky.social

As the @cohereforai.bsky.social joins the Bluesky family — we will be sharing paper gems from when we first started as a lab.

This paper is part of a larger research agenda where we have focused on how to better represent the long tail = making AI work for almost all real world distributions.

Cohere Labs @cohereforai.bsky.social · Jan 17

How can we mitigate the disparate effect of compression 🗜️on model performance for low-resource languages 💬?

Check out our cross-institutional collaboration, which discusses intriguing & previously unknown generalisation properties of compression.

📜Learn more: arxiv.org/abs/2211.02738

January 18, 2025 at 4:57 AM

Sara Hooker

@sarahooker.bsky.social

Last year we published a fantastic cross-institutional survey on efficiency techniques for language models.

Comprehensive and a good starting point for researchers working on efficiency.

Cohere Labs @cohereforai.bsky.social · Jan 16

How do we do more 🐘 with less 🐁?

In an era of ever larger models, work on efficiency is ever more important. This cross-institutional collaboration provides a survey of the field for practitioners and researchers alike ⚙️.

📜Learn more: arxiv.org/pdf/2209.000...

January 17, 2025 at 5:23 AM
