Shayne Longpre
@shaynelongpre.bsky.social
4.2K followers 320 following 81 posts
PhD @ MIT. Prev: Google DeepMind, Apple, Stanford. 🇨🇦 Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact
shaynelongpre.bsky.social
@seungonekim.bsky.social, who led the effort, is one of the best young AI researchers I’ve ever worked with.

He has done some of the best research on fine-grained, scalable, and human-aligned LLM-as-a-judge evaluation.

➡️ Flask
➡️ Prometheus 1 & 2
➡️ Multilingual Prometheus
➡️ KMMLU
➡️ BigGen Bench
shaynelongpre.bsky.social
Delighted to see the BigGen Bench paper receive the 🏆 best paper award 🏆 at NAACL 2025!

BigGen Bench introduces fine-grained, scalable, & human-aligned evaluations:

📈 77 hard, diverse tasks
🛠️ 765 examples w/ example-specific rubrics
📋 More human-aligned than previous rubrics
🌍 10 languages, by native speakers

1/
Reposted by Shayne Longpre
sarahooker.bsky.social
It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
shaynelongpre.bsky.social
🛬 in Singapore for #ICLR2025!

DM me to catch up—but only if you have a local food/bar/event rec!
shaynelongpre.bsky.social
We analyzed 4,000 datasets, 800+ sources, 600+ languages, & 67 countries.

Most surprising to me is that, despite some growth in language/geographic coverage, representation hasn’t significantly improved in a decade.

Check out the paper: arxiv.org/pdf/2412.17847

2/
shaynelongpre.bsky.social
Thrilled our global data ecosystem audit was accepted to #ICLR2025!

Empirically, it shows:

1️⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).

2️⃣ YouTube is now 70%+ of speech/video data but could block third-party collection.

3️⃣ <0.2% of data from Africa/South America.

1/
Reposted by Shayne Longpre
knightcolumbia.org
📍EVENT: Day 2 of our “AI and Democracy” symposium will be kicking off shortly. Programming will begin with welcome remarks from George Deodatis @columbiaseas.bsky.social at 9:30am ET. #AIDemocraticFreedoms

Watch the full event on our livestream here:
www.youtube.com/watch?v=X1gj...
Artificial Intelligence and Democratic Freedoms (Day 2)
YouTube video by Knight First Amendment Institute
Reposted by Shayne Longpre
mziizm.bsky.social
Very excited to release Kaleidoscope—a multilingual, multimodal evaluation set for VLMs, built as part of our open-science initiative!

🌍 18 languages (high-, mid-, low-)
📚 21k questions (55% require image understanding)
🧪 STEM, social science, reasoning, and practical skills
Reposted by Shayne Longpre
knightcolumbia.org
Panel 1: Regulating AI in a Time of Democratic Upheaval starts in approximately 5 minutes.

Panelists: @atoosakz.bsky.social, @randomwalker.bsky.social, @alondra.bsky.social, and Deirdre K. Mulligan.
Moderator: @shaynelongpre.bsky.social.
#AIDemocraticFreedoms
shaynelongpre.bsky.social
🪲AI bug bounties: As AI systems are given more control/autonomy, the surface area for possible flaws grows. Organizations will increasingly rely on community help to identify and address vulnerabilities, multilingually, and across application stacks.

🧵/
shaynelongpre.bsky.social
🤖Agents in the browser: Expect more asynchronous software/account usage on our behalf. Speed and usability are key—Operator, for example, still feels slow and clunky right now.

9/
shaynelongpre.bsky.social
🔍User experience & interfaces: Especially for coding, the competitive advantage from the interface (e.g., dynamic multi-turn code editing in OpenAI or Anthropic playgrounds), and the interoperability with existing tools and applications, may become more important than the models themselves.

8/
shaynelongpre.bsky.social
Looking ahead to 2025, I expect a few other trends to emerge more prominently:

7/
shaynelongpre.bsky.social
🐞Web data wars & exclusivity: More websites are restricting AI crawlers with robots.txt, ToS, lawsuits, and other anti-crawling measures. AI developers frequently circumvent these restrictions or negotiate exclusive deals for key data, dividing up access on the web.

6/
shaynelongpre.bsky.social
🏅Who leads? A once two-horse race now features many players: Google, OpenAI, Anthropic, Meta, xAI, DeepSeek, Mistral, new startups, and API wrappers all competing in the Chatbot Arena. The performance gap between open and closed, domestic and foreign, continues to narrow.

5/
shaynelongpre.bsky.social
⚙️Efficiency: AI systems are more efficient, affordable, and accessible. Test-time reasoning has unlocked greater capabilities from smaller models. DeepSeek demonstrated that once the “right recipe” is found, training is cheaper than expected.

4/
shaynelongpre.bsky.social
⏩Model capabilities: 2024 benchmarks improved significantly in science/math (GPQA), coding (SWE-Bench), tool use, and video gen.

🔐Privacy and security concerns: Organizations are increasingly focused on using their internal, sensitive data with AI, which can be at odds with protecting it.

3/
shaynelongpre.bsky.social
📈Rising adoption: 78% of organizations reported using AI in some form, up from 55% the previous year.

💰Private investment: The US hit $109B, dwarfing China’s $9B and the UK’s $5B.

2/
shaynelongpre.bsky.social
This week, @stanfordhai.bsky.social released the 2025 AI Index. It’s well worth reading to understand the evolving ecosystem of AI. Some highlights that stood out to me:

1/
shaynelongpre.bsky.social
Excited to speak at the workshop on Technical AI Governance in Vancouver this summer!

#ICML2025
taig-icml.bsky.social
📣We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com
Reposted by Shayne Longpre
andytseng.bsky.social
#AI is evolving fast, and so are its flaws. A fresh approach to finding and reporting AI bugs is long overdue. Great initiative by @shaynelongpre.bsky.social and team. Transparency and accountability in AI development are essential! #AISafety #ResponsibleAI #AIEthics #MIT
Researchers Propose a Better Way to Report Dangerous AI Flaws
After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.
www.wired.com
Reposted by Shayne Longpre