Lightnews — Scholar-powered news

Shayne Longpre @shaynelongpre.bsky.social · May 6

@seungonekim.bsky.social, who led the effort, is one of the best young AI researchers I’ve ever worked with.

He has done some of the best research on fine-grained, scalable, and human-aligned LLM-as-a-judge evaluation.

➡️ FLASK
➡️ Prometheus 1 & 2
➡️ Multilingual Prometheus
➡️ KMMLU
➡️ BigGen Bench

Shayne Longpre @shaynelongpre.bsky.social · May 6

Paper: arxiv.org/pdf/2406.05761
Code: github.com/prometheus-e...

2/

Shayne Longpre @shaynelongpre.bsky.social · May 6

Delighted to see the BigGen Bench paper receive the 🏆 best paper award 🏆 at NAACL 2025!

BigGen Bench introduces fine-grained, scalable, & human-aligned evaluations:

📈 77 hard, diverse tasks
🛠️ 765 exs w/ ex-specific rubrics
📋 More human-aligned than previous rubrics
🌍 10 languages, by native speakers

1/

Reposted by Shayne Longpre

Sara Hooker @sarahooker.bsky.social · Apr 30

It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.

Reposted by Shayne Longpre

Knight First Amendment Institute @knightcolumbia.org · Apr 30

How should regulatory proposals adapt to the prevalence of general-purpose AI when the global geopolitical order is being reconfigured? @atoosakz.bsky.social, Deirdre K. Mulligan, @randomwalker.bsky.social, @alondra.bsky.social, & @shaynelongpre.bsky.social weigh in:
youtu.be/cRsbjGFPJaM?...

Day 1 Opening Remarks and Panel 1: Regulating AI in Democratic Upheaval (AI & Democratic Freedoms)

YouTube video by Knight First Amendment Institute

Shayne Longpre @shaynelongpre.bsky.social · Apr 22

🛬 in Singapore for #ICLR2025!

DM me to catch up—but only if you have a local food/bar/event rec!

Shayne Longpre @shaynelongpre.bsky.social · Apr 14

Also, check out the MIT Tech Review article: www.technologyreview.com/2024/12/18/1...

Thank you to the team and advisors!

🧵/

This is where the data to build AI comes from

New findings show how the sources of data are concentrating power in the hands of the most powerful tech companies.

Shayne Longpre @shaynelongpre.bsky.social · Apr 14

We analyzed 4,000 datasets, 800+ sources, 600+ languages, & 67 countries.

Most surprising to me is that, despite some growth in language/geographic coverage, representation hasn’t significantly improved in a decade.

Check out the paper: arxiv.org/pdf/2412.17847

2/

Shayne Longpre @shaynelongpre.bsky.social · Apr 14

Thrilled our global data ecosystem audit was accepted to #ICLR2025!

Empirically, it shows:

1️⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).

2️⃣ YouTube is now 70%+ of speech/video data but could block third-party collection.

3️⃣ <0.2% of data from Africa/South America.

1/

Reposted by Shayne Longpre

Knight First Amendment Institute @knightcolumbia.org · Apr 11

📍EVENT: Day 2 of our “AI and Democracy” symposium will be kicking off shortly. Programming will begin with welcome remarks from George Deodatis @columbiaseas.bsky.social at 9:30am ET. #AIDemocraticFreedoms

Watch the full event on our livestream here:
www.youtube.com/watch?v=X1gj...

Artificial Intelligence and Democratic Freedoms (Day 2)

YouTube video by Knight First Amendment Institute

Reposted by Shayne Longpre

Marzieh Fadaee @mziizm.bsky.social · Apr 10

Very excited to release Kaleidoscope—a multilingual, multimodal evaluation set for VLMs, built as part of our open-science initiative!

🌍 18 languages (high-, mid-, low-)
📚 21k questions (55% require image understanding)
🧪 STEM, social science, reasoning, and practical skills

Reposted by Shayne Longpre

Knight First Amendment Institute @knightcolumbia.org · Apr 10

Panel 1: Regulating AI in a Time of Democratic Upheaval starts in approximately 5 minutes.

Panelists: @atoosakz.bsky.social, @randomwalker.bsky.social, @alondra.bsky.social, and Deirdre K. Mulligan.
Moderator: @shaynelongpre.bsky.social.
#AIDemocraticFreedoms

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

🪲AI bug bounties: As AI systems are given more control/autonomy, the surface area for possible flaws grows. Organizations will increasingly rely on community help to identify and address vulnerabilities, multilingually, and across application stacks.

🧵/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

🤖Agents in the browser: Expect more asynchronous software/account usage on our behalf. Speed and usability are key—Operator, for example, still feels slow and clunky right now.

9/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

🔍User experience & interfaces: Especially for coding, the competitive advantage from the interface (e.g., dynamic multi-turn code editing in OpenAI or Anthropic playgrounds), and the interoperability with existing tools and applications, may become more important than the models themselves.

8/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

Looking ahead to 2025, I expect a few other trends to emerge more prominently:

7/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

🐞Web data wars & exclusivity: More websites are restricting AI crawlers with robots.txt, ToS, lawsuits, and other anti-crawling measures. AI developers frequently circumvent these restrictions or negotiate exclusive deals for key data, dividing up access on the web.

6/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

🏅Who leads? What was once a two-horse race now features many players: Google, OpenAI, Anthropic, Meta, xAI, DeepSeek, Mistral, new startups, and API wrappers all competing in the Chatbot Arena. The performance gap between open and closed, domestic and foreign, continues to narrow.

5/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

⚙️Efficiency: AI systems are more efficient, affordable, and accessible. Test-time reasoning has unlocked greater capabilities from smaller models. DeepSeek demonstrated that once the “right recipe” is found, training is cheaper than expected.

4/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

⏩Model capabilities: 2024 benchmarks improved significantly in science/math (GPQA), coding (SWE-Bench), tool use, and video gen.

🔐Privacy and security concerns: Organizations are increasingly focused on using their internal, sensitive data with AI, which can be at odds with protecting it.

3/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

📈Rising adoption: 78% of organizations reported using AI in some form, up from 55% the previous year.

💰Private investment: The US hit $109B, dwarfing China’s $9B and the UK’s $5B.

2/

Shayne Longpre @shaynelongpre.bsky.social · Apr 9

This week, @stanfordhai.bsky.social released the 2025 AI Index. It’s well worth reading to understand the evolving ecosystem of AI. Some highlights that stood out to me:

1/

Shayne Longpre @shaynelongpre.bsky.social · Apr 1

Excited to speak at the workshop on Technical AI Governance in Vancouver this summer!

#ICML2025

Technical AI Governance @ ICML 2025 @taig-icml.bsky.social · Apr 1

📣We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com

Reposted by Shayne Longpre

Andy Tseng @andytseng.bsky.social · Mar 17

#AI is evolving fast, and so are its flaws. A fresh approach to finding and reporting AI bugs is long overdue. Great initiative by @shaynelongpre.bsky.social and team. Transparency and accountability in AI development are essential! #AISafety #ResponsibleAI #AIEthics #MIT

Researchers Propose a Better Way to Report Dangerous AI Flaws

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

www.wired.com

Reposted by Shayne Longpre

WIRED @wired.com · Mar 13

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

Researchers Propose a Better Way to Report Dangerous AI Flaws

wrd.cm
