Mohit Iyyer
@miyyer.bsky.social
associate prof at UMD CS researching NLP & LLMs
Reposted by Mohit Iyyer
Well this is sure to be a blockbuster AI article... @jennarussell.bsky.social et al are kicking ass and taking names in journalism, calling out both individuals and organizations.

"AI use in American newspapers is widespread, uneven, and rarely disclosed"
arxiv.org/abs/2510.18774
October 23, 2025 at 1:53 PM
Reposted by Mohit Iyyer
AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:
October 22, 2025 at 3:24 PM
Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts.

Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇
🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
June 3, 2025 at 4:16 PM
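For anyone curious what this kind of constrained generation might look like in practice, here's a minimal sketch (not the paper's actual pipeline): a prompt builder that asks any chat LLM to reuse sampled human-written paragraphs, plus a rough n-gram check of how much of the output was copied verbatim. The fragment count, prompt wording, and 90% threshold are assumptions taken from the post above.

```python
import random
import re

def build_frankentext_prompt(human_paragraphs, task, copy_ratio=0.9, k=50):
    """Assemble a prompt asking an LLM to answer `task` while reusing at least
    `copy_ratio` of its output verbatim from the given paragraphs.
    Illustrative only; the real Frankentext prompt is described in the paper."""
    fragments = random.sample(human_paragraphs, min(k, len(human_paragraphs)))
    numbered = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(fragments))
    return (
        f"Write a response to the prompt below. At least {int(copy_ratio * 100)}% "
        f"of your sentences must be copied verbatim from the numbered fragments.\n\n"
        f"Prompt: {task}\n\nFragments:\n{numbered}"
    )

def verbatim_copy_rate(output, human_paragraphs, n=8):
    """Rough check: fraction of word n-grams in the output that also appear
    verbatim (after lowercasing and stripping punctuation) in the sources."""
    words = re.findall(r"\w+", output.lower())
    source = " ".join(re.findall(r"\w+", " ".join(human_paragraphs).lower()))
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return sum(g in source for g in grams) / len(grams)
```

Feed the built prompt to any chat model, then run `verbatim_copy_rate` on its response to see how close it gets to the 90% reuse target.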
Llama 4's massive context window is impressive! However, the best Llama model for long-context understanding over books is still Llama 3.1 405B. Llama 4 Scout is especially bad at our NoCha benchmark, performing below random chance.
April 8, 2025 at 1:48 AM
Thinking about paying $20k/month for a "PhD-level AI agent"? You might want to wait until their web browsing skills are on par with those of human PhD students 😛 Check out our new BEARCUBS benchmark, which shows web agents struggle to perform simple multimodal browsing tasks!
Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%
March 12, 2025 at 4:08 PM
New synthetic benchmark for multilingual long-context LLMs! Surprisingly, English and Chinese are not the top-performing languages (it's Polish!). We also observe a widening gap between high- and low-resource languages as context size increases. Check out the paper for more 👇
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

More from our analysis across 26 languages 🧵👇
March 5, 2025 at 6:44 PM
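To make the "nonexistent needle" idea concrete, here's a minimal sketch of how one might build a needle-in-a-haystack instance where the needle may be absent and the model is allowed to answer "none". The filler text, needle sentence, and question wording are illustrative assumptions, not the actual ONERULER materials.

```python
import random

FILLER = "The quick brown fox jumps over the lazy dog. "  # stand-in for long distractor text
NEEDLE = "The special magic number for the city of Krakow is 7431."  # hypothetical needle

def make_niah_instance(context_words=8000, needle_present=True):
    """Build a long context that may or may not contain the needle, plus a
    question whose instructions permit a 'none' answer (illustrative only)."""
    haystack = (FILLER * (context_words // len(FILLER.split()) + 1)).split()
    haystack = haystack[:context_words]
    if needle_present:
        haystack.insert(random.randrange(len(haystack)), NEEDLE)
    context = " ".join(haystack)
    question = (
        "What is the special magic number for the city of Krakow? "
        "If it does not appear in the text, answer 'none'."
    )
    gold = NEEDLE if needle_present else "none"
    return context, question, gold
```

Instances with `needle_present=False` are the interesting case: a model that always extracts *something* from the context will fail them, which is exactly what the giant green heatmaps never test.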
How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!
⚠️Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.
February 21, 2025 at 4:37 PM
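A rough sketch of what a compression-based pipeline along these lines might look like, assuming a two-stage design (compress chapters, then generate claims grounded in the compressed book). The `call_llm` stub is hypothetical, and the exact prompts and stages in CLIPPER differ -- see the paper for the real recipe.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat LLM API call; swap in your own client."""
    raise NotImplementedError

def compress_book(chapters: list[str]) -> str:
    """Stage 1 (assumed): compress each chapter into a short outline,
    then merge the outlines into a single book-level summary."""
    outlines = [
        call_llm(f"Summarize this chapter in a few bullet points:\n\n{chapter}")
        for chapter in chapters
    ]
    return call_llm(
        "Combine these chapter outlines into one coherent book summary:\n\n"
        + "\n\n".join(outlines)
    )

def generate_claim_pair(book_summary: str) -> dict:
    """Stage 2 (assumed): generate one true and one false claim about the book,
    grounded in the compressed representation rather than the full text."""
    true_claim = call_llm(
        f"Write a claim that is TRUE according to this book summary:\n\n{book_summary}"
    )
    false_claim = call_llm(
        f"Write a subtly FALSE claim about the same characters and events:\n\n{book_summary}"
    )
    return {"true": true_claim, "false": false_claim}
```

The point of compressing first is that the claim-generating model only ever reasons over a short summary, sidestepping the long-context reasoning it's bad at while keeping the claims grounded in the book.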
Lots of recent work focuses on 𝐚𝐮𝐭𝐨𝐦𝐚𝐭𝐢𝐜 detection of LLM-generated text. But how well do 𝐡𝐮𝐦𝐚𝐧𝐬 fare? TLDR: ppl who frequently use ChatGPT for writing tasks are elite at spotting AI text! See our paper for more (and congrats to @jennarussell.bsky.social on her first paper!!)
People often claim they know when ChatGPT wrote something, but are they as accurate as they think?

Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy 🎯
January 28, 2025 at 3:12 PM