Lightnews — Scholar-powered news

Reposted by Jenna Russell

Chau Minh Pham @chautmpham.bsky.social · Jun 3

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.

Reposted by Jenna Russell

Shana Gadarian @sgadarian.bsky.social · Apr 8

International students will stop coming to American universities if their visas are going to be at risk. This will make our intellectual community poorer and also make tuition more expensive for domestic students.

Jaweed Kaleem @jaweedkaleem.bsky.social · Apr 8

UPDATED: At least 83 students -- at campuses for University of California, California State University and Stanford -- have had their visas revoked as of Monday evening.

Jaweed Kaleem @jaweedkaleem.bsky.social · Apr 7

LATEST: At least 45 student visas across the state have been revoked by the Trump administration, California universities report, as numbers grow. A lawsuit has been filed in a Los Angeles federal court against DHS and Kristi Noem www.latimes.com/california/s...

Reposted by Jenna Russell

John Skiles Skinner @skiles.blue · Mar 21

There is a quasi-religion in Silicon Valley that views AI as godlike. This faith has always been parallel to Evangelical Christianity: salvation (transhumanism), the rapture (the technological singularity), and demons (Roko's Basilisk)

Lately the AI faith has fully fused with Christian Nationalism.

Reposted by Jenna Russell

Yixiao Song @yixiaosong.bsky.social · Mar 12

Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%

Reposted by Jenna Russell

Yekyung Kim @yekyung.bsky.social · Mar 5

Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

Our analysis across 26 languages 🧵👇

Reposted by Jenna Russell

Chau Minh Pham @chautmpham.bsky.social · Feb 21

⚠️Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.

Jenna Russell @jennarussell.bsky.social · Jan 29

Also, the non-experts vary in how much they use LLMs. Having a writing background is key, a fact many are missing.

Jenna Russell @jennarussell.bsky.social · Jan 29

Hi Shane. We originally used 5 people, only 1 of whom could detect AI-generated text. I then searched out people who I thought could be experts and they had to pass multiple rounds of testing to be included in the study. Details in appendix. Nonexpert performance is already widely known.

Jenna Russell @jennarussell.bsky.social · Jan 29

This is a great question - we didn’t dive deeper than choosing articles from American publications. There were a few cases where experts noted awkward phrasing and thought the author could be a non-native speaker, but they still knew it was a human!

Jenna Russell @jennarussell.bsky.social · Jan 29

It would be very interesting to see if every language had its own set of “AI vocab” words 🤣

Jenna Russell @jennarussell.bsky.social · Jan 29

I think the important group is users who do writing tasks like editing/publishing! It’s the mix of having great language skills and frequent usage. A lot of ppl who just use LLMs a lot are way worse detectors than they think they’ll be.

Jenna Russell @jennarussell.bsky.social · Jan 28

Link found in last post of thread 😀 (but putting it here again) arxiv.org/abs/2501.15654

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text

In this paper, we study how well humans can detect text generated by commercial LLMs (GPT-4o, Claude, o1). We hire annotators to read 300 non-fiction English articles, label them as either human-writt...

Jenna Russell @jennarussell.bsky.social · Jan 28

📎 Paper: arxiv.org/abs/2501.15654
👩‍💻 Code & Data: github.com/jenna-russe...

Thanks to my amazing coauthors @markar.bsky.social and @miyyer.bsky.social and the support of UMass NLP

GitHub - jenna-russell/human_detectors

Jenna Russell @jennarussell.bsky.social · Jan 28

We're releasing our dataset of articles and expert annotations! 📂✨
We hope this helps users of automatic detectors understand not just if a text is AI-generated, but why. 🤖📖

Jenna Russell @jennarussell.bsky.social · Jan 28

Can LLMs mimic human expert detectors? 🤔

We prompted LLMs to imitate our expert annotators. The results show promise, outperforming detectors like Binoculars and RADAR. 🚀 However, LLMs still fall short of matching our human experts and advanced detectors like Pangram. ⚖️👥

Jenna Russell @jennarussell.bsky.social · Jan 28

What they get wrong: ❌

Sometimes, humans get tripped up by:
📚 Common "AI vocab" words in human-written texts
✍️ Grammar mistakes they assume "AI wouldn’t make"
🌀🗣️ One expert was often fooled by o1's use of informal language - like slang, contractions, and colloquialisms.

Jenna Russell @jennarussell.bsky.social · Jan 28

What experts get right: ✅

They spot telltale signs of AI, like:
📚 "AI Vocab" (delve, crucial, vibrant ...)
🔄 Predictable sentence structure
🗨️ Quotes that feel too polished

For human-written content, they look for:
🎨 Creativity
🎭 Stylistic quirks
🌊 A natural & clear flow

Jenna Russell @jennarussell.bsky.social · Jan 28

Across GPT-4o, Claude, and o1 articles, experts correctly identified 99.3% of AI-generated content without misclassifying any human-written articles.🕵️‍♀️

Among automatic detectors, Pangram significantly outperformed the rest, missing only a few more texts than the experts. 🔍⚡
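
For readers who want to see how the two numbers in the post above relate, here is a minimal sketch of computing a detector's true positive rate (share of AI-generated articles caught) and false positive rate (share of human-written articles wrongly flagged). The function name `detection_rates`, the label strings, and the toy labels are all assumptions for illustration; they are not the study's data or the authors' evaluation code.

```python
from typing import Sequence


def detection_rates(truth: Sequence[str], predicted: Sequence[str]) -> tuple[float, float]:
    """Return (true_positive_rate, false_positive_rate), treating 'ai' as the positive class.

    Labels are assumed to be the strings 'ai' or 'human' (a hypothetical convention).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    ai_total = sum(1 for t in truth if t == "ai")
    human_total = sum(1 for t in truth if t == "human")

    # AI-generated articles that were correctly labeled as AI.
    true_pos = sum(1 for t, p in zip(truth, predicted) if t == "ai" and p == "ai")
    # Human-written articles that were wrongly labeled as AI.
    false_pos = sum(1 for t, p in zip(truth, predicted) if t == "human" and p == "ai")

    tpr = true_pos / ai_total if ai_total else 0.0
    fpr = false_pos / human_total if human_total else 0.0
    return tpr, fpr


# Toy labels, invented for illustration only.
toy_truth = ["ai", "ai", "ai", "ai", "human", "human", "human", "human"]
toy_predicted = ["ai", "ai", "ai", "human", "human", "human", "human", "human"]

tpr, fpr = detection_rates(toy_truth, toy_predicted)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In these terms, the result reported in the post corresponds to a true positive rate of 99.3% with a false positive rate of 0%.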

Jenna Russell @jennarussell.bsky.social · Jan 28

We asked experts - ranging from beginners to ChatGPT pros - to decide if articles were human-written or AI-generated.

They highlighted potential clues 🔎 in the text and explained why they made their decisions. 🧠

Jenna Russell @jennarussell.bsky.social · Jan 28

People often claim they know when ChatGPT wrote something, but are they as accurate as they think?

Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy 🎯

Jenna Russell @jennarussell.bsky.social · Jan 6

I love that there are other Cosmere fans in NLP!! I’m halfway thru SA5, but my unpopular opinion is that I LOVED Elantris and could barely finish Sunlit Man or Yumi. Tress is one of my favs, highly recommend.

Jenna Russell @jennarussell.bsky.social · Nov 24

🙋‍♀️

Reposted by Jenna Russell

brendan o’connor (going to COLM) @brenocon.bsky.social · Nov 19

We're hiring new #nlp faculty this year!

Asst or Assoc Professors in NLP at UMass CICS --
careers.umass.edu/amherst/en-u...

Details - Assistant/Associate Professor - Natural Language Processing (NLP) | Human Resources | UMass Amherst

Reposted by Jenna Russell

Marzena Karpinska ✈️ COLM'25 @markar.bsky.social · Nov 11

If you are at #EMNLP2024, you should really check out this work from our lab: github.com/Yixiao-Song/... (poster: Tue 4:00-5:30)
If you aren't, you should still read the paper! It's a great metric to use and build upon!

GitHub - Yixiao-Song/VeriScore

Reposted by Jenna Russell

Yapei Chang @yapeichang.bsky.social · Nov 10

🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!
