Jenna Russell
@jennarussell.bsky.social
930 followers 390 following 16 posts
CS PhD Student @ UMD Undergrad @ Cornell https://jenna-russell.github.io/
Reposted by Jenna Russell
chautmpham.bsky.social
🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
Reposted by Jenna Russell
sgadarian.bsky.social
International students will stop coming to American universities if their visas are going to be at risk. This will make our intellectual community poorer and also make tuition more expensive for domestic students.
jaweedkaleem.bsky.social
UPDATED: At least 83 students -- at campuses for University of California, California State University and Stanford -- have had their visas revoked as of Monday evening.
jaweedkaleem.bsky.social
LATEST: At least 45 student visas across the state have been revoked by the Trump administration, California universities report, as numbers grow. A lawsuit has been filed in a Los Angeles federal court against DHS and Kristi Noem www.latimes.com/california/s...
Reposted by Jenna Russell
skiles.blue
There is a quasi-religion in Silicon Valley that views AI as godlike. This faith has always been parallel to Evangelical Christianity: salvation (transhumanism), the rapture (the technological singularity), and demons (Roko's Basilisk)

Lately the AI faith has fully fused with Christian Nationalism.
Reposted by Jenna Russell
yixiaosong.bsky.social
Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%
Reposted by Jenna Russell
yekyung.bsky.social
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

Our analysis across 26 languages 🧵👇
Reposted by Jenna Russell
chautmpham.bsky.social
⚠️Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.
jennarussell.bsky.social
Also, the non-experts have a range of LLM usage. Having a writing background is key, a fact many are missing.
jennarussell.bsky.social
Hi Shane. We originally used 5 people, only 1 of whom could detect AI-generated text. I then searched out people who I thought could be experts and they had to pass multiple rounds of testing to be included in the study. Details in appendix. Nonexpert performance is already widely known.
jennarussell.bsky.social
This is a great question - we didn’t dive deeper than choosing articles from American publications. There were a few cases where experts noted awkward phrasing and thought the author could be a non-native speaker, but still knew it was a human!
jennarussell.bsky.social
It would be very interesting to see if every language had its own set of “AI vocab” words 🤣
jennarussell.bsky.social
I think the important group is users who do writing tasks like editing/publishing! It’s the mix of having great language skills and frequent usage. A lot of ppl who just use LLMs a lot are way worse detectors than they think they’ll be.
jennarussell.bsky.social
We're releasing our dataset of articles and expert annotations! 📂✨
We hope this helps users of automatic detectors understand not just if a text is AI-generated, but why. 🤖📖
jennarussell.bsky.social
Can LLMs mimic human expert detectors? 🤔

We prompted LLMs to imitate our expert annotators. The results show promise, outperforming detectors like Binoculars and RADAR. 🚀 However, LLMs still fall short of matching our human experts and advanced detectors like Pangram. ⚖️👥
jennarussell.bsky.social
What they get wrong: ❌

Sometimes, humans get tripped up by:
📚 Common "AI vocab" words in human-written texts
✍️ Grammar mistakes they assume "AI wouldn’t make"
🌀🗣️ One expert was often fooled by o1's use of informal language - like slang, contractions, and colloquialisms.
jennarussell.bsky.social
What experts get right: ✅

They spot telltale signs of AI, like:
📚 "AI Vocab" (delve, crucial, vibrant ...)
🔄 Predictable sentence structure
🗨️ Quotes that feel too polished

For human-written content, they look for:
🎨 Creativity
🎭 Stylistic quirks
🌊 A natural & clear flow
jennarussell.bsky.social
Across GPT-4o, Claude, and o1 articles, experts correctly identified 99.3% of AI-generated content without misclassifying any human-written articles.🕵️‍♀️

Among automatic detectors, Pangram significantly outperformed the rest, missing only a few more texts than the experts. 🔍⚡
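
The two numbers above correspond to a true positive rate (share of AI-generated articles flagged as AI) and a false positive rate (share of human-written articles wrongly flagged). A minimal sketch of how one might compute both from a list of judgments follows. The function name, the data class, and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionRates:
    true_positive_rate: float   # AI-generated texts correctly flagged as AI
    false_positive_rate: float  # human-written texts wrongly flagged as AI


def detection_rates(labels: Sequence[bool], predictions: Sequence[bool]) -> DetectionRates:
    """Compute TPR and FPR, where True means "AI-generated" in both sequences."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    # Split the predictions by the ground-truth class of each article.
    on_ai = [pred for label, pred in zip(labels, predictions) if label]
    on_human = [pred for label, pred in zip(labels, predictions) if not label]
    if not on_ai or not on_human:
        raise ValueError("need at least one AI-generated and one human-written example")

    return DetectionRates(
        true_positive_rate=sum(on_ai) / len(on_ai),
        false_positive_rate=sum(on_human) / len(on_human),
    )


# Toy data (hypothetical): 4 AI-generated articles, 4 human-written ones.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, False, False, False, False]

rates = detection_rates(labels, predictions)
print(f"TPR: {rates.true_positive_rate:.2f}, FPR: {rates.false_positive_rate:.2f}")
```

Reporting both rates matters because a detector that flags everything as AI would also catch every AI-generated article, but only by misclassifying every human-written one.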
jennarussell.bsky.social
We asked experts - ranging from beginners to ChatGPT pros - to decide if articles were human-written or AI-generated.

They highlighted potential clues 🔎 in the text and explained why they made their decisions. 🧠
jennarussell.bsky.social
People often claim they know when ChatGPT wrote something, but are they as accurate as they think?

Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy 🎯
jennarussell.bsky.social
I love that there are other Cosmere fans in NLP!! I’m halfway thru SA5, but my unpopular opinion is that I LOVED Elantris and could barely finish Sunlit Man or Yumi. Tress is one of my favs, highly recommend.
Reposted by Jenna Russell
markar.bsky.social
If you are at #EMNLP2024, you should really check out this work from our lab: github.com/Yixiao-Song/... (poster: Tue 4:00-5:30)
If you aren't, you should still read the paper! It's a great metric to use and build upon!
GitHub - Yixiao-Song/VeriScore
Reposted by Jenna Russell
yapeichang.bsky.social
🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!