@seanjwestwood.bsky.social.
We have a new paper - led by Desheng Hu, now accepted at @icwsm.bsky.social - exploring that and finding many issues.
Preprint: arxiv.org/abs/2511.12920
🧵👇
@joachimbaumann.bsky.social, who will present co-authored work on "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". Paper and information on how to join ⬇️
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
✓ LLMs are brittle data annotators
✓ Downstream conclusions flip frequently: LLM hacking risk is real!
✓ Bias correction methods can help but have trade-offs
✓ Use human experts whenever possible
Paper: arxiv.org/pdf/2509.08825
Last week, we -- the (amazing) Social Computing Group -- held an internal hackathon to work on our informally named “Cultural Imperialism” project.
Experiment + *evidence-based* mitigation strategies in this preprint 👇
Paper: arxiv.org/pdf/2509.08825
Just joined @milanlp.bsky.social as a Postdoc, working with the amazing @dirkhovy.bsky.social on large language models and computational social science!