Lightnews — Scholar-powered news

Xiaoyan Bai

@elenal3ai.bsky.social

Will be at #NeurIPS2025 presenting “Concept Incongruence”!

🦄🦆 Curious about a unicorn duck? Stop by, get one, and chat with us!

We made a new demo for detecting hidden conflicts in system prompts to spot “concept incongruence” for safer prompts.

🔗: github.com/ChicagoHAI/d...

🗓️ Dec 3 11AM - 2PM

November 24, 2025 at 7:18 PM

Xiaoyan Bai

@elenal3ai.bsky.social

Research agents are getting smarter. They can write convincing PhD-level reports 🧑‍🔬

But has anyone checked if the way they find their results makes any sense?

Our framework, MechEvalAgents, verifies the science, not just the story 🤖

1/n🧵

November 20, 2025 at 9:46 PM

Reposted by Xiaoyan Bai

Haokun Liu

@haokunliu.bsky.social

We're launching a weekly competition where the community decides which research ideas get implemented. Every week, we'll take the top 3 ideas from IdeaHub, run experiments with AI agents, and share everything: code, successes, and failures.

It's completely free and we'll try out ideas for you!

November 10, 2025 at 9:32 PM

Reposted by Xiaoyan Bai

Lexing Xie

@lexingxie.bsky.social

Identifying human morals and values in language is crucial for analysing lots of human- and AI-generated text.

We introduce "MoVa: Towards Generalizable Classification of Human Morals and Values" - to be presented at @emnlpmeeting.bsky.social oral session next Thu #CompSocialScience #LLMs
🧵 (1/n)

October 30, 2025 at 12:20 AM

Xiaoyan Bai

@elenal3ai.bsky.social

🕸️ Here’s a network showing how much different models predict each other as the author of some text!

October 28, 2025 at 1:55 AM

Xiaoyan Bai

@elenal3ai.bsky.social

❓ Does an LLM know thyself? 🪞
Humans pass the mirror test at ~18 months 👶
But what about LLMs? Can they recognize their own writing—or even admit authorship at all?
In our new paper, we put 10 state-of-the-art models to the test. Read on 👇
1/n 🧵

October 27, 2025 at 5:36 PM

Xiaoyan Bai

@elenal3ai.bsky.social

In our new work, we reverse-engineer two models, a standard fine-tuned (SFT) model and an implicit chain-of-thought (ICoT) model, to see why models struggle with multi-digit multiplication.

👉Check out the paper here: arxiv.org/abs/2510.00184
🎉Big thanks to all my amazing collaborators!

October 24, 2025 at 7:04 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right.

The dream of “autonomous AI scientists” is tempting:
machines that generate hypotheses, run experiments, and write papers. But science isn’t just automation.

cichicago.substack.com/p/the-mirage...
🧵

The Mirage of Autonomous AI Scientists

Science as AI’s killer application cannot succeed without scientist-AI interaction: Introducing Hypogenic.ai.

cichicago.substack.com

October 23, 2025 at 6:55 PM

Reposted by Xiaoyan Bai

Dang Nguyen

@divingwithorcas.bsky.social

HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory.
Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠
Can you earn Enlightened Bureaucrat status?

(link below!)

September 26, 2025 at 6:41 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers.

This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.

ai-scientific-discovery.github.io

September 25, 2025 at 6:28 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

As AI becomes increasingly capable of conducting analyses and following instructions, my prediction is that the role of scientists will increasingly focus on identifying and selecting important problems to work on ("selector"), and effectively evaluating analyses performed by AI ("evaluator").

September 16, 2025 at 3:07 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest (corrected link):

forms.gle/MFcdKYnckNno...

More in the 🧵! Please share! #MLSky 🧠

Program Committee Interest for the Second Workshop on AI & Scientific Discovery

We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL (Annual meetings of The Association for Computational Linguistics, the European Language Resource Association and Internat...

forms.gle

August 29, 2025 at 4:00 PM

Xiaoyan Bai

@elenal3ai.bsky.social

⚡️Ever asked an LLM-as-Marilyn Monroe about the 2020 election? Our paper calls this concept incongruence, common in both AI and how humans create and reason.
🧠Read my blog to learn what we found, why it matters for AI safety and creativity, and what's next: cichicago.substack.com/p/concept-in...

July 31, 2025 at 7:06 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering.

This is holding us back. 🧵 and new paper with @ari-holtzman.bsky.social.

July 9, 2025 at 8:07 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

When you walk into the ER, you could get a doc:
1. Fresh from a week of not working
2. Tired from working too many shifts

@oziadias.bsky.social has been both and thinks that they're different! But can you tell from their notes? Yes we can! Paper @natcomms.nature.com www.nature.com/articles/s41...

July 2, 2025 at 7:22 PM

Xiaoyan Bai

@elenal3ai.bsky.social

Humbled to receive an honorable mention🌟

chenhaotan.bsky.social @chenhaotan.bsky.social · Jun 24

Congratulations to all the best poster award winners and honorable mentions!

June 25, 2025 at 8:56 AM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

Since @elenal3ai.bsky.social cannot make it, I presented the poster on concept incongruence: arxiv.org/abs/2505.14905

June 23, 2025 at 7:18 PM

Xiaoyan Bai

@elenal3ai.bsky.social

I am glad that you found our paper entertaining! This is a great point for my follow-up thread on the implications of concept incongruence. Our main goal is to raise awareness and provide clarity around concept incongruence.

Leshem (Legend) Choshen @EMNLP @lchoshen.bsky.social · May 27

Highly entertaining paper and writeup, but does it really matter? Is it important that models can't abstain on counterfactuals?
Or that they leak information?

Xiaoyan Bai @elenal3ai.bsky.social · May 27

🚨 New paper alert 🚨

Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️

1/n 🧵

May 28, 2025 at 12:56 PM

Xiaoyan Bai

@elenal3ai.bsky.social

🚨 New paper alert 🚨

Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️

1/n 🧵

May 27, 2025 at 1:59 PM

Reposted by Xiaoyan Bai

Mourad Heddaya

@mheddaya.bsky.social

🧑‍⚖️How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate?

Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!

May 1, 2025 at 7:25 PM

Reposted by Xiaoyan Bai

Haokun Liu

@haokunliu.bsky.social

🚀🚀🚀Excited to share our latest work: HypoBench, a systematic benchmark for evaluating LLM-based hypothesis generation methods!

There is much excitement about leveraging LLMs for scientific hypothesis generation, but principled evaluations are missing - let’s dive into HypoBench together.

April 28, 2025 at 7:35 PM

Reposted by Xiaoyan Bai

chenhaotan.bsky.social

@chenhaotan.bsky.social

Encourage your students to submit posters and register! Limited free housing is provided for student participants only, on a first-come, first-served basis (in order of request).

We are also actively looking for sponsors. Reach out if you are interested!

Please repost! Help spread the word!

chenhaotan.bsky.social @chenhaotan.bsky.social · Apr 21

The Midwest Machine Learning Symposium will take place on June 23-24 on the University of Chicago campus in Chicago (midwest-ml.org/2025/). We have an amazing lineup of speakers: @profsanjeevarora.bsky.social from Princeton, Heng Ji from UIUC, Tuomas Sandholm from CMU, and @ravenben.bsky.social from UChicago.

April 21, 2025 at 3:12 PM

Reposted by Xiaoyan Bai

Dang Nguyen

@divingwithorcas.bsky.social

1/n

You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?

April 14, 2025 at 7:55 PM

Reposted by Xiaoyan Bai

Julia Mendelsohn

@jmendelsohn2.bsky.social

New preprint!
Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard.

We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3

#NLP #NLProc #polisky #polcom #compsocialsci
🐦🐦

Screenshot of top half of first page of paper. The paper is titled: "When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models". The authors are Julia Mendelsohn (University of Chicago) and Ceren Budak (University of Michigan). The top right corner contains a visual showing the sentence "They want immigrants to pour into and infest this country". The caption says: Figure 1: Dehumanizing sentence likening immigrants to the source domain concepts of Water and Vermin via the words "pour" and "infest".

The abstract text on the left reads: Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. "water" or "vermin"). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.

February 20, 2025 at 7:59 PM
