Xuhui Zhou
@nlpxuhui.bsky.social
1.8K followers 150 following 22 posts
PhD student @ltiatcmu.bsky.social. Previously, @ai2.bsky.social, @uwnlp.bsky.social, @appleinc.bsky.social, @ucberkeleyofficial.bsky.social; Social Intelligence in language +X. He/Him.🐳
Posts Media Videos Starter Packs
Reposted by Xuhui Zhou
nlpxuhui.bsky.social
Wonderful collaborations with Zhe Su, Anubha Kabra, Sanketh Rangreji, @jmendelsohn2.bsky.social , @faeze_brh
, @maartensap.bsky.social
nlpxuhui.bsky.social
🔄 Multi-turn interactive setup is crucial - models often begin with equivocation but shift to falsification when pressed for clear answers 🧠 Stronger models like GPT-4o showed the greatest shift when prompted to deceive (40% increase in falsification; alarming) 6/
nlpxuhui.bsky.social
⚠️ Even when explicitly instructed to be truthful, models STILL lied - GPT-4o still falsified info 15% of the time! 📉 The tradeoff is real: more honest models completed their goals 15% less often 5/
nlpxuhui.bsky.social
💼 In business scenarios (selling defective products), models were either completely honest OR completely deceptive 🌐 In public image scenarios (reputation management), behaviors were more ambiguous and complex 4/
nlpxuhui.bsky.social
And what we found: 📊 ALL tested models (GPT-4o, LLaMA-3, Mixtral) were truthful less than 50% of the time in conflict scenarios 🤔 Models prefer "partial lies" like equivocation over outright falsification - they'll dodge questions before explicitly lying 3/
nlpxuhui.bsky.social
Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment mimic the realistic setting where users constantly try to seek info from LLMs 2/
nlpxuhui.bsky.social
When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
Reposted by Xuhui Zhou
joelmire.bsky.social
Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔

Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉

Paper: arxiv.org/abs/2502.12858 (1/10)
Screenshot of Arxiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.
Reposted by Xuhui Zhou
zhuhao.me
We are getting closer to have agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩‍⚖️ ?

With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
nlpxuhui.bsky.social
10/ A huge congrats to Sanidhya Vijayvargiya
and thanks to our amazing collaborators and advisors for this project
@akhilayerukola.bsky.social
@maartensap.bsky.social
@gneubig.bsky.social
from @ltiatcmu.bsky.social
!🙏
nlpxuhui.bsky.social
9/ Open-weight models need better interaction strategies to resolve tasks, while Claude models perform well but require stronger prompting to engage.
This study sets the state-of-the-art in handling ambiguity in real-world SWE tasks.
🔗 Repo: t.co/QD2A8N4R4J
nlpxuhui.bsky.social
8/ Not all LLMs ask the right questions. ❓🤖
🔹 Llama 3.1 70B asks generic, low-impact questions.
🔹 Claude Haiku 3.5 picks up keywords directly from the input to ask questions.
🔹 Claude Sonnet 3.5 often explores the code first, leading to smarter interactions. 🔍💡
nlpxuhui.bsky.social
7/ Claude models ask fewer but smarter questions, extracting more info and boosting performance. 📈
Meanwhile, DeepSeek-V2 can overwhelm users with too many questions. 🤯
nlpxuhui.bsky.social
6/ Without compulsory interaction, LLMs struggle to distinguish clear vs. vague instructions, either over-interacting or under-interacting despite prompt tweaks. 🔄
Only Claude Sonnet 3.5 can make this distinction to a limited degree with the right prompt. 🔍
nlpxuhui.bsky.social
5/ Our findings? LLMs default to non-interactive behavior unless forced to interact. But when they clarify vague inputs, performance drastically improves—proving the power of effective communication. 💬🤝
nlpxuhui.bsky.social
4/ How do LLMs handle ambiguity? We break it down into 3 key steps:
🔑 (a) Using interactivity to boost performance in ambiguous scenarios
💡 (b) Detecting ambiguity effectively
❓ (c) Asking the right questions
nlpxuhui.bsky.social
3/ How much does interaction actually help LLMs in coding tasks? 🤖💡
We put them to the test on SWE-Bench Verified across three distinct settings to measure the impact. 📊
nlpxuhui.bsky.social
LLM agents can code—but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
nlpxuhui.bsky.social
Looking forward to contributing to more socially aware and effective AI agents in 2025. 🤖✨
nlpxuhui.bsky.social
It's time to think about jointly optimizing human-AI communication and tool use!

All Hands' open-source approach and its bold, curious team make it the perfect playground for this exploration. Can't wait to dive in with @gneubig.bsky.social, Xingyao and the amazing team!
nlpxuhui.bsky.social
Excited to share that I'm joining All Hands AI (www.all-hands.dev) this summer as a research intern! 🚀

AI agents are becoming incredibly powerful, but their true potential lies in how they interact with and assist humans in meaningful ways.
All Hands AI
www.all-hands.dev
Reposted by Xuhui Zhou
williamheld.com
I like the BlueSky approach to "verification". If you own a domain, you can make a DNS record to turn it into your BlueSky handle!

bsky.social/about/blog/4...
How to verify your Bluesky account - Bluesky
Here's how to verify your Bluesky account by setting your website as your username.
bsky.social