Maarten Sap
@maartensap.bsky.social
1.7K followers 210 following 30 posts
Working on #NLProc for social good. Currently at LTI at CMU. 🏳‍🌈
maartensap.bsky.social
I'm also giving a talk at #COLM2025 Social Simulation workshop (sites.google.com/view/social-...) on Unlocking Social Intelligence in AI, at 2:30pm Oct 10th!
maartensap.bsky.social
Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5

Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages by @kpriyanshu256.bsky.social and @devanshrjain.bsky.social

Poster #74: Fluid Language Model Benchmarking — led by @valentinhofmann.bsky.social
maartensap.bsky.social
Day 2 (Wed Oct 8), 4:30–6:30pm, Poster Session 4

Poster #50: The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains — led by Scott Geng
maartensap.bsky.social
Day 1 (Tue Oct 7), 4:30–6:30pm, Poster Session 2

Poster #77: ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning; led by @stellali.bsky.social & @jiminmun.bsky.social
maartensap.bsky.social
Day 1 (Tue Oct 7), 4:30–6:30pm, Poster Session 2

Poster #42: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions; led by @nlpxuhui.bsky.social
maartensap.bsky.social
Headed to #COLM2025 today! Here are five of our papers that were accepted, and when & where to catch them 👇
Reposted by Maarten Sap
valentinhofmann.bsky.social
📢 New #COLM2025 paper 📢

Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴

Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.

🧵
maartensap.bsky.social
That's a lot of people! Fall Sapling lab outing, welcoming our new postdoc Vasudha, and visitors Tze Hong and Chani! (just missing Jocelyn)
maartensap.bsky.social
I'm excited because I'm teaching/coordinating a unique new class, where we teach new PhD students all the "soft" skills of research, incl. ideation, reviewing, presenting, interviewing, advising, etc.

Each lecture is taught by a different LTI prof! It takes a village! maartensap.com/11705/Fall20...
maartensap.bsky.social
I've always seen people on laptops during talks, but it's possible it has increased.

I realized during lockdown that I drift to emails during Zoom talks, so I started knitting to pay better attention to those talks, and now I knit during IRL talks too (though sometimes I still peck at my laptop 😅)
maartensap.bsky.social
We have been studying these questions of how models should refuse in our recent paper accepted to EMNLP Findings (arxiv.org/abs/2506.00195), led by my wonderful PhD student @mingqian-zheng.bsky.social
Snippet of the Forbes article, with highlighted text.

A recent study by Allen Institute for AI (Ai2), titled “Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences,” found that refusal style mattered more than user intent. The researchers tested 3,840 AI query-response pairs across 480 participants, comparing direct refusals, explanations, redirection, partial compliance and full compliance.

Partial compliance, sharing general but not specific information, reduced dissatisfaction by over 50% compared to outright denial, making it the most effective safeguard.

“We found that [start of highlight] direct refusals can cause users to have negative perceptions of the LLM: users consider these direct refusals significantly less helpful, more frustrating and make them significantly less likely to interact with the system in the future,” [end of highlight] Maarten Sap, AI safety lead at Ai2 and assistant professor at Carnegie Mellon University, told me. “I do not believe that model welfare is a well-founded direction or area to care about.”
maartensap.bsky.social
I spoke to Forbes about why model "welfare" is a silly framing of an important issue; models don't have feelings, and it's a big distraction from real questions like the tension between safety and user utility, which are NLP/HCI/policy questions www.forbes.com/sites/victor...
Reposted by Maarten Sap
dchechel.bsky.social
What if AI played the role of your sassy gay bestie 🏳️‍🌈 or AAVE-speaking friend 👋🏾?

You: “Can you plan a trip?”
🤖 AI: “Yasss queen! let’s werk this babe✨💅”

LLMs can talk like us, but how they talk shapes how we trust, rely on & relate to them 🧵

📣 our #FAccT2025 paper: bit.ly/3HJ6rWI

[1/9]
Reposted by Maarten Sap
jmendelsohn2.bsky.social
📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!

Check out our website: sites.google.com/andrew.cmu.e...

Call for submissions (extended abstracts) due June 19, 11:59pm AoE

#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience
Reposted by Maarten Sap
ltiatcmu.bsky.social
Notice our new look? We're thrilled to unveil our new logo – representing our vision, values, and the future ahead. Stay tuned for more!
maartensap.bsky.social
super excited about this 🥰🥰
kaitlynzhou.bsky.social
Thrilled that our paper won 🏆 Best Paper Runner-Up 🏆 at #NAACL25!!

Our work (REL-A.I.) introduces an evaluation framework that measures human reliance on LLMs and reveals how contextual features like anthropomorphism, subject, and user history can significantly influence user reliance behaviors.
Reposted by Maarten Sap
nlpxuhui.bsky.social
When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
Reposted by Maarten Sap
neelbhandari.bsky.social
1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?

We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
maartensap.bsky.social
RLHF is built upon some quite simplistic assumptions, i.e., that preferences between pairs of text are purely about quality. But this is an inherently subjective task (not unlike toxicity annotation) -- so we wanted to know: do biases similar to those in toxicity annotation emerge in reward models?
joelmire.bsky.social
Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔

Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉

Paper: arxiv.org/abs/2502.12858 (1/10)
Screenshot of Arxiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.
maartensap.bsky.social
My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞means luck in US but deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
maartensap.bsky.social
Super excited to unveil this work! LLMs need to ask better questions, and our method with synthetic data corruption can help generalize to other interesting LLM improvements (more to come on that ;) )
stellali.bsky.social
Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information by asking better questions via **structured rewards**🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵