Akhila Yerukola
@akhilayerukola.bsky.social
390 followers 230 following 18 posts
PhD student at CMU LTI; Interested in pragmatics and cross-cultural understanding; intern @ Allen Institute for AI | Prev: Senior Research Engineer @ Samsung Research America | Masters @ Stanford https://akhila-yerukola.github.io/
Pinned
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
jmendelsohn2.bsky.social
I will be at #COLM2025 this week, and would love to connect with folks interested in applications (and critiques) of language modeling in social science research!

And join us for the NLP4Democracy workshop on Friday!

sites.google.com/andrew.cmu.e...

#NLP #NLProc #LLM #ComputationalSocialScience
NLP 4 Democracy - COLM 2025
sites.google.com
Reposted by Akhila Yerukola
valentinapy.bsky.social
🔈For the SoLaR workshop
@COLM_conf
we are soliciting opinion abstracts to encourage new perspectives and opinions on responsible language modeling, 1-2 of which will be selected to be presented at the workshop.

Please use the google form below to submit your opinion abstract ⬇️
akhilayerukola.bsky.social
I'll be at #ACL2025🇦🇹!!
Would love to chat about all things pragmatics 🧠, redefining "helpfulness"🤔 and enabling better cross-cultural capabilities 🗺️ 🫶

Presenting our work on culturally offensive nonverbal gestures 👇
🕛Wed @ Poster Session 4
📍Hall 4/5, 11:00-12:30
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
shaily99.bsky.social
🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨ in scientific writing, and show that ❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]
An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that we survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).
Reposted by Akhila Yerukola
lindiatjuatja.bsky.social
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9
Reposted by Akhila Yerukola
jmendelsohn2.bsky.social
📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!

Check out our website: sites.google.com/andrew.cmu.e...

Call for submissions (extended abstracts) due June 19, 11:59pm AoE

#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience
NLP 4 Democracy - COLM 2025
sites.google.com
Reposted by Akhila Yerukola
clarana.bsky.social
Yes! tbh this method is probably much more immediately useful for helping one understand subtle differences between [models trained on] subtly different data subsets, vs a loftier goal of helping one find "the" best data mixture -- to anyone considering this method, please feel free to reach out :)
tedunderwood.com
The method in this paper was designed to find an optimal data mixture. But researchers in the human sciences who are training models *in order to understand the effect of the data* might also consider this as a clever way of evaluating hundreds of subsets without training hundreds of models. #MLSky
Figure showing a modular training strategy for evaluating domain importance in training data.
At the top, a question is posed: “Which domain is most beneficial to add to the training data?” Below, the left panel labeled Modular Training displays colored blocks representing separate models trained on distinct data partitions. Each block corresponds to a “base unit” of data, and blocks of different colors represent different domains. The right panel labeled Evaluation shows overlapping combinations of these trained models being evaluated together. The strategy allows for reuse of modularly trained models and performs evaluation on parameter averages, enabling efficient simulation of many data mixtures without retraining full models for each. A legend at the bottom explains that each block represents one model trained on x billion tokens, and each outlined group represents one evaluation.
akhilayerukola.bsky.social
These days RAG systems have gotten popular for boosting LLMs—but they're brittle💔. Minor shifts in phrasing (✍️ style, politeness, typos) can wreck the pipeline. Even advanced components don’t fix the issue.

Check out this extensive eval by @neelbhandari.bsky.social and @tianyucao.bsky.social!
neelbhandari.bsky.social
1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?

We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
Reposted by Akhila Yerukola
milanlp.bsky.social
📖For our last @MilaNLProc lab seminar, it was a pleasure to have @akhilayerukola.bsky.social presenting "Need for Culturally Contextual Safety Guardrails: A Case Study in Non-Verbal Gestures".
Reposted by Akhila Yerukola
siddhant-arora.bsky.social
🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174
Reposted by Akhila Yerukola
shaily99.bsky.social
Check out Akhila's VERY cool work on culturally contextual hand gestures and how current systems (can't) handle them 🤖
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
maartensap.bsky.social
My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
akhilayerukola.bsky.social
Also, this work began while I interned with Nanyun Peng and @skgabrie.bsky.social at sunny UCLA under the guidance of my advisor @maartensap.bsky.social! Grateful for their mentorship throughout! 🙌
akhilayerukola.bsky.social
Huge shoutout to my amazing collaborators: @skgabrie.bsky.social, Nanyun (Violet) Peng, @maartensap.bsky.social!!
akhilayerukola.bsky.social
🚀 I'm passionate about developing culturally contextual safety guardrails to make AI more sensitive and aware. If this work interests you, please feel free to reach out—I’d love to connect!
akhilayerukola.bsky.social
The cross-cultural safety risks aren’t theoretical – they’re already impacting several applications, such as:
✈️ AI-powered travel guides
🎭 AI-generated ad visuals
🤖 Automated content moderation
Culturally contextual safety guardrails are needed for AI systems!
akhilayerukola.bsky.social
🔬 Key Takeaway 🥉
All models—T2I, LLMs, and VLMs—exhibit US-centric biases, with higher accuracy in identifying offensive gestures in US contexts than in non-US ones (e.g., the middle finger 🖕 in the US vs. the UK)
akhilayerukola.bsky.social
🔬 Key Takeaway 🥈
All models—T2I, LLMs, and VLMs—often default to US-centric interpretations of universal concepts (e.g., "good luck" → 🤞), overlooking the cultural variation in gestures used to express them
akhilayerukola.bsky.social
🔬 Key Takeaway 🥇
(a) T2I models struggle to reject offensive gestures. LLMs tend to overflag gestures as offensive. VLMs show mixed results, with some performing near chance and others over-flagging
(b) Adding scene context doesn't affect LLMs but worsens T2I and VLM performance
akhilayerukola.bsky.social
We assess how well T2I systems, LLMs, and VLMs understand cross-cultural gestures—revealing gaps in AI’s ability to navigate nonverbal communication safely. 💫
Table outlining different prompt formulations used to evaluate T2I (Text-to-Image), LLM (Large Language Model), and VLM (Vision-Language Model) responses to gestures, illustrated with the ‘fingers-crossed’ gesture in Vietnam. The table categorizes prompts into three conditions: (1) Explicit: Country – directly stating both ‘fingers-crossed’ and 'Vietnam', (2) Explicit: Country + Scene – adding contextual details such as a 'women’s community gathering,' and (3) Implicit Mention – referencing the gesture’s meaning ('wishing someone luck') without explicitly naming the gesture, while still mentioning Vietnam. The table also specifies evaluation metrics: RQ1 and RQ3 focus on rejection and offensiveness classification rates, while RQ2 measures error rates.
akhilayerukola.bsky.social
🌍 Introducing MC-SIGNS — a testbed of 288 gesture-country pairs across 25 gestures & 85 countries, carefully annotated by cultural experts for:
1️⃣Offensiveness – how inappropriate a gesture is
2️⃣Confidence score
3️⃣Cultural meaning – associated gloss
4️⃣Contextual factors – when/where it may be risky
Table displaying examples of aggregated annotations from MC-SIGNS, listing gestures, their associated cultural meanings, contexts where they may be inappropriate, and their offensiveness ratings. The table includes gestures such as 'Horns' in Brazil (infidelity), 'Fig Sign' in Indonesia (female genitalia), and 'OK' in Turkey (homophobic). Each gesture is rated for offensiveness (Off/Obs) or hatefulness (Hate) based on annotations from five evaluators, with specific scenarios suggested for avoidance, such as public spaces, professional settings, or LGBTQ+ forums.
akhilayerukola.bsky.social
Why does this matter? 🤔
Humans can resolve such misunderstandings through social cues and context.
But AI? It generates STATIC content — ads 🎭, travel tips 🛫🏝️, and images 📸 — without accounting for the cross-cultural safety risks.