Akhila Yerukola
@akhilayerukola.bsky.social
390 followers 230 following 18 posts
PhD student at CMU LTI; Interested in pragmatics and cross-cultural understanding; intern @ Allen Institute for AI | Prev: Senior Research Engineer @ Samsung Research America | Masters @ Stanford https://akhila-yerukola.github.io/
Pinned
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
jmendelsohn2.bsky.social
I will be at #COLM2025 this week, and would love to connect with folks interested in applications (and critiques) of language modeling in social science research!

And join us for the NLP4Democracy workshop on Friday!

sites.google.com/andrew.cmu.e...

#NLP #NLProc #LLM #ComputationalSocialScience
NLP 4 Democracy - COLM 2025
sites.google.com
Reposted by Akhila Yerukola
valentinapy.bsky.social
🔈For the SoLaR workshop
@COLM_conf
we are soliciting opinion abstracts to encourage new perspectives and opinions on responsible language modeling, 1-2 of which will be selected to be presented at the workshop.

Please use the google form below to submit your opinion abstract ⬇️
akhilayerukola.bsky.social
I'll be at #ACL2025🇦🇹!!
Would love to chat about all things pragmatics 🧠, redefining "helpfulness"🤔 and enabling better cross-cultural capabilities 🗺️ 🫶

Presenting our work on culturally offensive nonverbal gestures 👇
🕛Wed @ Poster Session 4
📍Hall 4/5, 11:00-12:30
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
shaily99.bsky.social
🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨ in scientific writing, and show that ❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]
An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that we survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).
Reposted by Akhila Yerukola
lindiatjuatja.bsky.social
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9
Reposted by Akhila Yerukola
jmendelsohn2.bsky.social
📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!

Check out our website: sites.google.com/andrew.cmu.e...

Call for submissions (extended abstracts) due June 19, 11:59pm AoE

#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience
NLP 4 Democracy - COLM 2025
sites.google.com
Reposted by Akhila Yerukola
clarana.bsky.social
Yes! tbh this method is probably much more immediately useful for helping one understand subtle differences between [models trained on] subtly different data subsets, vs a loftier goal of helping one find "the" best data mixture -- to anyone considering this method, please feel free to reach out :)
tedunderwood.com
The method in this paper was designed to find an optimal data mixture. But researchers in the human sciences who are training models *in order to understand the effect of the data* might also consider this as a clever way of evaluating hundreds of subsets without training hundreds of models. #MLSky
Figure showing a modular training strategy for evaluating domain importance in training data.
At the top, a question is posed: “Which domain is most beneficial to add to the training data?” Below, the left panel labeled Modular Training displays colored blocks representing separate models trained on distinct data partitions. Each block corresponds to a “base unit” of data, and blocks of different colors represent different domains. The right panel labeled Evaluation shows overlapping combinations of these trained models being evaluated together. The strategy allows for reuse of modularly trained models and performs evaluation on parameter averages, enabling efficient simulation of many data mixtures without retraining full models for each. A legend at the bottom explains that each block represents one model trained on x billion tokens, and each outlined group represents one evaluation.
akhilayerukola.bsky.social
These days RAG systems have gotten popular for boosting LLMs—but they're brittle💔. Minor shifts in phrasing (✍️ style, politeness, typos) can wreck the pipeline. Even advanced components don’t fix the issue.

Check out this extensive eval by @neelbhandari.bsky.social and @tianyucao.bsky.social!
neelbhandari.bsky.social
1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?

We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
Reposted by Akhila Yerukola
milanlp.bsky.social
📖For our last @MilaNLProc lab seminar, it was a pleasure to have @akhilayerukola.bsky.social presenting "Need for Culturally Contextual Safety Guardrails: A Case Study in Non-Verbal Gestures".
Reposted by Akhila Yerukola
siddhant-arora.bsky.social
🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174
Reposted by Akhila Yerukola
shaily99.bsky.social
Check out Akhila's VERY cool work on culturally contextual hand gestures and how current systems (can't) handle them 🤖
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
Reposted by Akhila Yerukola
maartensap.bsky.social
My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!
akhilayerukola.bsky.social
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a testbed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.
akhilayerukola.bsky.social
Also, this work began while I interned with Nanyun Peng and @skgabrie.bsky.social at sunny UCLA under the guidance of my advisor @maartensap.bsky.social! Grateful for their mentorship throughout! 🙌
akhilayerukola.bsky.social
Huge shoutout to my amazing collaborators: @skgabrie.bsky.social, Nanyun (Violet) Peng, @maartensap.bsky.social!!
akhilayerukola.bsky.social
🚀 I'm passionate about developing culturally contextual safety guardrails to make AI more sensitive and aware. If this work interests you, please feel free to reach out—I’d love to connect!
akhilayerukola.bsky.social
The cross-cultural safety risks aren’t theoretical – they’re already impacting several applications, such as:
✈️ AI-powered travel guides
🎭 AI-generated ad visuals
🤖 Automated content moderation
Culturally contextual safety guardrails are needed for AI systems!
akhilayerukola.bsky.social
🔬 Key Takeaway 🥉
All models—T2I, LLMs, and VLMs—exhibit US-centric biases, with higher accuracy in identifying offensive gestures in US contexts than in non-US ones (e.g., the middle finger 🖕 in the US vs. the UK)
akhilayerukola.bsky.social
🔬 Key Takeaway 🥈
All models—T2I, LLMs, and VLMs—often default to US-centric interpretations of universal concepts (e.g., "good luck" → 🤞), overlooking the cultural variation in gestures used to express them
akhilayerukola.bsky.social
🔬 Key Takeaway 🥇
(a) T2I models struggle to reject offensive gestures. LLMs tend to overflag gestures as offensive. VLMs show mixed results, with some performing near chance and others over-flagging
(b) Adding scene context doesn't affect LLMs but worsens T2I and VLM performance
akhilayerukola.bsky.social
We assess how well T2I systems, LLMs, and VLMs understand cross-cultural gestures—revealing gaps in AI’s ability to navigate nonverbal communication safely. 💫
Table outlining different prompt formulations used to evaluate T2I (Text-to-Image), LLM (Large Language Model), and VLM (Vision-Language Model) responses to gestures, illustrated with the ‘fingers-crossed’ gesture in Vietnam. The table categorizes prompts into three conditions: (1) Explicit: Country – directly stating both ‘fingers-crossed’ and 'Vietnam', (2) Explicit: Country + Scene – adding contextual details such as a 'women’s community gathering,' and (3) Implicit Mention – referencing the gesture’s meaning ('wishing someone luck') without explicitly naming the gesture, while still mentioning Vietnam. The table also specifies evaluation metrics: RQ1 and RQ3 focus on rejection and offensiveness classification rates, while RQ2 measures error rates.
akhilayerukola.bsky.social
🌍 Introducing MC-SIGNS — a testbed of 288 gesture-country pairs across 25 gestures & 85 countries, carefully annotated by cultural experts for:
1️⃣Offensiveness – how inappropriate a gesture is
2️⃣Confidence score
3️⃣Cultural meaning – associated gloss
4️⃣Contextual factors – when/where it may be risky
Table displaying examples of aggregated annotations from MC-SIGNS, listing gestures, their associated cultural meanings, contexts where they may be inappropriate, and their offensiveness ratings. The table includes gestures such as 'Horns' in Brazil (infidelity), 'Fig Sign' in Indonesia (female genitalia), and 'OK' in Turkey (homophobic). Each gesture is rated for offensiveness (Off/Obs) or hatefulness (Hate) based on annotations from five evaluators, with specific scenarios suggested for avoidance, such as public spaces, professional settings, or LGBTQ+ forums.
akhilayerukola.bsky.social
Why does this matter? 🤔
Humans can resolve such misunderstandings through social cues and context.
But AI? It generates STATIC content — ads 🎭, travel tips 🛫🏝️, and images 📸 — without accounting for the cross-cultural safety risks.