Dayeon (Zoey) Ki
@dayeonki.bsky.social
CS PhD @umdclip Multilingual / Culture #NLProc, MT https://dayeonki.github.io/
dayeonki.bsky.social
7/ 🌟 What’s next for Multi-Agent Debate?

Some exciting future directions:
1️⃣ Assigning specific roles to represent diverse cultural perspectives
2️⃣ Discovering optimal strategies for multi-LLM collaboration
3️⃣ Designing better adjudication methods to resolve disagreements fairly 🤝
dayeonki.bsky.social
6/ But do these gains hold across cultures? 🗾

🫂 We measure cultural parity across diverse groups — and find that Multi-Agent Debate not only boosts average accuracy but also leads to more equitable cultural alignment 🌍
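The parity claim above can be sketched with a simple gap-style metric — an illustrative definition I'm assuming here, not necessarily the one used in the paper:

```python
# Illustrative parity measure: the spread of per-culture accuracy.
# A smaller gap means more equitable alignment across groups.
# (Assumed definition for illustration only.)

def parity_gap(acc_by_culture):
    """Return max - min accuracy across cultural groups."""
    vals = list(acc_by_culture.values())
    return max(vals) - min(vals)

# Toy numbers, not the paper's results:
single_llm = {"KR": 0.62, "US": 0.80, "IN": 0.70}
with_debate = {"KR": 0.74, "US": 0.82, "IN": 0.78}
```

In this toy example, debate both raises the mean and shrinks the gap (0.18 → 0.08), which is the shape of the "more equitable" claim above.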
dayeonki.bsky.social
5/ How do model decisions evolve through debate?

We track three phases of LLM behavior:
💗 Initial decision correctness
💚 Final decision correctness
💙 Judge’s decision correctness

✨ Multi-Agent Debate is most valuable when models initially disagree!
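One way to surface the "most valuable when models initially disagree" pattern is to bucket examples by initial agreement and compare final accuracy. A toy sketch — field names are my assumptions, not the paper's code; for a binary task, equal correctness implies equal initial decisions:

```python
def accuracy_by_initial_agreement(records):
    """records: dicts with booleans init_a, init_b (each agent's initial
    decision correct?) and final_correct (post-debate decision correct?)."""
    stats = {"agree": [0, 0], "disagree": [0, 0]}  # [count, final_correct]
    for r in records:
        key = "agree" if r["init_a"] == r["init_b"] else "disagree"
        stats[key][0] += 1
        stats[key][1] += int(r["final_correct"])
    return {k: c / n if n else 0.0 for k, (n, c) in stats.items()}

# Toy records for illustration:
records = [
    {"init_a": True, "init_b": True, "final_correct": True},
    {"init_a": True, "init_b": False, "final_correct": True},
    {"init_a": False, "init_b": True, "final_correct": False},
]
```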
dayeonki.bsky.social
4/ 🔥 Distinct LLMs are complementary!

We find that:
🤯 Multi-Agent Debate lets smaller LLMs (7B) match the performance of much larger ones (27B)
🏆 Best combo? Gemma-2 9B + EXAONE-3 7B 💪
dayeonki.bsky.social
3/ Before bringing in two #LLMs, we first 📈 maximize single-LLM performance through:

1️⃣ Cultural Contextualization: adding relevant rules-of-thumb for the target culture
2️⃣ Self-Reflection: evaluating and improving its own outputs

These serve as strong baselines before we introduce collaboration 🤝
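The Self-Reflection baseline can be sketched as a generate–critique loop. Here `generate` and `critique` are hypothetical stand-ins for LLM calls, with deterministic toys so the loop runs end to end:

```python
def self_reflect(prompt, generate, critique, max_rounds=2):
    """Draft an answer, then revise it while the critic still has feedback."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, draft)
        if feedback is None:  # critic is satisfied
            break
        draft = generate(f"{prompt}\nRevise using this feedback: {feedback}")
    return draft

# Deterministic toy stand-ins:
def toy_generate(prompt):
    return "revised answer" if "feedback" in prompt.lower() else "first draft"

def toy_critique(prompt, draft):
    return "add cultural context" if draft == "first draft" else None
```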
dayeonki.bsky.social
2/ 🤔 Why involve multiple #LLMs?

Different LLMs bring complementary perspectives and reasoning paths, thanks to variations in:
💽 Training data
🧠 Alignment processes
🌐 Language and cultural coverage

We explore one common form of collaboration: debate.
dayeonki.bsky.social
1/ Are two #LLMs better than one for equitable cultural alignment? 🌍

We introduce a Multi-Agent Debate framework — where two LLM agents debate the cultural adaptability of a given scenario.

#ACL2025 🧵👇
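A minimal sketch of the two-agent debate loop described above — the agent and judge callables are hypothetical stand-ins for LLM calls, not the paper's implementation:

```python
def debate(scenario, agent_a, agent_b, judge, rounds=2):
    """Alternate turns between two agents, then let a judge adjudicate."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", agent_a(scenario, transcript)))
        transcript.append(("B", agent_b(scenario, transcript)))
    return judge(scenario, transcript)

# Deterministic toy agents: B simply echoes the previous turn's answer.
def stub_a(scenario, transcript):
    return "acceptable"

def stub_b(scenario, transcript):
    return transcript[-1][1] if transcript else "unacceptable"

def stub_judge(scenario, transcript):
    answers = [a for _, a in transcript]
    return max(set(answers), key=answers.count)  # majority vote
```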
Reposted by Dayeon (Zoey) Ki
zouharvi.bsky.social
Trying to collect all the MT people here. I probably missed many. Ping me!

bsky.app/starter-pack...
dayeonki.bsky.social
7/ Can AskQE handle naturally occurring translation errors too? 🍃

Yes! It shows:
💁‍♀️ Stronger correlation with human judgments
✅ Better decision-making accuracy than standard QE metrics
dayeonki.bsky.social
6/ 🤖 What kinds of questions does AskQE generate?

Most commonly:
📏 Extent — How many COVID-19 cases were reported today? (24.6%)
💡 Concept — What is another name for paracetamol? (23.6%)
dayeonki.bsky.social
5/ 🔥 We test AskQE on ContraTICO and find:

📉 It effectively distinguishes minor from critical translation errors
👭 It aligns closely with established quality estimation (QE) metrics
dayeonki.bsky.social
4/ We introduce ContraTICO, a dataset of 8 contrastive MT error types in the COVID-19 domain 😷🦠

⚠️ Minor errors: spelling, word order, synonym, intensifier, expansion (no impact)
📛 Critical errors: expansion (impact), omission, alteration
dayeonki.bsky.social
3/ AskQE has two main components:

❓ Question Generation (QG): conditioned on the source + its entailed facts
❕ Question Answering (QA): based on the source and backtranslated MT

If the answers don’t match... there's likely an error ⚠️
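The QG + QA check above can be sketched as follows; `gen_questions` and `answer` are hypothetical stand-ins for the LLM-based components:

```python
import re

def askqe_flags(source, backtranslation, gen_questions, answer):
    """Flag questions whose answers differ between source and backtranslated MT."""
    flags = []
    for q in gen_questions(source):
        a_src = answer(q, source)
        a_mt = answer(q, backtranslation)
        if a_src.strip().lower() != a_mt.strip().lower():
            flags.append((q, a_src, a_mt))  # mismatch: likely MT error
    return flags

# Toy stand-ins: one fixed question, answered by grabbing the first number.
def toy_gen_questions(source):
    return ["How many cases were reported?"]

def toy_answer(question, text):
    m = re.search(r"\d+", text)
    return m.group() if m else "unknown"
```

A mismatched answer pair ("24" vs. "42") is the actionable evidence the framework surfaces to the user.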
dayeonki.bsky.social
2/ But why question answering? 🤔

1️⃣ Provides functional explanations of MT quality
2️⃣ Users can weigh the evidence based on their own judgment
3️⃣ Aligns well with real-world cross-lingual communication strategies 🌐
dayeonki.bsky.social
1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?

Introducing ❓AskQE❓, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback 🗣️

#ACL2025
Reposted by Dayeon (Zoey) Ki
myra.bsky.social
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people’s mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
dayeonki.bsky.social
7/ Taken together, we show that simpler texts are more translatable — and more broadly, #LLM-assisted input rewriting is a promising direction for improving translations! 💥

As LLM-based writing assistants grow, we encourage future work on interactive, rewriting-based approaches to MT 🫡
dayeonki.bsky.social
6/ 🧑‍⚖️ Do humans actually prefer translations of simplified inputs?

Yes! Compared to translations of the original inputs, they rated them:
📝 More contextually appropriate
👁️ Easier to read
🤗 More comprehensible
dayeonki.bsky.social
5/ What does input rewriting actually change? 🧐

Here are 3 key findings:
1️⃣ Better translatability trades off against meaning preservation
2️⃣ Simplification boosts both input & output readability 📖
3️⃣ Input rewriting > Output post-editing 🤯
dayeonki.bsky.social
4/ 🤔 Can we have more selective strategies?

Yes! By selecting rewrites based on translatability scores at inference time, we outperform all other methods 🔥
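Inference-time selection can be sketched as scoring each candidate rewrite (plus the original) and keeping the highest-scoring one; the `translatability` scorer and rewriter here are toy assumptions, not the paper's models:

```python
def select_input(source, rewriters, translatability):
    """Keep the original unless some rewrite scores strictly higher."""
    candidates = [source] + [rw(source) for rw in rewriters]
    return max(candidates, key=translatability)  # max is stable: ties keep the original

# Toy scorer: shorter, plainer inputs are treated as more translatable.
def toy_translatability(text):
    return 1.0 / (1 + len(text.split()))

def toy_simplify(text):
    return text.replace("in spite of the fact that", "although")
```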