Dayeon (Zoey) Ki
@dayeonki.bsky.social
CS PhD @umdclip Multilingual / Culture #NLProc, MT https://dayeonki.github.io/
dayeonki.bsky.social
7/ 🌟 What’s next for Multi-Agent Debate?

Some exciting future directions:
1️⃣ Assigning specific roles to represent diverse cultural perspectives
2️⃣ Discovering optimal strategies for multi-LLM collaboration
3️⃣ Designing better adjudication methods to resolve disagreements fairly 🤝
dayeonki.bsky.social
6/ But do these gains hold across cultures? 🗾

🫂 We measure cultural parity across diverse groups — and find that Multi-Agent Debate not only boosts average accuracy but also leads to more equitable cultural alignment 🌍
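The parity claim above can be sketched with a simple gap-style metric — an illustrative definition I'm assuming here, not necessarily the one used in the paper:

```python
# Illustrative parity measure: the spread of per-culture accuracy.
# A smaller gap means more equitable alignment across groups.
# (Assumed definition for illustration only.)

def parity_gap(acc_by_culture):
    """Return max - min accuracy across cultural groups."""
    vals = list(acc_by_culture.values())
    return max(vals) - min(vals)

# Toy numbers, not the paper's results:
single_llm = {"KR": 0.62, "US": 0.80, "IN": 0.70}
with_debate = {"KR": 0.74, "US": 0.82, "IN": 0.78}
```

In this toy example, debate both raises the mean and shrinks the gap (0.18 → 0.08), which is the shape of the "more equitable" claim above.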
dayeonki.bsky.social
5/ How do model decisions evolve through debate?

We track three phases of LLM behavior:
💗 Initial decision correctness
💚 Final decision correctness
💙 Judge’s decision correctness

✨ Multi-Agent Debate is most valuable when models initially disagree!
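One way to surface the "most valuable when models initially disagree" pattern is to bucket examples by initial agreement and compare final accuracy. A toy sketch — field names are my assumptions, not the paper's code; for a binary task, equal correctness implies equal initial decisions:

```python
def accuracy_by_initial_agreement(records):
    """records: dicts with booleans init_a, init_b (each agent's initial
    decision correct?) and final_correct (post-debate decision correct?)."""
    stats = {"agree": [0, 0], "disagree": [0, 0]}  # [count, final_correct]
    for r in records:
        key = "agree" if r["init_a"] == r["init_b"] else "disagree"
        stats[key][0] += 1
        stats[key][1] += int(r["final_correct"])
    return {k: c / n if n else 0.0 for k, (n, c) in stats.items()}

# Toy records for illustration:
records = [
    {"init_a": True, "init_b": True, "final_correct": True},
    {"init_a": True, "init_b": False, "final_correct": True},
    {"init_a": False, "init_b": True, "final_correct": False},
]
```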
dayeonki.bsky.social
4/ 🔥 Distinct LLMs are complementary!

We find that:
🤯 Multi-Agent Debate lets smaller LLMs (7B) match the performance of much larger ones (27B)
🏆 Best combo? Gemma-2 9B + EXAONE-3 7B 💪
dayeonki.bsky.social
3/ Before bringing in two #LLMs, we first 📈 maximize single-LLM performance through:

1️⃣ Cultural Contextualization: adding relevant rules-of-thumb for the target culture
2️⃣ Self-Reflection: evaluating and improving its own outputs

These serve as strong baselines before we introduce collaboration 🤝
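The Self-Reflection baseline can be sketched as a generate–critique loop. Here `generate` and `critique` are hypothetical stand-ins for LLM calls, with deterministic toys so the loop runs end to end:

```python
def self_reflect(prompt, generate, critique, max_rounds=2):
    """Draft an answer, then revise it while the critic still has feedback."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, draft)
        if feedback is None:  # critic is satisfied
            break
        draft = generate(f"{prompt}\nRevise using this feedback: {feedback}")
    return draft

# Deterministic toy stand-ins:
def toy_generate(prompt):
    return "revised answer" if "feedback" in prompt.lower() else "first draft"

def toy_critique(prompt, draft):
    return "add cultural context" if draft == "first draft" else None
```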
dayeonki.bsky.social
2/ 🤔 Why involve multiple #LLMs?

Different LLMs bring complementary perspectives and reasoning paths, thanks to variations in:
💽 Training data
🧠 Alignment processes
🌐 Language and cultural coverage

We explore one common form of collaboration: debate.
dayeonki.bsky.social
1/ Are two #LLMs better than one for equitable cultural alignment? 🌍

We introduce a Multi-Agent Debate framework — where two LLM agents debate the cultural adaptability of a given scenario.

#ACL2025 🧵👇
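A minimal sketch of the two-agent debate loop described above — the agent and judge callables are hypothetical stand-ins for LLM calls, not the paper's implementation:

```python
def debate(scenario, agent_a, agent_b, judge, rounds=2):
    """Alternate turns between two agents, then let a judge adjudicate."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", agent_a(scenario, transcript)))
        transcript.append(("B", agent_b(scenario, transcript)))
    return judge(scenario, transcript)

# Deterministic toy agents: B simply echoes the previous turn's answer.
def stub_a(scenario, transcript):
    return "acceptable"

def stub_b(scenario, transcript):
    return transcript[-1][1] if transcript else "unacceptable"

def stub_judge(scenario, transcript):
    answers = [a for _, a in transcript]
    return max(set(answers), key=answers.count)  # majority vote
```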
Reposted by Dayeon (Zoey) Ki
zouharvi.bsky.social
Trying to collect all the MT people here. I probably missed many. Ping me!

bsky.app/starter-pack...
dayeonki.bsky.social
7/ Can AskQE handle naturally occurring translation errors too? 🍃

Yes! It shows:
💁‍♀️ Stronger correlation with human judgments
✅ Better decision-making accuracy than standard QE metrics
dayeonki.bsky.social
6/ 🤖 What kinds of questions does AskQE generate?

Most commonly:
📏 Extent — How many COVID-19 cases were reported today? (24.6%)
💡 Concept — What is another name for paracetamol? (23.6%)
dayeonki.bsky.social
5/ 🔥 We test AskQE on ContraTICO and find:

📉 It effectively distinguishes minor from critical translation errors
👭 It aligns closely with established quality estimation (QE) metrics
dayeonki.bsky.social
4/ We introduce ContraTICO, a dataset of 8 contrastive MT error types in the COVID-19 domain 😷🦠

⚠️ Minor errors: spelling, word order, synonym, intensifier, expansion (no impact)
📛 Critical errors: expansion (impact), omission, alteration
dayeonki.bsky.social
3/ AskQE has two main components:

❓ Question Generation (QG): conditioned on the source + its entailed facts
❕ Question Answering (QA): based on the source and backtranslated MT

If the answers don’t match... there's likely an error ⚠️
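The QG + QA check above can be sketched as follows; `gen_questions` and `answer` are hypothetical stand-ins for the LLM-based components:

```python
import re

def askqe_flags(source, backtranslation, gen_questions, answer):
    """Flag questions whose answers differ between source and backtranslated MT."""
    flags = []
    for q in gen_questions(source):
        a_src = answer(q, source)
        a_mt = answer(q, backtranslation)
        if a_src.strip().lower() != a_mt.strip().lower():
            flags.append((q, a_src, a_mt))  # mismatch: likely MT error
    return flags

# Toy stand-ins: one fixed question, answered by grabbing the first number.
def toy_gen_questions(source):
    return ["How many cases were reported?"]

def toy_answer(question, text):
    m = re.search(r"\d+", text)
    return m.group() if m else "unknown"
```

A mismatched answer pair ("24" vs. "42") is the actionable evidence the framework surfaces to the user.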
dayeonki.bsky.social
2/ But why question answering? 🤔

1️⃣ Provides functional explanations of MT quality
2️⃣ Users can weigh the evidence based on their own judgment
3️⃣ Aligns well with real-world cross-lingual communication strategies 🌐
dayeonki.bsky.social
1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?

Introducing ❓AskQE❓, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback 🗣️

#ACL2025
Reposted by Dayeon (Zoey) Ki
myra.bsky.social
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people’s mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
dayeonki.bsky.social
7/ Taken together, we show that simpler texts are more translatable — and more broadly, #LLM-assisted input rewriting is a promising direction for improving translations! 💥

As LLM-based writing assistants grow, we encourage future work on interactive, rewriting-based approaches to MT 🫡
dayeonki.bsky.social
6/ 🧑‍⚖️ Do humans actually prefer translations of simplified inputs?

Yes! Compared to translations of the original inputs, they rated them:
📝 More contextually appropriate
👁️ Easier to read
🤗 More comprehensible
dayeonki.bsky.social
5/ What does input rewriting actually change? 🧐

Here are 3 key findings:
1️⃣ Better translatability trades off against meaning preservation
2️⃣ Simplification boosts both input & output readability 📖
3️⃣ Input rewriting > Output post-editing 🤯
dayeonki.bsky.social
4/ 🤔 Can we have more selective strategies?

Yes! By selecting rewrites based on translatability scores at inference time, we outperform all other methods 🔥
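Inference-time selection can be sketched as scoring each candidate rewrite (plus the original) and keeping the highest-scoring one; the `translatability` scorer and rewriter here are toy assumptions, not the paper's models:

```python
def select_input(source, rewriters, translatability):
    """Keep the original unless some rewrite scores strictly higher."""
    candidates = [source] + [rw(source) for rw in rewriters]
    return max(candidates, key=translatability)  # max is stable: ties keep the original

# Toy scorer: shorter, plainer inputs are treated as more translatable.
def toy_translatability(text):
    return 1.0 / (1 + len(text.split()))

def toy_simplify(text):
    return text.replace("in spite of the fact that", "although")
```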