Yu (Hope) Hou
@houyu0930.bsky.social
Reposted by Yu (Hope) Hou
The debate over “LLMs as annotators” feels familiar: excitement, backlash, and anxiety about bad science. My take in a new blogpost is that LLMs don’t break measurement; they expose how fragile it already was.

doomscrollingbabel.manoel.xyz/p/labeling-d...
Labeling Data with Language Models: Trick or Treat?
Large language models are now labeling data for us.
doomscrollingbabel.manoel.xyz
October 25, 2025 at 6:29 PM
Reposted by Yu (Hope) Hou
Our Responsible AI team at Apple is looking for spring/summer 2026 PhD research interns! Please apply at jobs.apple.com/en-us/detail... and email [email protected]. Do not send extra info (e.g., CV), just drop us a line so we can find your application in the central pool!
Machine Learning / AI Internships - Jobs - Careers at Apple
Apply for a Machine Learning / AI Internships job at Apple. Read about the role and find out if it’s right for you.
jobs.apple.com
October 10, 2025 at 2:28 AM
Reposted by Yu (Hope) Hou
What should Machine Translation research look like in the age of multilingual LLMs?

Here’s one answer from researchers across NLP/MT, Translation Studies, and HCI.
"An Interdisciplinary Approach to Human-Centered Machine Translation"
arxiv.org/abs/2506.13468
An Interdisciplinary Approach to Human-Centered Machine Translation
Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present. Despite progress in MT technology, a gap persists between system development and...
arxiv.org
June 18, 2025 at 12:08 PM
Reposted by Yu (Hope) Hou
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.

I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
June 13, 2025 at 6:20 PM
Reposted by Yu (Hope) Hou
🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
June 3, 2025 at 3:09 PM
Reposted by Yu (Hope) Hou
I guess now that 1% of my Twitter followers follow me here 😅, I should announce it here too for those of you no longer checking Twitter: my nonfiction book, "Lost in Automatic Translation," is coming out this July: lostinautomatictranslation.com. I'm very excited to share it with you!
May 27, 2025 at 7:16 PM
Reposted by Yu (Hope) Hou
1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?

Introducing ❓AskQE❓, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback 🗣️

#ACL2025
May 21, 2025 at 5:49 PM
Reposted by Yu (Hope) Hou
We introduce a super simple yet effective strategy to improve video-language alignment (+18%): add hallucination correction to your training objective👌
Excited to share our accepted paper at ACL: Can Hallucination Correction Improve Video-language Alignment?
Link: arxiv.org/abs/2502.15079
May 20, 2025 at 9:12 PM
Reposted by Yu (Hope) Hou
Please help us spread the word! 📣

FATE is hiring a pre-doc research assistant! We're looking for candidates who will have completed their bachelor's degree (or equivalent) by summer 2025 and want to advance their research skills before applying to PhD programs.
May 20, 2025 at 2:34 PM
Reposted by Yu (Hope) Hou
I'm joining Wisconsin CS as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year in the NLP group at another UW 🏔️ in the Pacific Northwest
May 5, 2025 at 7:54 PM
Reposted by Yu (Hope) Hou
🔈 NEW PAPER 🔈
Excited to share my paper that analyzes the effect of cross-lingual alignment on multilingual performance
Paper: arxiv.org/abs/2504.09378 🧵
Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
Large language models (LLMs) pre-trained predominantly on English text exhibit surprising multilingual capabilities, yet the mechanisms driving cross-lingual generalization remain poorly understood. T...
arxiv.org
April 18, 2025 at 3:00 PM
Reposted by Yu (Hope) Hou
🚨 New Paper 🚨

1/ We often assume that well-written text is easier to translate ✏️

But can #LLMs automatically rewrite inputs to improve machine translation? 🌍

Here’s what we found 🧵
April 17, 2025 at 1:32 AM
Reposted by Yu (Hope) Hou
A bit of a mess around the conflict of COLM with the ARR (and to a lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:

Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)

Plz RT 🙏
March 20, 2025 at 6:20 PM
Reposted by Yu (Hope) Hou
Nice modern NLP (AI) intro talk slides isabelleaugenstein.github.io/slides/2025_... Isabelle Augenstein
isabelleaugenstein.github.io
March 11, 2025 at 7:31 AM
Reposted by Yu (Hope) Hou
🚨 Our team at UMD is looking for participants to study how #LLM agent plans can help you answer complex questions

💰 $1 per question
🏆 Top-3 fastest + most accurate win $50
⏳ Questions take ~3 min => $20/hr+

Click here to sign up (please join, reposts appreciated 🙏): preferences.umiacs.umd.edu
March 11, 2025 at 2:30 PM
Reposted by Yu (Hope) Hou
Our FATE MTL team has been working on a series of projects on anthropomorphic AI systems, for which we recently put out a few pre-prints I'm excited about. While working on these, we tried to think carefully not only about key research questions but also about how we study and write about these systems.
March 5, 2025 at 7:55 PM
Reposted by Yu (Hope) Hou
New synthetic benchmark for multilingual long-context LLMs! Surprisingly, English and Chinese are not the top-performing languages (it's Polish!). We also observe a widening gap between high and low-resource languages as context size increases. Check out the paper for more 👇
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

Our analysis across 26 languages 🧵👇
March 5, 2025 at 6:44 PM
Reposted by Yu (Hope) Hou
🚨 New Position Paper 🚨

Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬

We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠

Here's why MCQA evals are broken, and how to fix them 🧵
February 24, 2025 at 9:04 PM
Reposted by Yu (Hope) Hou
⚠️Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.
February 21, 2025 at 4:25 PM
Reposted by Yu (Hope) Hou
New preprint!
Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard.

We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3

#NLP #NLProc #polisky #polcom #compsocialsci
🐦🐦
February 20, 2025 at 7:59 PM
Reposted by Yu (Hope) Hou
New open source reasoning model!

Huginn-3.5B reasons implicitly in latent space 🧠

Unlike O1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn't produce extra CoT tokens at test time.

We trained on 800B tokens 👇
February 10, 2025 at 3:58 PM
I have learned a lot in this project! If you are interested in how NLI can be used in VLMs to complement their representations, check it out!
NLI Improves Compositionality in Vision-Language Models is accepted to #ICLR2025!

CECE enables interpretability and achieves significant improvements without fine-tuning on hard compositional benchmarks (e.g., Winoground, EqBen) and alignment benchmarks (e.g., DrawBench, EditBench). + info: cece-vlm.github.io
January 23, 2025 at 6:45 PM
Reposted by Yu (Hope) Hou
Accepted at #ICLR2025✨

🧐Which languages benefit the most from vocabulary adaptation?

We introduce VocADT, a new vocabulary adaptation method with a vocabulary adapter.
We explore the impact of various adaptation strategies on languages with diverse scripts and fragmentation to answer this question.
January 22, 2025 at 7:27 PM
Reposted by Yu (Hope) Hou
We are looking for volunteers for reviewing and AC roles! Please sign up here:
forms.gle/rZp67YvMn1hn...
COLM 2025 Program Committee Volunteer Form
This form is to volunteer to serve on the COLM 2025 (https://colmweb.org/) program committee as a reviewer or an area chair.
forms.gle
January 13, 2025 at 5:06 PM
Reposted by Yu (Hope) Hou
🎉Announcing... the 2024 TMLR Outstanding Certifications! (aka, our "best paper" awards!)

Are you bursting with anticipation to see what they are? Check out this blog post, and read down-thread!! 🎉🧵👇 1/n
medium.com/@TmlrOrg/ann...
Announcing the 2024 TMLR Outstanding Certification
By the 2024 TMLR Outstanding Paper Committee: Michael Bowling, Brian Kingsbury, Andreas Kirsch, Yingzhen Li, and Eleni Triantafillou
medium.com
January 8, 2025 at 5:41 PM