Tiancheng Hu
@tiancheng.bsky.social
990 followers 1.1K following 36 posts
PhD student @CambridgeLTL; Previously @DLAB @EPFL; Interested in NLP and CSS. Apple Scholar, Gates Scholar.
tiancheng.bsky.social
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors, SRW Oral, Monday, July 28, 14:00-15:30
tiancheng.bsky.social
I will be presenting:

iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to News, Poster Session 1, Monday, July 28, 11:00-12:30; Also at LAW workshop
tiancheng.bsky.social
Heading to Vienna today to attend #ACL2025NLP! Let's chat if you are interested in LLM social simulation, personalization, character training and human-centered AI!
Reposted by Tiancheng Hu
morlikow.bsky.social
I will be at #acl2025 to present "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" ✨

Huge thank you to my collaborators Jiaxin Pei @paul-rottger.bsky.social Philipp Cimiano @davidjurgens.bsky.social @dirkhovy.bsky.social 🍰

more below
Picture of Matthias Orlikowski presenting a poster on the paper titled "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions". The poster is similar to the one that will be presented at ACL 2025, showing a number of figures about the key results.
tiancheng.bsky.social
Centaur (a model of general cognition tuned on data from 160 multi-step psych experiments) https://www.nature.com/articles/s41586-025-09215-4
@marcelbinz.bsky.social
@ericschulz.bsky.social
tiancheng.bsky.social
This work complements other fantastic work and data:
Twin-2K-500 (2k individuals answering 500+ questions) arxiv.org/abs/2505.17479,
Generative Agent Simulations of 1,000 People (2h interview as seeds for simulation) arxiv.org/abs/2411.10109
@joon-s-pk.bsky.social
@mbernst.bsky.social
tiancheng.bsky.social
Our unique focus: we're not replicating static profiles (like survey answers). We're simulating a cognitive process - how an individual processes new information and reacts emotionally.
tiancheng.bsky.social
Working on LLM social simulation and need data?
Excited to announce our iNews paper is accepted to #ACL2025! 🥳 It's a large-scale dataset for predicting individualized affective responses to real-world, multimodal news.

Paper: arxiv.org/abs/2503.03335

Data: huggingface.co/datasets/pit...
Reposted by Tiancheng Hu
bminixhofer.bsky.social
We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!

With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level and a bunch more🧵
Image illustrating that ALM can enable Ensembling, Transfer to Bytes, and general Cross-Tokenizer Distillation.
tiancheng.bsky.social
iNews applications:
• LLM personalization
• Affective computing
• Human behavior simulation
• Social computing
• and many more! (8/8)
We are particularly grateful to @camlangsci.bsky.social for funding support and @Kiran Garimella
tiancheng.bsky.social
Few-Shot:
• "Early ascent phenomenon": performance dips with few examples, then improves
• Persona info consistently helps, even at 32-shot (reaching 44.4% accuracy).
• Image few-shot prompting scales worse than text, despite zero-shot advantage. (7/8)
tiancheng.bsky.social
Zero-Shot LLM Prediction:
• Persona info boosts accuracy across models (up to 7% gain!).
• Image inputs generally outperform text inputs in zero-shot.
• Gemini 1.5 Pro + image + persona = best zero-shot performance (still only 40% accuracy though). (6/8)
tiancheng.bsky.social
These persona variables explain up to 15.2% of annotation variance—more than any existing subjective NLP dataset! Individual differences aren't noise—they're systematic patterns we can model. (5/8)
tiancheng.bsky.social
What makes iNews unique? We don't aggregate responses. We capture personal reactions AND collect comprehensive annotator characteristics (i.e. demographics, personality, media habits). (4/8)
tiancheng.bsky.social
We're introducing iNews: a large-scale dataset capturing the inherent subjectivity of how people respond emotionally to real news content. 2,899 Facebook posts (screenshot so multimodal!) × 291 diverse annotators = rich, subjective affective data. (3/8)
tiancheng.bsky.social
Current AI systems are often trained with the assumption that we all feel the same about content, but psychology shows we don't. Our emotions vary by age, gender, personality, politics & countless other factors. (2/8)
tiancheng.bsky.social
Ever notice how something that makes your blood boil barely registers with your friend? Our emotional reactions aren't universal at all—they're deeply personal. And AI needs to understand that. Excited to share our new paper: "iNews" 🧵 (1/8) arxiv.org/abs/2503.03335
iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to News
Current approaches to emotion detection often overlook the inherent subjectivity of affective experiences, instead relying on aggregated labels that mask individual variations in emotional responses. ...
arxiv.org
tiancheng.bsky.social
Great work by @riverdong.bsky.social - we dug deep into existing datasets & algorithms and found quite a few surprising things
riverdong.bsky.social
🚨New Paper Alert🚨
Many personalization methods optimize performance but ignore real-world impact.
We examine its effects on:
✅ Performance
⚖️ Fairness: Can it represent minorities fairly?
⚠️ Unintended Effects: Does it harm safety?
🔄 Adaptability: Quickly adapt to new users?
tiancheng.bsky.social
I can relate :) Honestly, I don't think nearly enough people in the West have a clear understanding of the process of getting an EU, U.S., etc. visa, sigh 😮‍💨
tiancheng.bsky.social
9/9: A round of applause 👏 for our stellar team: @tiancheng.bsky.social & @yarakyrychenko.bsky.social (co-leads), @steverathje.bsky.social, @nigelhcollier, @profsanderlinden.bsky.social, and @roozenbot. Special thanks to @cambridgeltl.bsky.social, @iislucas and @gatesfoundation.bsky.social.