Andy Liu
@andyliu.bsky.social
phd type things @ cmu lti andyjliu.github.io
andyliu.bsky.social
Please reach out if you'd like to chat about this work! We hope ConflictScope helps researchers study how models handle value conflicts that matter to their communities.
Code and data: github.com/andyjliu/con...
arXiv: www.arxiv.org/abs/2509.25369
andyliu.bsky.social
ConflictScope can also be used to evaluate different approaches toward steering models. We find that including detailed target rankings in system prompts consistently improves model alignment with the target ranking while under conflict, but with plenty of room for improvement.
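A minimal sketch of this steering approach; the value names and prompt wording are illustrative assumptions, not the paper's exact prompt:

```python
def build_steering_prompt(target_ranking):
    """Embed an explicit target value ranking in a system prompt.

    `target_ranking` is ordered from most to least prioritized value.
    The wording here is illustrative, not the paper's actual prompt.
    """
    ranked = " > ".join(target_ranking)
    return (
        "You are a helpful assistant. When values conflict, "
        f"prioritize them in this order: {ranked}."
    )

prompt = build_steering_prompt(["harmlessness", "honesty", "helpfulness"])
print(prompt)
```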
andyliu.bsky.social
We find significant shifts between models’ expressed and revealed preferences under conflict! Models say they prefer actions that support protective values (e.g. harmlessness) when asked directly, but support personal values (e.g. helpfulness) in more realistic evaluations.
andyliu.bsky.social
To address issues with multiple-choice evaluation, we focus on open-ended evaluation with a simulated user. Annotation studies show strong correlation between LLM and human judgments of which action a model took in a given scenario, allowing us to automate open-ended evaluations.
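One way to validate such an LLM judge against human annotators, sketched here with toy labels and Cohen's kappa (the specific agreement statistic is my assumption, not necessarily the paper's):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators
    (e.g. an LLM judge and a human) over categorical labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Toy example: which action the model took in each scenario.
human = ["A", "A", "B", "B", "A", "B"]
judge = ["A", "A", "B", "A", "A", "B"]
print(round(cohens_kappa(human, judge), 3))  # → 0.667
```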
andyliu.bsky.social
We introduce new metrics to measure how morally challenging a dataset is for models. We find that ConflictScope produces datasets that elicit more disagreement and stronger preferences than moral dilemma datasets, while alignment data frequently elicits indifference from models.
andyliu.bsky.social
Given a set of values, ConflictScope generates scenarios in which an LLM-based assistant faces a conflict between a pair of values in the set. It then evaluates which value a target LLM supports more in each scenario before combining scenario-level judgments into a value ranking.
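The pipeline described above could be sketched like this; the function names, interfaces, and aggregation scheme (win counts over pairwise judgments) are assumptions for illustration, not the paper's implementation:

```python
from collections import defaultdict
from itertools import combinations

def rank_values(values, generate_scenario, judge_winner, n_scenarios=10):
    """For each pair of values, generate conflict scenarios, judge which
    value the target model's response supports, and aggregate wins into
    a ranking. `generate_scenario` and `judge_winner` stand in for
    LLM-backed components (hypothetical interfaces)."""
    wins = defaultdict(int)
    for v1, v2 in combinations(values, 2):
        for _ in range(n_scenarios):
            scenario = generate_scenario(v1, v2)
            wins[judge_winner(scenario, v1, v2)] += 1
    return sorted(values, key=lambda v: wins[v], reverse=True)

# Stub components for illustration (a real run would call LLMs):
priority = {"harmlessness": 2, "helpfulness": 1, "honesty": 0}
gen = lambda v1, v2: f"conflict between {v1} and {v2}"
judge = lambda s, v1, v2: max(v1, v2, key=priority.get)
print(rank_values(list(priority), gen, judge, n_scenarios=1))
# → ['harmlessness', 'helpfulness', 'honesty']
```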
andyliu.bsky.social
🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully-automated evaluation pipeline that reveals how models rank values under conflict.
(📷 xkcd)
andyliu.bsky.social
Placing LLMs in simulated markets helps us quantitatively and qualitatively measure their propensity to collude, as well as how environmental changes affect this. Read below or find @veronateo.bsky.social at the ICML multi-agent systems workshop to learn more!
veronateo.bsky.social
Excited to share our paper “Evaluating LLM Agent Collusion in Double Auctions”!

We put LLMs in a simulated market and find that collusion increases when they are able to communicate via natural language, differs across models, and is influenced by urgency and oversight.

1/
andyliu.bsky.social
these are great, thanks! will check them out
andyliu.bsky.social
started Axiomatic but didn’t get very far - Permutation City looks fun though, thanks
andyliu.bsky.social
looking for 2025 book recs!

things i've previously liked, for reference -
nonfiction: the structure of scientific revolutions, cybernetic revolutionaries, seeing like a state
fiction: stories of your life and others, one hundred years of solitude, project hail mary, recursion
andyliu.bsky.social
PRISM has preference scores for different models that you can convert into pairwise labels
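A sketch of that conversion, assuming per-model scores in a plain dict (PRISM's actual schema and field names may differ):

```python
from itertools import combinations

def scores_to_pairwise(scores, margin=0.0):
    """Convert per-model preference scores into pairwise labels:
    (winner, loser) for every pair whose score gap exceeds `margin`.
    The input format is illustrative, not PRISM's exact schema."""
    pairs = []
    for (m1, s1), (m2, s2) in combinations(scores.items(), 2):
        if abs(s1 - s2) > margin:
            winner, loser = (m1, m2) if s1 > s2 else (m2, m1)
            pairs.append((winner, loser))
    return pairs

labels = scores_to_pairwise({"model_a": 87, "model_b": 55, "model_c": 55})
print(labels)  # → [('model_a', 'model_b'), ('model_a', 'model_c')]
```

Ties (or gaps below `margin`) are dropped rather than labeled, which is one common choice when building preference pairs.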
Reposted by Andy Liu
ltiatcmu.bsky.social
Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!

go.bsky.app/NhTwCVb
andyliu.bsky.social
could I be added? thanks for curating :)