Sonia Murthy
@soniakmurthy.bsky.social
50 followers 19 following 12 posts
cs phd student and kempner institute graduate fellow at harvard. interested in language, cognition, and ai soniamurthy.com
soniakmurthy.bsky.social
Thanks Hope! I just came across your related work with the CSS team at Microsoft. I'd love to chat about it sometime if you're free 🙂
soniakmurthy.bsky.social
Hi Daniel, thanks so much. The preprint is reliable, though it's missing a little additional discussion that made it into the camera-ready. I can email you the camera-ready version and will update arXiv with it shortly. Thank you!
soniakmurthy.bsky.social
Many thanks to my collaborators and @kempnerinstitute.bsky.social for helping make this idea come to life, and to @rdhawkins.bsky.social for helping plant the seeds 🌱
soniakmurthy.bsky.social
(8/9) We think that better understanding such trade-offs will be important for building LLMs that are aligned to human values: human values are diverse, and our models should be too.
soniakmurthy.bsky.social
(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.
soniakmurthy.bsky.social
(6/9) We put a suite of aligned models, and their instruction fine-tuned counterparts, to the test and found:
* no model reaches human-like diversity of thought.
* aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.
soniakmurthy.bsky.social
(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.
soniakmurthy.bsky.social
(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM “populations” by relating the variability of its “individuals” to that of the population as a whole.
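A rough sketch of the intuition (illustrative only: the function names, the Jensen-Shannon distance, and the between/within ratio here are simplifying assumptions for this post, not the exact metric from the paper):

import numpy as np

def js_divergence(p, q):
    # Jensen-Shannon divergence between two discrete distributions.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def conceptual_diversity(individuals):
    # individuals: list of (n_samples, n_options) arrays, one per simulated
    # "individual"; each row is a response distribution over answer options.
    means = [ind.mean(axis=0) for ind in individuals]
    # Between-individual spread: mean pairwise JSD of individuals' mean responses.
    between = np.mean([js_divergence(means[i], means[j])
                       for i in range(len(means))
                       for j in range(i + 1, len(means))])
    # Within-individual spread: how much each individual varies around its own mean.
    within = np.mean([np.mean([js_divergence(row, mu) for row in ind])
                      for ind, mu in zip(individuals, means)])
    return between / max(within, 1e-9)  # higher = more population-level diversity

# e.g. two simulated individuals with distinct but stable preferences:
pop = [np.array([[.8, .1, .1], [.7, .2, .1]]),
       np.array([[.1, .8, .1], [.2, .7, .1]])]
print(conceptual_diversity(pop))  # large ratio: a conceptually diverse population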
soniakmurthy.bsky.social
(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals’ representations of a particular domain. How do we measure this? And how does alignment affect it?
soniakmurthy.bsky.social
(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of “populations” (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.
soniakmurthy.bsky.social
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @tomerullman.bsky.social and @jennhu.bsky.social, to appear at #NAACL2025! 🐟

We want models that match our values...but could this hurt their diversity of thought?
Preprint: arxiv.org/abs/2411.04427