Taylor Sorensen
@taylor-sorensen.bsky.social
230 followers 250 following 33 posts
NLP PhD Candidate at UW
taylor-sorensen.bsky.social
Also check out these very cool related papers exploring diversity and mode collapse!
arxiv.org/pdf/2505.00047 from
Peter West et al.

arxiv.org/abs/2504.05228, arxiv.org/abs/2404.10859 by Yiming Zhang +
Daphne Ippolito et al.

arxiv.org/abs/2510.01171 by
Jiayi Zhang et al.
taylor-sorensen.bsky.social
In new work, we introduce a simple post-training method and large-scale resource for maximizing diversity and coverage! We call it Spectrum Tuning.
More on this in the coming days - but I'm really excited about this work, and am so happy that it's now public
taylor-sorensen.bsky.social
Pretrained models are better at this - they actually give you substantively different outputs when you sample. BUT, they are unable to reliably follow instructions.
How can we train models to follow instructions AND to span the space of possible outputs?
taylor-sorensen.bsky.social
This may seem like a silly toy example - shouldn’t we just use np.random.randint()?
Fair - but this simple case is illustrative of a broader weakness. What about creative writing? Or hypothesis generation? Or diverse data generation?
We need models that SPAN the entire output space.
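As an illustration (not from the paper), here is a minimal sketch of how one might quantify whether a model spans the output space on the random-number toy task: sample the model many times, then measure how much of the support it covers and how much entropy its outputs carry. The `samples` list below is a hypothetical stand-in for parsed model outputs.

```python
import math
from collections import Counter

def coverage_and_entropy(samples, support_size):
    """Fraction of possible outputs actually produced, and entropy (bits) of the samples."""
    counts = Counter(samples)
    total = len(samples)
    probs = [c / total for c in counts.values()]
    coverage = len(counts) / support_size
    entropy = -sum(p * math.log2(p) for p in probs)
    return coverage, entropy

# Hypothetical stand-in for 100 parsed responses to
# "Pick a random number between 1 and 10" from a mode-collapsed model.
samples = [7] * 92 + [3, 4, 7, 7, 8, 7, 7, 9]
print(coverage_and_entropy(samples, support_size=10))
# -> (0.5, ~0.32 bits); a uniform sampler would approach 1.0 coverage and ~3.32 bits.
```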
taylor-sorensen.bsky.social
Current post-training teaches a model to output the highest-reward answer, even if there are other good answers. E.g., when picking random numbers, 7 seems like the most “random” number to annotators - so models ALWAYS pick 7!
arxiv.org/pdf/2505.00047
arxiv.org/pdf/2203.02155
arxiv.org/pdf/2510.01171
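A toy simulation of this collapse dynamic (my own illustration, not from the posts or the papers linked above): start from a roughly uniform pretrained-style distribution over the digits 0-9, give 7 a marginally higher annotator reward, and repeatedly tilt the policy toward higher reward, as reward-maximizing post-training effectively does. Even a tiny reward gap compounds until almost all probability mass lands on 7.

```python
import numpy as np

# Pretrained-style policy: roughly uniform over the digits 0-9.
policy = np.full(10, 0.1)

# Annotator reward: every digit is acceptable, but 7 "feels" slightly more random.
reward = np.full(10, 1.0)
reward[7] = 1.1

# Repeated exponentiated-reward tilts (a crude stand-in for reward-maximizing updates).
beta = 2.0
for _ in range(50):
    policy = policy * np.exp(beta * reward)
    policy /= policy.sum()

print(np.round(policy, 4))  # nearly all probability mass is now on index 7
```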
taylor-sorensen.bsky.social
Did you know that LLMs suffer from serious mode collapse?

For example, if you ask models to tell you a joke, they almost always tell you the same joke. This is true across samples and even across model families!

Why does this happen? Can we improve it?
taylor-sorensen.bsky.social
Others can disagree, but in my view it's important to understand the science behind diverse viewpoint modeling, both for understanding potential risks and for building prosocial systems. AI alignment in particular is a case where, if we aren't careful, it will be easy for people to be left behind.
taylor-sorensen.bsky.social
That is a risk to personalization technologies like these 😬 However, I'm excited about AI's potential to _reduce_ polarization ( www.pnas.org/doi/full/10....), find common ground (www.science.org/doi/10.1126/...), and help people gain sympathy by exploring others' perspectives (ongoing work)
taylor-sorensen.bsky.social
That being said, I'm in total agreement that there are absolutely risks to the technology as well, and figuring out which technologies to deploy where, and in what way, will be very important.
taylor-sorensen.bsky.social
Additionally, I think steerability to diverse perspectives becomes even more important as AI systems start having more autonomy (e.g. AI agents). I want an AI agent that knows _my_ perspective, not just the average one!
taylor-sorensen.bsky.social
That being said, it's my belief that all model responses already have _a_ worldview associated with them - so personally, I think it's important to have systems where a) we can measure/explicitly see what perspectives they are being aligned to, so that b) many people's perspectives can be included.
taylor-sorensen.bsky.social
Apologies if this wasn't clear! They're provided textually in an in-context prompt, which the model then tries to steer towards.

And yes, you are absolutely right - that's one of the risks of personalization in general (see a great paper here: arxiv.org/pdf/2303.05453)
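For context on the mechanics (a sketch of my own, not the paper's exact prompt or API): a value profile can be included textually in the prompt, and the model asked to judge an item from that rater's perspective. The profile text, template, and `query_model` call below are all hypothetical.

```python
# Hypothetical value profile and prompt template for steering a judgment in-context.
VALUE_PROFILE = (
    "This rater places high weight on personal autonomy and free expression, "
    "and comparatively low weight on deference to authority."
)

def build_prompt(value_profile: str, statement: str) -> str:
    return (
        "Rate the statement below from the perspective of a rater with this value profile:\n\n"
        f"{value_profile}\n\n"
        f"Statement: {statement}\n"
        "How much would this rater agree, from 1 (strongly disagree) to 5 (strongly agree)? "
        "Answer with a single number."
    )

prompt = build_prompt(VALUE_PROFILE, "Schools should require uniforms.")
# rating = query_model(prompt)  # hypothetical LLM call
print(prompt)
```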
Reposted by Taylor Sorensen
lasha.bsky.social
Want to know what training data has been memorized by models like GPT-4?

We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models,

without requiring access to
🙅‍♀️ Model weights
🙅‍♀️ Training data
🙅‍♀️ Token probabilities 🧵 (1/5)
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
taylor-sorensen.bsky.social
This was my Google DeepMind internship work with amazing collaborators Pushkar Mishra, @romapatel.bsky.social, Michael Henry Tessler, @mbakker.bsky.social, Georgina Evans, Iason Gabriel, @noahdgoodman.bsky.social, @verenarieser.bsky.social
They are a really amazing team!
taylor-sorensen.bsky.social
Value profiles enable new ways to model variation and support representation at the individual level. We hope that our work helps enable systems that better model diverse perspectives and that work for everyone (yay for #pluralisticalignment #nlpforsocialgood #compsocialscience #compdemocracy!)
taylor-sorensen.bsky.social
There are benefits though!
✅ Value profiles may enhance user agency, as a person could change their own value profile
✅ They enable value reflection via bottom-up discovery and top-down editing
✅ Unlike sociodemographics, which are often unchosen, people can choose values for themselves
(16/?)
taylor-sorensen.bsky.social
Our goal in this work is to improve AI systems' ability to model diverse perspectives and better serve more people! However, risks remain:
❌ Privacy risks: people may not wish to have their values inferred
❌ Systems may fail to generalize to less common values, and we only test on English-language data
(15/?)
taylor-sorensen.bsky.social
As a last experiment, we simulate an annotator population ("jury learning", @mitchellgordon) with our trained models and value profiles.

We find that the instance-level interannotator agreement (IAA) predicted by our simulated population correlates with the observed IAA.
(14/?)
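For readers who want the shape of this check (a minimal sketch of my own, not the paper's evaluation code): compute per-instance pairwise agreement for both the observed annotators and the simulated population, then correlate the two. The label matrices below are hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_agreement(labels):
    """Fraction of annotator pairs that gave the same label to a single instance."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def instance_iaa(label_matrix):
    """Per-instance agreement for a (num_instances x num_annotators) label matrix."""
    return np.array([pairwise_agreement(row) for row in label_matrix])

# Hypothetical labels: rows are instances, columns are (simulated) annotators.
observed  = np.array([[1, 1, 1, 2], [1, 2, 3, 2], [2, 2, 2, 2]])
simulated = np.array([[1, 1, 2, 1], [3, 2, 1, 1], [2, 2, 2, 1]])

obs_iaa = instance_iaa(observed)
sim_iaa = instance_iaa(simulated)
print(np.corrcoef(obs_iaa, sim_iaa)[0, 1])  # correlation between predicted and observed IAA
```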
taylor-sorensen.bsky.social
We also find that our value profile system is very well-calibrated.

This calibration is important for trusting the model's confidence and for disentangling value-related epistemic uncertainty from aleatoric uncertainty in rater variation.
(13/?)
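For readers unfamiliar with calibration, here is a generic sketch (not the paper's evaluation code) of one standard way to measure it, expected calibration error: bin predictions by confidence and compare each bin's average confidence to its empirical accuracy. The confidences and correctness flags below are hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between mean confidence and accuracy per confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Hypothetical predicted confidences and whether each prediction matched the rater's label.
confs = np.array([0.9, 0.8, 0.65, 0.95, 0.55, 0.7])
hits  = np.array([1,   1,   0,    1,    1,    1])
print(expected_calibration_error(confs, hits))
```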
taylor-sorensen.bsky.social
Value profiles are written in natural language. But are they actually semantically interpretable? Does the system change its judgments with wording changes in common-sense ways?

Yes, we find that semantic changes in a value profile lead to expected changes in the output.
(12/?)
taylor-sorensen.bsky.social
Clustering with value profiles also enables dataset-level qualitative analysis.

For example, for OQA/DIC, even restricting to just 2 clusters explains the majority of rater variation, suggesting a bimodal distribution.

Additionally, the profile descriptions suggest why people may disagree.
(11/?)
taylor-sorensen.bsky.social
Our algorithm is effective at uncovering useful rater groupings, with the resulting value profile clusters outperforming the most performant demographic grouping!

Additionally, on the dataset where demographics helped most, the clusters partially recover ideological trends.
(10/?)
taylor-sorensen.bsky.social
To characterize common modes of (dis)agreement, we introduce a value-based clustering algorithm.

Unlike traditional methods, ours: 1) does not require that raters label overlapping instances, 2) leverages semantic instance information, and 3) returns cluster descriptions.
(9/?)
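To make the flavor of this concrete, here is a rough sketch of one possible shape such a clustering could take (my own simplification, not the paper's algorithm): represent each rater by the text of the instances they labeled together with their labels, embed those representations, cluster the rater embeddings, and summarize each cluster. Note that none of the raters below share instances, and the final LLM summarization step is only indicated as a comment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical annotations: {rater_id: [(instance_text, label), ...]};
# raters do not need to label any overlapping instances.
annotations = {
    "r1": [("Schools should require uniforms.", "agree")],
    "r2": [("Dress codes limit self-expression.", "disagree")],
    "r3": [("Schools should require uniforms.", "disagree")],
    "r4": [("Dress codes limit self-expression.", "agree")],
}

# One document per rater: instance text fused with the rater's judgment,
# so the representation leverages semantic instance information.
rater_docs = {
    rid: " ".join(f"{text} -> {label}" for text, label in items)
    for rid, items in annotations.items()
}

X = TfidfVectorizer().fit_transform(rater_docs.values())
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for rid, cluster in zip(rater_docs, clusters):
    print(rid, "-> cluster", cluster)
# A final step (omitted here) would prompt an LLM with each cluster's annotations
# to produce a natural-language description of the cluster's values.
```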