Adrian Chan
@gravity7.bsky.social
750 followers 620 following 410 posts
Bridging IxD, UX, & Gen AI design & theory. Ex Deloitte Digital CX. Stanford '88 IR. Edinburgh, Berlin, SF. Philosophy, Psych, Sociology, Film, Cycling, Guitar, Photog. Linkedin: adrianchan. Web: gravity7.com. Insta, X, medium: @gravity7
gravity7.bsky.social
Everybody talking about the "new" Apple paper might find this MLST interview with @rao2z.bsky.social interesting. "Reasoning" and "inner thoughts" of LLMs were exposed as self-mumblings and fumblings long ago. #LLMs #AI
www.youtube.com/watch?v=y1Wn...
Do you think that ChatGPT can reason?
YouTube video by Machine Learning Street Talk
www.youtube.com
gravity7.bsky.social
yes - people will still need a phone, and a lot of AI products, services, and UI will need a screen. and a touchable one at that.
gravity7.bsky.social
Clarifying questions w #LLMs increase user satisfaction when users can see the point of answering them. Specific questions beat generic ones.

But I wonder if this changes when #agents are personal assistants, & are more personal & more aware.

#UX #AI #Design

arxiv.org/abs/2402.01934
Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness
Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance. Poorly formulated questions can lead to use...
arxiv.org
gravity7.bsky.social
Interesting - could #LLMs in search capture context missed when googling?

"backtracing ... retrieve the cause of the query from a corpus. ... targets the information need of content creators who wish to improve their content in light of questions from information seekers."
arxiv.org/abs/2403.03956
Backtracing: Retrieving the Cause of the Query
Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they...
arxiv.org
gravity7.bsky.social
They mostly test whether they can steer pos/neg responses. But given Shakespeare was a test also, it would be interesting to extract style vectors from any number of authors and then compare generations. (Is this approach used in those "historical avatars"? No idea.)
gravity7.bsky.social
@tedunderwood.me In case you haven't seen this paper, you might find it interesting. Researchers extract style vectors (incl. from Shakespeare) and apply them to an LLM's internal layers instead of training on the original texts. Generations can then be "steered" toward a desired style.

arxiv.org/abs/2402.01618
Style Vectors for Steering Generative Large Language Models
This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activati...
arxiv.org
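For the curious, a minimal sketch of what activation steering looks like in practice - not the paper's code. The model, layer index, scaling factor, and the placeholder style vector are all illustrative assumptions.

```python
# Minimal sketch of activation steering with a style vector (not the paper's code).
# Assumes a HuggingFace-style causal LM; layer index, alpha, and the precomputed
# style_vector are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical style vector, e.g. the mean difference of hidden states between
# "Shakespearean" and neutral prompts at the chosen layer.
style_vector = torch.randn(model.config.hidden_size)  # placeholder values
layer_idx, alpha = 6, 4.0  # which block to steer, and how strongly

def add_style(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are output[0].
    hidden = output[0] + alpha * style_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_style)
ids = tok("The weather report for tomorrow:", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # stop steering
```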
gravity7.bsky.social
But design will need to focus on tweaking model interactions so that they track conversational content and turns over time. For example, with bi-directional prompting, models prompt users to keep conversations on track (rough sketch below).

This seems a rich opportunity for interaction design #UX #IxD #LLMs #AI
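A rough sketch of what bi-directional prompting could look like as an interaction loop; the system prompt wording and the call_llm() stub are hypothetical placeholders, not any particular product's API.

```python
# Sketch of "bi-directional prompting": the model is instructed to notice when
# the conversation drifts or is underspecified and to prompt the user back on
# track. call_llm() is a hypothetical stand-in for any chat-completion API.

SYSTEM = (
    "You are a design assistant. Track the user's stated goal across turns. "
    "If the latest message drifts from that goal or is underspecified, "
    "ask one short clarifying question before answering."
)

def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat API here")

def chat_loop():
    messages = [{"role": "system", "content": SYSTEM}]
    while True:
        user = input("you> ")
        if user.strip().lower() in {"quit", "exit"}:
            break
        messages.append({"role": "user", "content": user})
        reply = call_llm(messages)  # model may answer, or ask a question back
        messages.append({"role": "assistant", "content": reply})
        print("assistant>", reply)

if __name__ == "__main__":
    chat_loop()
```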
gravity7.bsky.social
to sustain dialog. Social interaction, face to face or online, is already vulnerable to misunderstandings and failures, and we have countless signals, gestures, etc. with which to rescue our interactions.

A communication-first approach to LLMs for conversation makes sense, as talk is not writing.
gravity7.bsky.social
"when LLMs take a wrong turn in a conversation, they get lost and do not recover."

Interaction design is going to be necessary to scaffold LLMs for talk, be it voice or single user chat or multi-user (e.g. social media).

It's one thing to read/summarize written documents, quite another ...
gravity7.bsky.social
"LLMs tend to (1) generate overly verbose responses, leading them to (2) propose final solutions prematurely in conversation, (3) make incorrect assumptions about underspecified details, and (4) rely too heavily on previous (incorrect) answer attempts."

arxiv.org/abs/2505.06120
LLMs Get Lost In Multi-Turn Conversation
Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, ...
arxiv.org
gravity7.bsky.social
"LLMs ... recognize graph-structured data... However... we found that even when the topological connection information was randomly shuffled, it had almost no effect on the LLMs’ performance... LLMs did not effectively utilize the correct connectivity information."
www.arxiv.org/abs/2505.02130
Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, for graph-structured data, which requires emphasis on to...
www.arxiv.org
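To make the quoted shuffle test concrete, here's a rough sketch of how one might probe it - not the paper's protocol. The toy graph, prompt wording, and ask_llm() stub are illustrative assumptions.

```python
# Rough sketch of the shuffle test described in the quote (not the paper's code):
# serialize a graph's edge list into a prompt, randomly rewire the edges, and
# compare the model's answers on a connectivity question. ask_llm() is a
# hypothetical stand-in for whatever model interface you use.
import random

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]

def to_prompt(edge_list):
    lines = ", ".join(f"{a}-{b}" for a, b in edge_list)
    return (f"A graph has nodes 0-4 and edges: {lines}. "
            "Is there a path from node 0 to node 4? Answer yes or no.")

def shuffled(edge_list, seed=0):
    rng = random.Random(seed)
    nodes = sorted({n for edge in edge_list for n in edge})
    return [tuple(rng.sample(nodes, 2)) for _ in edge_list]  # random rewiring

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

original_answer = ask_llm(to_prompt(edges))
shuffled_answer = ask_llm(to_prompt(shuffled(edges)))
print(original_answer, shuffled_answer)  # matching answers suggest topology is ignored
```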
gravity7.bsky.social
Perhaps one could fine tune on Lewis Carroll, then feed the model with philosophical paradoxes, and see whether the model produces more imaginative generations.
gravity7.bsky.social
I think because this isn't making the model trip, synesthetically, but is simply giving it juxtapositions. So what is studied is a response to these paradoxical and conceptually incompatible prompts, not a measure of any latent conceptual activations or features.
gravity7.bsky.social
Yes, and the label applied says as much about the person as it does about the model. In the world of creatives, the most-used term now is "slop," perhaps derived from enshittification, the latter capturing corporate malice where the "slop" is an AI-generated byproduct unfit for human consumption...
gravity7.bsky.social
Thread started with your second post, so yes, I missed the initial post. Never mind.
gravity7.bsky.social
Assuming alignment using synthetic data is undesirable, one route is to complement global alignment (alignment to some "universally" preferred human values) with local, contextualized alignment, via feedback and use by the user. Tune the LLM's behavior to user preferences.
gravity7.bsky.social
Customized LLMs use the feedback obtained from individual user interactions and align to it.
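A minimal sketch of what that local, per-user alignment could look like at the prompt level, assuming a simple preference store; all names and file paths here are illustrative, not any real product's mechanism.

```python
# Minimal sketch of "local" alignment: collect per-user feedback and fold it
# into the system prompt so the model's behavior tracks that user's preferences,
# alongside (not instead of) globally aligned defaults. Names are illustrative.
import json
from pathlib import Path

PREFS = Path("user_prefs.json")  # hypothetical per-user preference store

def load_prefs() -> dict:
    return json.loads(PREFS.read_text()) if PREFS.exists() else {}

def record_feedback(user_id: str, preference: str):
    prefs = load_prefs()
    prefs.setdefault(user_id, []).append(preference)
    PREFS.write_text(json.dumps(prefs, indent=2))

def system_prompt(user_id: str) -> str:
    base = "Follow the assistant's global safety and helpfulness guidelines."
    local = load_prefs().get(user_id, [])
    if local:
        base += " Also respect this user's stated preferences: " + "; ".join(local)
    return base

# e.g. after a thumbs-down on a verbose answer:
record_feedback("some_user", "keep answers under three sentences")
print(system_prompt("some_user"))
```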
gravity7.bsky.social
Staying power of ceasefires becoming a proxy for multilateral resilience amid baseline rivalries?
gravity7.bsky.social
I think this will be one accelerant for individualized/personally customized AI - e.g. personal assistants. The verifiers can use the user's preferences and tune to those rather than apply globally aligned behavioral rules.
gravity7.bsky.social
It's also a problem of use cases and user adoption. Though it may turn out that Transformer-based AI does indeed fail to meet expectations.

There's a lot of misunderstanding and anthropomorphism of AI's reasoning, for example, that might not turn out well.
gravity7.bsky.social
Coincidentally, many startups of that time set up in loft & warehouse spaces with exposed concrete & steel beams... I like this analogy especially for Social Interaction Design/Social UX, where "social architecture" is exposed for users to take up in norms, behaviors, and expectations for how to engage.