Lightnews — Scholar-powered news

@laurenbjiang.bsky.social


Pinned

laurenbjiang.bsky.social @laurenbjiang.bsky.social · Apr 23

🚀 How well can LLMs know you and personalize their responses? Turns out, not so much!

Introducing the PersonaMem Benchmark --
🎯Latest models (GPT-4.1, GPT-4.5, o4-mini, Llama-4, Gemini 2.0, DeepSeek-R1, Claude-3.7) all struggle with personalization!
🧵(1/8)

Fig 1: Overview of PersonaMem benchmark. Each benchmark sample is a user persona with static (e.g., demographic info.) and dynamic attributes (e.g., evolving preferences). Users engage with a chatbot in multi-session interactions across a variety of topics such as food recommendation, travel planning, and therapy consultation. As the user’s preferences evolve over time, the benchmark offers annotated questions assessing whether models can track and incorporate the changes into their responses.

Fig 2: Model performances by number of sessions elapsed since most recent preferences were mentioned in long context. Top: up to 20 sessions/128k tokens; Bottom: up to 60 sessions/1M tokens. Long-context retrieval is important for personalization in practice.
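
As a rough illustration of the sample structure described in Fig 1 and the "sessions elapsed" axis in Fig 2, here is a minimal Python sketch. It is not the PersonaMem schema: the class names (`Persona`, `PreferenceUpdate`), the field names, and the example values are all hypothetical, chosen only to show a persona with static attributes plus preferences that change across sessions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PreferenceUpdate:
    """One stated preference (hypothetical structure, not the benchmark's)."""
    session: int  # index of the session in which the user stated it
    topic: str    # e.g. "food"
    value: str    # the preference as stated in that session


@dataclass
class Persona:
    """A user with static attributes and preferences that evolve over time."""
    static: Dict[str, str]  # e.g. demographic info
    updates: List[PreferenceUpdate] = field(default_factory=list)

    def latest(self, topic: str) -> Optional[PreferenceUpdate]:
        """Return the most recent update for a topic, or None if there is none."""
        matches = [u for u in self.updates if u.topic == topic]
        return max(matches, key=lambda u: u.session, default=None)

    def sessions_elapsed(self, topic: str, current_session: int) -> Optional[int]:
        """Sessions between the latest update on a topic and the current session."""
        latest = self.latest(topic)
        return None if latest is None else current_session - latest.session


# Illustrative data only.
user = Persona(static={"age_group": "30s"})
user.updates.append(PreferenceUpdate(session=1, topic="food", value="likes spicy food"))
user.updates.append(PreferenceUpdate(session=4, topic="food", value="now avoids spicy food"))

print(user.latest("food").value)          # now avoids spicy food
print(user.sessions_elapsed("food", 10))  # 6
```

In this toy setup, a response that still assumes "likes spicy food" at session 10 would be relying on a preference that was replaced six sessions earlier, which is the kind of tracking failure the annotated questions are described as probing.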

laurenbjiang.bsky.social

@laurenbjiang.bsky.social

Personalization is becoming one of the next huge waves in AI 🌊🌊🌊

🚨 We release PersonaMem-v2, the best-quality dataset for LLM personalization, helping your AI better understand users and build a memory that grows with each user over time.

Check our paper and data below👇
🧵(1/5)

December 22, 2025 at 7:25 PM

laurenbjiang.bsky.social

@laurenbjiang.bsky.social

April 23, 2025 at 6:00 PM
