Eugene Yan
@eugeneyan.com
11K followers 330 following 480 posts
RecSys, AI, Engineering; Principal Applied Scientist @ Amazon. Led ML @ Alibaba, Lazada, Healthtech Series A. Writing @ eugeneyan.com, aiteratelabs.com.
Pinned
eugeneyan.com
I've been nerdsniped by the idea of Semantic IDs.

Here's the result of my training runs:
• RQ-VAE to compress item embeddings into tokens
• SASRec to predict the next item (i.e., 4-tokens) exactly
• Qwen3-8B that can return recs and natural language!

eugeneyan.com/writing/sema...
How to Train an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs
An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.
eugeneyan.com
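The "compress item embeddings into tokens" step can be illustrated with plain residual quantization. This is a minimal sketch, not the post's actual RQ-VAE: a real RQ-VAE learns an encoder, decoder, and codebooks jointly, while here the codebooks are tiny hand-written values (`TOY_CODEBOOKS`) and the function names are hypothetical. It only shows how one embedding turns into a 4-token ID, one token per codebook level.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def residual_quantize(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Quantize an embedding level by level; each level encodes what the previous levels missed."""
    residual: List[float] = list(embedding)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen code so the next level only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)


# Hypothetical 2-d codebooks, four levels deep (assumption: one token per level, 4 tokens per item).
TOY_CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
    [(0.0, 0.0), (0.25, 0.25)],
    [(0.0, 0.0), (0.1, 0.0)],
]

print(residual_quantize((1.6, 1.3), TOY_CODEBOOKS))  # (1, 1, 1, 0)
```

Because early levels capture the coarse position and later levels refine it, items with similar embeddings tend to share their leading tokens, which is what makes such IDs "semantic".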
Reposted by Eugene Yan
ethanrosenthal.com
I’ve seen semantic IDs pop up but never bothered to actually look into them. This write up from @eugeneyan.com is a great intro that also illustrates why they’re pretty interesting for mixing recsys and LLMs eugeneyan.com/writing/sema...
How to Train an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs
An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.
eugeneyan.com
eugeneyan.com
yeap the plan is to map products to semantic IDs which the model can understand
eugeneyan.com
demo of the LLM-recommender hybrid returning both semantic IDs & english, and:
• steering recs via natural language
• explaining the recommendation
• naming the bundle of recommendations
• multi-turn conversation to get recs

watch till the end for the bloopers lol

www.youtube.com/watch?v=_0n4...
LLM-Recomender Hybrid with Steerable Recommendations and Reasoning on Recommendations
YouTube video by Eugene Yan
www.youtube.com
eugeneyan.com
For example, given a sequence of items, it can recommend the next best item. But better than that, you can steer the recommendations with natural language! And it can explain why it gave that recommendation, as well as creatively name recommendation bundles.
Reposted by Eugene Yan
eugeneyan.com
Wrote an intro to evals for long-context Q&A systems:
• How it differs from basic Q&A
• What dimensions & metrics to eval on
• How to build llm-evaluators
• How to build eval datasets
• Benchmarks: narratives, technical docs, multi-docs

eugeneyan.com/writing/qa-e...
Evaluating Long-Context Question & Answer Systems
Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.
eugeneyan.com
eugeneyan.com
Some thoughts on leadership: eugeneyan.com/writing/lead...
• What makes an exceptional leader?
• What do exceptional leaders do?
• Leadership styles: Commando, soldier, police
eugeneyan.com
For example, Amazon started to implement the first version of Amazon Prime in late 2004 and announced it on February 2, 2005, six weeks later. An account of how it came about, with lots of anecdotes, here: vox.com/recode/2019/...

Also this list: patrickcollison.com/fast
The making of Amazon Prime, the internet’s most successful and devastating membership program
An oral history of the subscription service that changed online shopping forever.
vox.com
eugeneyan.com
The best leaders I’ve worked with operate with perma-urgency. They act like early founders, mindful of existential threats. And they can balance speed, sustainability, and repaying tech debt. Ultimately, customers love it and teams thrive when we ship fast to deliver delight.
eugeneyan.com
converted all images to webp and hopefully made the site faster. something i wouldn't have bothered with in the past
✅ selfcheckgpt.jpg: 226.15KB → 60.53KB (73.24% reduction)
✅ query-processing.jpg: 95.74KB → 42.84KB (55.25% reduction)
✅ sldc-specialists.jpg: 30.88KB → 12.37KB (59.95% reduction)
✅ feature-store-ad.png: 157.24KB → 57.73KB (63.29% reduction)
✅ llm-patterns-aieng-2023-v0-004.jpg: 132.81KB → 43.50KB (67.24% reduction)
✅ google-user-intent.png: 46.67KB → 24.80KB (46.86% reduction)
✅ quy-nguyen.jpeg: 4.27KB → 1.98KB (53.61% reduction)
✅ fbi-tab2.jpg: 396.10KB → 128.28KB (67.61% reduction)
✅ ey-fastball.png: 4.94KB → 0.46KB (90.71% reduction)
✅ favicon-16x16.png: 0.60KB → 0.26KB (56.96% reduction)
✅ android-chrome-192x192.png: 11.50KB → 2.96KB (74.30% reduction)
✅ apple-touch-icon.png: 10.15KB → 2.55KB (74.88% reduction)
✅ android-chrome-512x512.png: 33.22KB → 6.72KB (79.77% reduction)
✅ favicon-32x32.png: 1.35KB → 0.45KB (66.45% reduction)
✅ 404-8.jpg: 43.90KB → 21.81KB (50.32% reduction)
✅ 404-9.jpg: 42.80KB → 16.86KB (60.60% reduction)
✅ 404.jpg: 25.67KB → 6.37KB (75.20% reduction)
✅ 404-11.jpg: 91.48KB → 13.71KB (85.02% reduction)
✅ 404-10.jpg: 309.60KB → 34.35KB (88.91% reduction)
✅ 404-12.jpg: 68.38KB → 50.75KB (25.79% reduction)
✅ 404-13.jpg: 73.79KB → 38.56KB (47.74% reduction)
✅ 404-14.jpg: 86.34KB → 45.91KB (46.82% reduction)
✅ 404-1.jpg: 25.67KB → 6.37KB (75.20% reduction)
✅ 404-2.jpg: 113.98KB → 67.45KB (40.82% reduction)
✅ 404-3.jpg: 38.06KB → 13.68KB (64.04% reduction)
✅ 404-7.jpg: 55.02KB → 26.54KB (51.77% reduction)
✅ 404-6.jpg: 45.98KB → 15.45KB (66.41% reduction)
✅ 404-4.jpg: 94.63KB → 50.33KB (46.82% reduction)
✅ 404-5.jpg: 48.97KB → 17.69KB (63.87% reduction)

==================================================
SUMMARY STATISTICS
==================================================
Total files converted: 1002
Total original size: 122.78MB
Total WebP size: 38.75MB
Total size reduction: 84.03MB (68.44%)
Average size reduction per file: 68.44%
Proportional savings: 122.78MB → 38.75MB
==================================================
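The percentages in a log like this come from simple arithmetic, sketched below under stated assumptions. The function names are hypothetical and this is not the script that produced the log above; sizes can be in any unit as long as both values use the same one. The sketch also separates two figures that are easy to conflate: the overall reduction (size-weighted) and the unweighted mean of per-file reductions. How the original script computed its "average" line is not shown in the post.

```python
from typing import Dict, Iterable, Tuple


def reduction_pct(original: float, converted: float) -> float:
    """Percentage saved when a file shrinks from `original` to `converted`."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return (original - converted) / original * 100.0


def summarize(sizes: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Aggregate (original, converted) pairs into totals and two kinds of average."""
    pairs = list(sizes)
    total_original = sum(o for o, _ in pairs)
    total_converted = sum(c for _, c in pairs)
    return {
        "total_original": total_original,
        "total_converted": total_converted,
        # Size-weighted: large files dominate this number.
        "overall_pct": reduction_pct(total_original, total_converted),
        # Unweighted: every file counts equally, regardless of size.
        "mean_per_file_pct": sum(reduction_pct(o, c) for o, c in pairs) / len(pairs),
    }


# Made-up sizes, chosen so the two averages clearly differ.
stats = summarize([(100.0, 25.0), (10.0, 9.0)])
print(f"overall: {stats['overall_pct']:.2f}%, mean per file: {stats['mean_per_file_pct']:.2f}%")
```

With many small icons and a few large photos, the two averages can diverge noticeably, so it helps to label which one a summary reports.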
eugeneyan.com
Previously, these tasks weren't worth the effort but now they can be done in hours. What an amazing time to build and play =D
eugeneyan.com
Had a fun couple of hours this weekend with Codex & Windsurf
• Migrated off deprecated jekyll-algolia to official sdk (better indexing)
• Added recommendations + relevance scores to each post
• Improved site responsiveness; fixed dark mode flicker
• Marie Kondo-ed unused files & dead code
Image of recommender widget at the bottom of posts on eugeneyan.com
eugeneyan.com
In orgs pushing the envelope, there's always a minority that can be counted on to get shit done against all odds, driven by force of will, resourcefulness, influence, etc. When you identify them, vest in them authority and autonomy, then step back and watch them perform miracles.
eugeneyan.com
oops! thanks for letting me know, fixed!
bsky share button
eugeneyan.com
p.s., If you’re interested in topics like this, my friends Ben and Swyx are organizing the AI Engineer World’s Fair in San Francisco on 3rd - 5th June. Come talk to builders deploying AI systems in production. Here’s a big discount for tickets: ti.to/software-3/a...
AI Engineer World's Fair 2025
The AI Engineer World's Fair is the biggest technical AI event of the year, happening Summer 2025, the one place you can meet with ~every major AI lab from OpenAI to Anthropic to Cohere, every AI infr...
ti.to
eugeneyan.com
Here's a three-minute demo of news-agents in action. It's pretty cool at the 30-second mark how the sub-agents get spawned! We then see the main agent assigning tasks and polling for progress, and finally shutting the sub-agents down when they're done with their assigned tasks.
eugeneyan.com
To better understand MCPs and agentic workflows, I built news-agents to generate a daily news recap. The main agent spawns sub-agents, assigning them news feeds to parse and summarize, and then generates a final overall summary plus analysis.

eugeneyan.com/writing/news...
Building News Agents for Daily News Recaps with MCP, Q, and tmux
Learning to automate simple agentic workflows with Amazon Q CLI, Anthropic MCP, and tmux.
eugeneyan.com
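The main-agent and sub-agent pattern described above (assign feeds, wait for progress, shut workers down, write an overall recap) can be sketched with a thread pool. This is only an illustration of the control flow: the actual project uses Amazon Q CLI, MCP, and tmux, none of which appear here, and `summarize_feed`, `run_main_agent`, and the feed names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def summarize_feed(name: str, headlines: List[str]) -> str:
    """Stand-in for a sub-agent; a real one would fetch the feed and summarize it with an LLM."""
    first = headlines[0] if headlines else "none"
    return f"{name}: {len(headlines)} item(s); first: {first}"


def run_main_agent(feeds: Dict[str, List[str]], max_workers: int = 3) -> str:
    """Assign one feed per worker, collect results as they finish, then build the recap."""
    summaries: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Assign tasks: one future per feed, remembering which feed it belongs to.
        futures = {pool.submit(summarize_feed, name, items): name for name, items in feeds.items()}
        # Collect progress: results arrive in completion order, not submission order.
        for future in as_completed(futures):
            summaries[futures[future]] = future.result()
    # Leaving the `with` block shuts the workers down.
    lines = [summaries[name] for name in sorted(summaries)]
    return "Daily recap\n" + "\n".join(lines)


# Made-up feeds for illustration only.
print(run_main_agent({"feed_b": ["story x"], "feed_a": ["story y", "story z"]}))
```

Sorting by feed name before joining keeps the recap deterministic even though the workers finish in arbitrary order.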
eugeneyan.com
@hamel.bsky.social & @sh-reya.bsky.social are two of the world's best on evals. They've built evals for 35+ AI apps & helped teams ship confidently. Now they'll teach everything they know on building evals that work.

Enrollment closes in 4 days.

Secret 35% discount code: maven.com/parlance-lab...
Effective Evals for AI products
eugeneyan.com
The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming is only $1.99 for the Kindle version today: amazon.com/dp/B088TMLQDC
The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming
Reposted by Eugene Yan
harrisonpim.com
Enjoyed this on eval-driven product development from @eugeneyan.com. It chimes with my own experiences building around LLMs and search engines, including the thoughts on automated evaluators.
When deconstructed, EDD is just the good old scientific method under a new name
An LLM‑as‑Judge Won't Save The Product—Fixing Your Process Will
Applying the scientific method, building via eval-driven development, and monitoring AI output.
eugeneyan.com
eugeneyan.com
Surround yourself with people whose "work" is their calling, craft, and play.

They are intrinsically motivated, are driven to excel and do what's right, and get so much shit done just because it's fun.
Reposted by Eugene Yan
eugenevinitsky.bsky.social
Some of the anti-AI stuff feels a bit like when people would say "don't use Wikipedia as a source." It's just like anything else, a piece of information that you weigh against multiple sources and your own understanding of its likely failure modes