Here's the result of my training runs:
• RQ-VAE to compress item embeddings into tokens
• SASRec to predict the next item (i.e., 4 tokens) exactly
• Qwen3-8B that can return recs and natural language!
eugeneyan.com/writing/sema...
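A minimal sketch of the residual quantization step that turns one item embedding into 4 tokens. This is only the encode/decode idea, not the trained RQ-VAE from the post: the codebooks below are hand-written toy values, and the names `rq_encode`, `rq_decode`, and `CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence


def nearest_index(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def rq_encode(embedding: Sequence[float],
              codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Quantize an embedding into one token per codebook level."""
    residual = list(embedding)
    tokens = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen codeword so the next level encodes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rq_decode(tokens: Sequence[int],
              codebooks: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Reconstruct the embedding by summing the chosen codeword at each level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy assumption: 4 levels, 2 codewords per level, 2-dimensional embeddings.
# A trained RQ-VAE would learn these codebooks instead.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
    [[0.25, 0.0], [0.0, 0.25]],
    [[0.125, 0.0], [0.0, 0.125]],
]

tokens = rq_encode([1.0, 0.875], CODEBOOKS)
reconstruction = rq_decode(tokens, CODEBOOKS)
```

Each level quantizes the residual left over by the previous one, so the 4 tokens together act as a coarse-to-fine ID for the item.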
• How it differs from basic Q&A
• What dimensions & metrics to eval on
• How to build llm-evaluators
• How to build eval datasets
• Benchmarks: narratives, technical docs, multi-docs
eugeneyan.com/writing/qa-e...
• What makes an exceptional leader?
• What do exceptional leaders do?
• Leadership styles: Commando, soldier, police
• Migrated off deprecated jekyll-algolia to official sdk (better indexing)
• Added recommendations + relevance scores to each post
• Improved site responsiveness; fixed dark mode flicker
• Marie Kondo-ed unused files & dead code
eugeneyan.com/writing/news...
Enrollment closes in 4 days.
Secret 35% discount code: maven.com/parlance-lab...
When deconstructed, EDD is just the good old scientific method under a new name
They are intrinsically motivated, are driven to excel and do what's right, and get so much shit done just because it's fun.
eugeneyan.com/writing/eval...
They found that the best systems had neural metrics that did not correlate with human preferences.
arxiv.org/abs/2503.24013
• Read the source, docs, error msgs
• Simplify problems, write simple code
• Get their hands dirty
• Write to share & write well
• Have beginner's mind & keep learning
• Not afraid to say: I don't know
endler.dev/2025/best-pr...
• Don't skip error analysis
• Don't skip looking at your data
• Don't gatekeep who can write prompts
• Don't let zero users be a roadblock
• Don't be blindsided by criteria drift
> "the most effective route to improve outcomes was brute force: retry steps until they passed or reached a limit. We give the validation errors ... to the LLM and built a loop runner"
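A minimal sketch of such a loop runner, under stated assumptions: `generate` stands in for an LLM call and `validate` returns a list of error strings (empty when the step passes). Both, along with `run_with_retries` and the stub functions, are hypothetical names for illustration, not the quoted team's code.

```python
import json
from typing import Callable, List, Optional, Tuple


def run_with_retries(
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Retry a step until validation passes or the attempt limit is reached.

    Returns (output, attempts_used); output is None if every attempt failed.
    """
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = generate(current_prompt)
        errors = validate(output)
        if not errors:
            return output, attempt
        # Feed the validation errors back so the next attempt can correct them.
        error_lines = "\n".join(f"- {e}" for e in errors)
        current_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{output}\n\n"
            f"Validation errors:\n{error_lines}"
        )
    return None, max_attempts


def validate_json(output: str) -> List[str]:
    """Example validator: the output must parse as JSON."""
    try:
        json.loads(output)
        return []
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]


def stub_generate(prompt: str) -> str:
    """Stand-in for an LLM call: only succeeds once it has seen error feedback."""
    return '{"ok": true}' if "Validation errors:" in prompt else "not json"


result, attempts_used = run_with_retries(stub_generate, validate_json, "Return JSON")
```

With the stub, the first attempt fails validation and the second passes after seeing the error message. The limit keeps a step that never validates from looping forever.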
- understanding things deeply, reading the actual source
- being willing to help other people
- status doesn’t matter, good ideas come from anywhere
endler.dev/2025/best-pr...
Because books & movies were too large for LSTMs to do Q&A on, they embedded 200-word chunks and retrieved similar snippets to answer questions.
"Chunking and cosine similarity retrieval is so 2017."
arxiv.org/abs/1712.07040
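A rough sketch of chunk-and-retrieve as described above: split the text into fixed-size word chunks, embed each one, and return the chunk most similar to the question by cosine similarity. The bag-of-words `embed` is a toy stand-in for a real embedding model, and all function names here are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk_words(text: str, chunk_size: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """Return the top_k (score, chunk) pairs most similar to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Tiny example with 15-word chunks so the two "scenes" land in separate chunks.
story = "the dragon guards the gold " * 3 + "the sailor crosses the sea " * 3
chunks = chunk_words(story, chunk_size=15)
best_score, best_chunk = retrieve("who guards the gold", chunks)[0]
```

Here `best_chunk` is the dragon passage, since it shares more words with the question than the sailor passage does.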
Until then, here are some system designs:
• Retrieval vs. Ranking: eugeneyan.com/writing/syst...
• Real-time retrieval: eugeneyan.com/writing/real...
• Personalization: eugeneyan.com/writing/patt...