Lightnews — Scholar-powered news

Eugene Yan

@eugeneyan.com

Some thoughts on leadership: eugeneyan.com/writing/lead...
• What makes an exceptional leader?
• What do exceptional leaders do?
• Leadership styles: Commando, soldier, police

May 21, 2025 at 2:17 AM

Eugene Yan

@eugeneyan.com

converted all images to webp and hopefully made the site faster. something i wouldn't have bothered with in the past

✅ selfcheckgpt.jpg: 226.15KB → 60.53KB (73.24% reduction)
✅ query-processing.jpg: 95.74KB → 42.84KB (55.25% reduction)
✅ sldc-specialists.jpg: 30.88KB → 12.37KB (59.95% reduction)
✅ feature-store-ad.png: 157.24KB → 57.73KB (63.29% reduction)
✅ llm-patterns-aieng-2023-v0-004.jpg: 132.81KB → 43.50KB (67.24% reduction)
✅ google-user-intent.png: 46.67KB → 24.80KB (46.86% reduction)
✅ quy-nguyen.jpeg: 4.27KB → 1.98KB (53.61% reduction)
✅ fbi-tab2.jpg: 396.10KB → 128.28KB (67.61% reduction)
✅ ey-fastball.png: 4.94KB → 0.46KB (90.71% reduction)
✅ favicon-16x16.png: 0.60KB → 0.26KB (56.96% reduction)
✅ android-chrome-192x192.png: 11.50KB → 2.96KB (74.30% reduction)
✅ apple-touch-icon.png: 10.15KB → 2.55KB (74.88% reduction)
✅ android-chrome-512x512.png: 33.22KB → 6.72KB (79.77% reduction)
✅ favicon-32x32.png: 1.35KB → 0.45KB (66.45% reduction)
✅ 404-8.jpg: 43.90KB → 21.81KB (50.32% reduction)
✅ 404-9.jpg: 42.80KB → 16.86KB (60.60% reduction)
✅ 404.jpg: 25.67KB → 6.37KB (75.20% reduction)
✅ 404-11.jpg: 91.48KB → 13.71KB (85.02% reduction)
✅ 404-10.jpg: 309.60KB → 34.35KB (88.91% reduction)
✅ 404-12.jpg: 68.38KB → 50.75KB (25.79% reduction)
✅ 404-13.jpg: 73.79KB → 38.56KB (47.74% reduction)
✅ 404-14.jpg: 86.34KB → 45.91KB (46.82% reduction)
✅ 404-1.jpg: 25.67KB → 6.37KB (75.20% reduction)
✅ 404-2.jpg: 113.98KB → 67.45KB (40.82% reduction)
✅ 404-3.jpg: 38.06KB → 13.68KB (64.04% reduction)
✅ 404-7.jpg: 55.02KB → 26.54KB (51.77% reduction)
✅ 404-6.jpg: 45.98KB → 15.45KB (66.41% reduction)
✅ 404-4.jpg: 94.63KB → 50.33KB (46.82% reduction)
✅ 404-5.jpg: 48.97KB → 17.69KB (63.87% reduction)

==================================================
SUMMARY STATISTICS
==================================================
Total files converted: 1002
Total original size: 122.78MB
Total WebP size: 38.75MB
Total size reduction: 84.03MB (68.44%)
Average size reduction per file: 68.44%
Proportional savings: 122.78MB → 38.75MB
==================================================

May 18, 2025 at 11:09 PM
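
The percentages in the conversion log above are simple arithmetic on the before and after sizes. Here is a minimal sketch of that arithmetic, assuming both sizes are given in the same unit. The function names are hypothetical, and this is not the script that produced the log.

```python
def reduction_pct(original: float, converted: float) -> float:
    """Percentage by which `converted` is smaller than `original`."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return (original - converted) / original * 100


def summarize(sizes: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Return (total original, total converted, overall % reduction)."""
    total_original = sum(before for before, _ in sizes)
    total_converted = sum(after for _, after in sizes)
    return total_original, total_converted, reduction_pct(total_original, total_converted)


# Totals taken from the summary block above, in MB.
print(f"{reduction_pct(122.78, 38.75):.2f}% reduction")
```

For the totals this reproduces the reported 68.44%. Per-file figures recomputed from the rounded KB values can differ in the last digit, presumably because the log was computed from exact byte counts.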

Eugene Yan

@eugeneyan.com

Had a fun couple of hours this weekend with Codex & Windsurf
• Migrated off deprecated jekyll-algolia to official sdk (better indexing)
• Added recommendations + relevance scores to each post
• Improved site responsiveness; fixed dark mode flicker
• Marie Kondo-ed unused files & dead code

May 18, 2025 at 9:06 PM

Eugene Yan

@eugeneyan.com

oops! thanks for letting me know, fixed!

May 7, 2025 at 2:57 AM

Eugene Yan

@eugeneyan.com

Here's a three-minute demo of news-agents in action. It's pretty cool at the 30-second mark how the sub-agents get spawned! We then see the main agent assigning tasks and polling for progress, and finally shutting the sub-agents down when they're done with their assigned tasks.

May 7, 2025 at 12:24 AM
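
The lifecycle described in the post above (spawn sub-agents, assign tasks, poll for progress, shut down) can be sketched generically. This is only an illustration of the pattern using a thread pool as a stand-in. It is not how news-agents is implemented, and `sub_agent`, `main_agent`, and the task names are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def sub_agent(task: str) -> str:
    """Hypothetical worker: pretend to process one assigned task."""
    time.sleep(0.01)  # stand-in for real work
    return f"done: {task}"


def main_agent(tasks: list[str], poll_interval: float = 0.005) -> dict[str, str]:
    """Spawn workers, assign tasks, poll for progress, then shut down."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(tasks)))  # spawn
    try:
        # Assign one task per worker.
        futures: dict[str, Future] = {t: pool.submit(sub_agent, t) for t in tasks}
        # Poll until every worker reports completion.
        while not all(f.done() for f in futures.values()):
            time.sleep(poll_interval)
        return {t: f.result() for t, f in futures.items()}
    finally:
        pool.shutdown(wait=True)  # shut the workers down


results = main_agent(["hackernews", "techcrunch"])
print(results)
```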

Eugene Yan

@eugeneyan.com

@hamel.bsky.social & @sh-reya.bsky.social are two of the world's best on evals. They've built evals for 35+ AI apps & helped teams ship confidently. Now they'll teach everything they know on building evals that work.

Enrollment closes in 4 days.

Secret 35% discount code: maven.com/parlance-lab...

April 30, 2025 at 2:56 AM

Eugene Yan

@eugeneyan.com

The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming is only $1.99 for the Kindle version today: amazon.com/dp/B088TMLQDC

April 27, 2025 at 11:01 PM

Eugene Yan

@eugeneyan.com

Great example of generate -> validate loop + error analysis

> "the most effective route to improve outcomes was brute force: retry steps until they passed or reached a limit. We give the validation errors ... to the LLM and built a loop runner"

April 15, 2025 at 1:57 AM
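
The loop runner quoted above can be sketched as follows. This is a minimal sketch, assuming the generator accepts the previous attempt's validation errors. `run_until_valid`, `fake_generate`, and `fake_validate` are hypothetical names, with toy functions standing in for the LLM call and the validator.

```python
from typing import Callable, Optional


def run_until_valid(
    generate: Callable[[str, list[str]], str],
    validate: Callable[[str], list[str]],
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry generation until validation passes or the attempt limit is hit."""
    errors: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(task, errors)  # errors come from the previous attempt
        errors = validate(candidate)
        if not errors:
            return candidate
    return None  # limit reached without a valid output


# Toy stand-ins for the LLM call and the validator (both hypothetical).
def fake_generate(task: str, errors: list[str]) -> str:
    return "42" if errors else "forty-two"


def fake_validate(output: str) -> list[str]:
    return [] if output.isdigit() else ["output must be an integer"]


result = run_until_valid(fake_generate, fake_validate, "answer as an integer")
print(result)
```

Returning `None` at the limit keeps the failure explicit, so the caller can log the case for error analysis instead of silently accepting a bad output.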

Eugene Yan

@eugeneyan.com

Stumbled on the first(?) RAG in NarrativeQA from 2017.

Because books & movies were too large for LSTMs to do Q&A on, they embedded 200-word chunks and retrieved similar snippets to answer questions.

"Chunking and cosine similarity retrieval is so 2017."

arxiv.org/abs/1712.07040

4.3 Neural Benchmarks on Stories

The design of the NarrativeQA dataset makes the straight-forward application of the existing neural architectures computationally infeasible, as this would require running a recurrent neural network on sequences of hundreds of thousands of time steps or computing a distribution over the entire input for attention, as is common. We split the task into two steps: first, we retrieve a small number of relevant passages from the story using an IR system, and subsequently, apply one of the neural models above on the resulting document. The question becomes the query for retrieval. This IR problem is much harder than traditional document retrieval, as the documents, the passages here, are very similar, and the question is short and entities mentioned likely occur many times in the story. Our retrieval system considers chunks of 200 words from story and computes representations for all chunks and the query. We then select a varying number of such chunks based on their similarity to the query. We experiment with different representations and similarity measures in Section 5. Finally, we concatenate the selected chunks in the correct temporal order and insert delimiters between them to obtain a much shorter document. For span prediction models, we then further select a span from the retrieved chunks as described in Section 4.2.

April 12, 2025 at 5:34 PM
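
The retrieval steps in the excerpt above (fixed-size word chunks, similarity to the question, top chunks concatenated in story order with delimiters) can be sketched as below. This is a rough sketch under stated assumptions: bag-of-words counts stand in for the representations the paper experiments with, and the function names and delimiter are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bow(text: str) -> Counter:
    """Bag-of-words counts (an assumed stand-in for a learned representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(story: str, question: str, k: int = 2, size: int = 200,
             delimiter: str = "\n---\n") -> str:
    """Pick the k chunks most similar to the question, joined in story order."""
    chunks = chunk_words(story, size)
    query = bow(question)
    scored = [(cosine(bow(chunk), query), idx) for idx, chunk in enumerate(chunks)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    ordered = sorted(idx for _, idx in top)  # restore temporal order
    return delimiter.join(chunks[idx] for idx in ordered)


story = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu"
print(retrieve(story, "kappa kappa alpha", k=2, size=3, delimiter=" | "))
```

Sorting the selected indices before joining is what keeps the shortened document in the story's original order, even when a later chunk scores higher.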

Eugene Yan

@eugeneyan.com

If you were building a Q&A feature (or chatbot) based on very long documents (like books), what evals would you focus on?

April 9, 2025 at 1:48 AM

Eugene Yan

@eugeneyan.com

Can't wait for when I can vibe code a production recommender system.

Until then, here are some system designs:

• Retrieval vs. Ranking: eugeneyan.com/writing/syst...
• Real-time retrieval: eugeneyan.com/writing/real...
• Personalization: eugeneyan.com/writing/patt...

April 8, 2025 at 5:14 AM

Eugene Yan

@eugeneyan.com

Your favorite AI writer's favorite AI writer

To Eugene,

My favorite AI writer

Chip Huyen

April 5, 2025 at 4:20 PM

Eugene Yan

@eugeneyan.com

includes resources on writing from my favourite writers

Any resources you’d recommend on the topic of writing?

Writing, Briefly
Write Like You Talk
Write Simply
Why Everyone Should Write
Writing Better
Easy Reading Is Damn Hard Writing
Mise en Place Writing
Amazon Writing Style Tips
Some Blogging Myths
Some Tactics for Writing in Public
Some Thoughts on Writing
10 years of professional blogging – what I’ve learned
Lessons from content marketing myself (aka blogging) for five years
Make Your Writing Work Harder For You
What I learned writing a book
How Jeff Bezos Turned Narrative into Amazon’s Competitive Advantage
Seemingly Paradoxical Rules of Writing
What I Did Not Learn About Writing In School
What I Learned from Writing Online - For Fellow Non-Writers
How to Write Better with The Why, What, How Framework
How to Write Design Docs for Machine Learning Systems
Writing Tools: 55 Essential Strategies for Every Writer

April 2, 2025 at 2:07 AM

Eugene Yan

@eugeneyan.com

Been querying gpt-4.5 and it's better in ways we can't quantify yet: creativity, humor, world knowledge, wisdom, nuance, based, etc.

Excited about how we'll discover new ways to evaluate gpt-4.5 on these aspects which will also transfer to product / application related evals

My reaction is that there is an evaluation crisis. I don't really know what metrics to look at right now.
MMLU was good and useful for a few years but that's long over.
SWE-Bench Verified (real, practical, verified problems) I really like and is great but itself too narrow.
Chatbot Arena received so much focus (partly my fault?) that LLM labs have started to really overfit to it, via a combination of prompt mining (from API requests), private evals bombardment, and, worse, explicit use of rankings as training supervision. I think it's still ~ok and there's a lack of "better", but it feels on decline in signal.
There's a number of private evals popping up, an ensemble of which might be one promising path forward.
In absence of great comprehensive evals I tried to turn to vibe checks instead, but I now fear they are misleading and there is too much opportunity for confirmation bias, too low sample size, etc., it's just not great.

TLDR my reaction is I don't really know how good these models are right now.

March 2, 2025 at 8:56 PM

Eugene Yan

@eugeneyan.com

cdn.openai.com/gpt-4-5-syst...

February 27, 2025 at 7:09 PM

Eugene Yan

@eugeneyan.com

agent ≈ model + tools, within a for-loop + environment

slide on openai agent definition from swyx talk

February 26, 2025 at 2:05 AM
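
The definition in the post above (model + tools, within a for-loop + environment) maps onto a few lines of code. This is a minimal sketch, assuming the model returns either a tool call or a final answer as a dict. The scripted model, the `read` tool, and the dict environment are hypothetical stand-ins rather than any real agent framework.

```python
from typing import Any, Callable

Action = dict[str, Any]


def run_agent(
    model: Callable[[list[str]], Action],
    tools: dict[str, Callable[[dict, str], str]],
    environment: dict,
    goal: str,
    max_steps: int = 5,
) -> str:
    """Loop: the model picks a tool, the tool acts on the environment, repeat."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            return action["final"]
        tool = tools[action["tool"]]
        observation = tool(environment, action["input"])
        history.append(f"{action['tool']}({action['input']}) -> {observation}")
    return "stopped: step limit reached"


# Hypothetical stand-ins: a scripted "model" and one tool over a dict environment.
def scripted_model(history: list[str]) -> Action:
    if len(history) == 1:
        return {"tool": "read", "input": "notes.txt"}
    return {"final": history[-1].split(" -> ")[1]}


def read_tool(env: dict, key: str) -> str:
    return env.get(key, "<missing>")


answer = run_agent(scripted_model, {"read": read_tool}, {"notes.txt": "hello"}, "read the notes")
print(answer)
```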

Eugene Yan

@eugeneyan.com

♥️ it's tricky to separate what i do on the job (at the bookstore i work at) and what i hack on in my personal time. out of an abundance of caution, to not discuss possible proprietary info, i won't be sharing more about the backend of aireadingclub.com 😔

👋 Hi there! I work on <redacted> at Airbnb. I really enjoy your writing.

As a personal project I started trying to write something kind of like AI reading club. I wanted a more powerful version of the "x ray" feature in Kindle, because sometimes when I pick up a new book in a series or return to a book after a break, I cannot remember all of the characters or the plot. I work mainly on <redacted> and not so much app development, and I really struggled with how to build my AI x ray. AI reading club is awesome and I was curious about how you built the retrieval pipeline, any preprocessing you did to the text, etc.

Thanks for publishing so much great work. My team and many of the machine learning engineers whom I support at Airbnb frequently share your posts.

Best regards

January 28, 2025 at 3:31 AM

Eugene Yan

@eugeneyan.com

Thanks to the hundreds of readers who've tried aireadingclub.com and interacted with Dewey.

If you've tried aireadingclub and have feedback, feature ideas, or thoughts on how AI can help you get more out of reading, please comment or dm me 🙏

Books on AI Reading Club and the number of messages on them.

January 22, 2025 at 10:47 PM

Eugene Yan

@eugeneyan.com

> Nobody tells you the variables you should be regressing. What's the target? What's the source? Do you notice when results are rubbish? ... That's why I think you need smart people who appear to do something technically easy but actually not so easy.

news.ycombinator.com/item?id=1906...

"...I joined a hedge fund, Renaissance Technologies, I'll make a comment about that. It's funny that I think the most important thing to do on data analysis is to do the simple things right. So, here's a kind of non-secret about what we did at Renaissance: in my opinion, our most important statistical tool was simple regression with one target and one independent variable. It's the simplest statistical model you can imagine. Any reasonably smart high school student could do it. Now we have some of the smartest people around, working in our hedge fund, we have string theorists we recruited from Harvard, and they're doing simple regression. Is this stupid and pointless? Should we be hiring stupider people and paying them less? And the answer is no. And the reason is nobody tells you what the variables you should be regressing [are]. What's the target. Should you do a nonlinear transform before you regress? What's the source? Should you clean your data? Do you notice when your results are obviously rubbish? And so on. And the smarter you are the less likely you are to make a stupid mistake. And that's why I think you often need smart people who appear to be doing something technically very easy, but actually usually not so easy.
[At] my hedge fund, which was not a very big company, we had 7 PhDs just cleaning data and organizing the databases"

January 17, 2025 at 1:22 AM

Eugene Yan

@eugeneyan.com

okay let's see what bugs come up lol 🤞

Screenshot from google analytics showing 60 active users per minute

January 15, 2025 at 2:20 AM

Eugene Yan

@eugeneyan.com

Finally, if we need help with a term or character that was previously mentioned, Dewey can help with a summary of the term so we don’t have to look it up ourselves.

January 15, 2025 at 1:51 AM

Eugene Yan

@eugeneyan.com

If you've stopped reading a book for a while, it can be challenging to pick it up again and remember what you've read. To help with this, Dewey can summarize the book up to the current page and refresh our memory, highlighting major themes, characters, and concepts.

January 15, 2025 at 1:51 AM

Eugene Yan

@eugeneyan.com

It can also help with creating quizzes / flashcards. The goal here is to test our knowledge and improve retention.

January 15, 2025 at 1:51 AM

Eugene Yan

@eugeneyan.com

With the context, it can answer simple queries via "Explain" and "Discuss". The goal is to keep us in flow while reading, instead of having to reread other sections of the book or open a web browser for our queries.

January 15, 2025 at 1:50 AM

Eugene Yan

@eugeneyan.com

At the heart of AI Reading Club is Dewey, your AI reading companion.

It understands context via selected text or the page we're on. This explicit context is displayed during discussions. At the same time, behind the scenes, it can retrieve and consider the rest of the book as implicit context.

January 15, 2025 at 1:50 AM
