Deedy
@deedydas.bsky.social
VC at Menlo Ventures. Formerly founding team Glean, Google Search. Cornell CS. Tweets about tech, immigration, India, fitness and search.
Try it here: yiyan.baidu.com
March 16, 2025 at 5:17 PM
Baidu, the Google of China, just dropped two models today:
— ERNIE 4.5: beats GPT-4.5 at 1% of the price
— Reasoning model X1: beats DeepSeek R1 at 50% of the price.

China continues to build intelligence too cheap to meter. The AI price war is on.
March 16, 2025 at 5:17 PM
"Make it look like I was on a luxury five star hotel vacation"

Google Gemini really cooked with this one.

This is next gen photo editing.
March 14, 2025 at 2:24 AM
WOW the new Google Flash model is the first time ever that you can do targeted edits of pictures in plain English.

"Make the steak vegetarian"
"Make the bridge go away"
"Make the keyboard more colorful"

And my favorite
"Give the OpenAI logo more personality"
March 13, 2025 at 6:06 AM
AI is now making cutting-edge science better.

Nature reported that reasoning LLMs found errors in 1% of the ~10,000 research papers analyzed, with a 35% false-positive rate, at $0.15-1 per paper.

The Anthropic founder's vision of "a country of geniuses in a data center" is happening.
March 9, 2025 at 3:40 AM
HUGE: A new research paper shows how a 7B-parameter AI model (90%) can beat OpenAI o1 (80%) on the MIT Integration Bee.

LADDER:
1. Generate easier variants of the problem
2. Solve, verify, and use GRPO (DeepSeek's RL method) to learn
TTRL:
— Do steps 1 and 2 at test time when you see a new problem

New form of test time compute scaling!
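A hedged sketch of that loop as described above; every helper here is an illustrative stand-in, not the paper's actual code or APIs.

```python
# Illustrative sketch of LADDER + TTRL. All helpers are stand-ins.

def generate_variants(model, problem, n=4):
    # LADDER step 1: ask the model to write progressively easier variants
    # of the problem (e.g., simpler integrands for an integration problem).
    return [f"{problem} (easier variant {i})" for i in range(n)]

def verify(problem, answer):
    # A cheap automatic checker, e.g. numerical integration to validate a
    # symbolic antiderivative. Stubbed out here.
    return True

def grpo_update(model, problem, answer, reward):
    # Placeholder for a GRPO policy-gradient update on the
    # (problem, answer, reward) triple, as popularized by DeepSeek.
    pass

def ladder(model, problems, rounds=3):
    for _ in range(rounds):
        for problem in problems:
            for variant in generate_variants(model, problem):
                answer = model.solve(variant)                     # attempt
                reward = 1.0 if verify(variant, answer) else 0.0  # verify
                grpo_update(model, variant, answer, reward)       # learn
    return model

def ttrl_solve(model, new_problem):
    # TTRL: rerun the same variant/solve/verify/RL loop on one unseen
    # problem at test time, then answer it, a form of test-time scaling.
    ladder(model, [new_problem], rounds=1)
    return model.solve(new_problem)
```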
March 7, 2025 at 5:02 PM
There are two categories:
— Daytona, for general-purpose sorting. The numbers above are Daytona.
— Indy, which can be specialized for the fixed format of 100-byte records with 10-byte keys.
Not super useful in practice, though.

Link: sortbenchmark.org/
Google experiments on it: sortbenchmark.org/
March 3, 2025 at 3:30 AM
How well can computers sort 1 trillion numbers?

SortBenchmark measures this for distributed systems.
— How fast? 134s
— How cheap? $97
— How many in 1 minute? 370B numbers
— How much energy? ~59 kJ, roughly the energy of a 15-minute walk

Every software engineer should know this.
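A quick back-of-the-envelope on what those headline numbers imply, taking the figures above as given:

```python
# Throughput implied by the headline SortBenchmark numbers above.
# (SortBenchmark actually sorts 100-byte records; "numbers" is shorthand.)
records = 1_000_000_000_000            # 1 trillion records
gray_sort_seconds = 134                # "How fast?" figure
minute_sort_records = 370_000_000_000  # records sorted in 60 seconds
cost_dollars = 97

print(f"GraySort throughput: {records / gray_sort_seconds / 1e9:.1f}B records/s")
print(f"MinuteSort throughput: {minute_sort_records / 60 / 1e9:.1f}B records/s")
print(f"Cost per billion records: ${cost_dollars / (records / 1e9):.4f}")
```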
March 3, 2025 at 3:30 AM
BREAKING: DeepSeek just let the world know they make ~$200M/yr at a 500%+ profit margin.

Revenue (/day): $562k
Cost (/day): $87k
Revenue (/yr): ~$205M

This is all while charging $2.19/M tokens on R1, ~25x less than OpenAI o1.

If this were in the US, it would be a >$10B company.
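The arithmetic checks out, taking the disclosed daily figures as given:

```python
# Sanity check on the figures in the post.
daily_revenue = 562_000
daily_cost = 87_000

annual_revenue = daily_revenue * 365
margin = (daily_revenue - daily_cost) / daily_cost  # markup over cost

print(f"Annualized revenue: ${annual_revenue / 1e6:.0f}M")  # ~$205M
print(f"Margin over cost: {margin:.0%}")                    # ~546%
```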
March 1, 2025 at 5:07 AM
Claude Artifacts: try out Artifacts created by Claude users at claude.site
February 26, 2025 at 2:34 AM
Claude's new GitHub "talk to your code" integration changes how engineers understand software.

Fork a repo.
Select a folder.
Ask it anything.
It even shows you what percentage of the context window each folder takes.

Here it visualizes yt-dlp's (YouTube downloader) flow:
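For intuition, a rough sketch of how per-folder context-window shares could be estimated, using an assumed ~4-characters-per-token heuristic and a 200K-token window; this is an illustration, not Claude's actual accounting.

```python
# Rough estimate of each top-level folder's share of a context window,
# using file sizes and a ~4 chars/token heuristic (illustrative only).
import os

CONTEXT_WINDOW_TOKENS = 200_000  # Claude's advertised long-context size

def folder_token_share(repo_root):
    totals = {}
    for dirpath, _, filenames in os.walk(repo_root):
        top = os.path.relpath(dirpath, repo_root).split(os.sep)[0]
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue
            totals[top] = totals.get(top, 0) + size // 4  # ~4 chars per token
    for folder, tokens in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{folder:30s} {100 * tokens / CONTEXT_WINDOW_TOKENS:6.1f}% of context")

folder_token_share("yt-dlp")  # e.g. after cloning the yt-dlp repo locally
```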
February 26, 2025 at 2:34 AM
I asked all 3 Deep Researches to "compare 25 LLMs in a table on 20 axes" to figure out which one was the best.

The winner was OpenAI.

It had the most detailed, highest-quality, and most accurate answer, but you do pay $200/mo for it.
February 15, 2025 at 2:46 AM
"The Mundanity of Excellence" [1989] is a timeless essay everyone ought to read in todays day and age.

Excellence is boring. It's making the same boring "correct" choice over and over again. You win by being consistent for longer.

Our short attention spans tend to forget that.
February 14, 2025 at 2:47 AM
Source: arxiv.org/pdf/2502.06807

(Check out the detailed code submissions and scoring in the appendix)
February 12, 2025 at 4:30 PM
HUGE: OpenAI o3 scores 394 of 600 at the 2024 International Olympiad in Informatics (IOI), earning a Gold medal and ranking 18th in the world.

The model was NOT contaminated with this data, and the 50-submission limit was respected.

We will likely see superhuman coding models this year.
February 12, 2025 at 4:30 PM
Everyone should be using this website to understand the inside of an LLM.

I'm surprised more people don't know about it. Brendan Bycroft made this beautiful interactive visualization to show exactly how each weight inside an LLM is used.

Here's a link:
February 12, 2025 at 3:06 AM
New research shows that LLMs don't perform well on long context.

Perfect needle-in-a-haystack scores are easy because attention can match the literal word. Require even one hop of reasoning and performance degrades quickly.

This is why guaranteeing correctness for agents is hard.
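A minimal sketch of the difference between the two eval styles; the filler text, facts, and queries are invented for illustration.

```python
# Literal needle retrieval vs. one-hop reasoning over long context (sketch).
import random

filler = ["The sky was grey over the harbor."] * 5000  # long distractor context

# Literal needle: the answer string appears verbatim, so attention can match it.
literal_needle = "The secret passcode is 7421."
literal_query = "What is the secret passcode?"

# One-hop needle: two facts must be chained; no single sentence holds the answer.
hop_facts = [
    "Alice's badge number is 7421.",
    "The secret passcode equals Alice's badge number.",
]
hop_query = "What is the secret passcode?"

def build_context(needles):
    docs = filler[:]
    for needle in needles:
        docs.insert(random.randrange(len(docs)), needle)
    return "\n".join(docs)

literal_prompt = build_context([literal_needle]) + "\n\n" + literal_query
hop_prompt = build_context(hop_facts) + "\n\n" + hop_query
# Models typically ace the first prompt and degrade sharply on the second.
```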
February 10, 2025 at 4:53 PM