Good summary from @garymarcus.bsky.social on the controversy surrounding o3 and ARC-AGI benchmark results. Clearly frontier model developers are still struggling with how to think about AGI, AI progress, and how to explain this all to the public. open.substack.com/pub/garymarc...
Interesting context from @fchollet.bsky.social, the creator of ARC-AGI, regarding the “saturation” of the current ARC benchmark. It will be very revealing to see how o3 performs on the new ARC2 benchmark in 2025.
December 22, 2024 at 3:16 PM
To be fair, watching Jalen pick apart the Pels last night, it's clear that the protection afforded three-point shooters and the rise in shot-making from the perimeter have really opened up the passing and driving lanes. While threes have soared, so have assists.
December 22, 2024 at 2:15 PM
I think all of this raises really important questions about what we mean by the “G” and the “I” in AGI. (And, to @garymarcus.bsky.social's point, the “T” in “transparency” and “truth in advertising…”)
December 22, 2024 at 2:08 PM
So, I was expecting great things from 2.0 Flash, the newest consumer model. Here’s what I got from the same prompt. This highlights the challenge foundation model providers have in balancing usefulness and “risk” - especially as the public becomes more familiar with LLMs’ true capabilities.
December 12, 2024 at 4:01 PM
Here’s the prompt I gave Gemini Advanced 1.5 and its response. You can see it breaks down the complex task into meaningful and manageable chunks. And, amazingly, it goes off and happily does the work, mining the web and social media sites and synthesizing its findings. 2/3
December 12, 2024 at 4:01 PM