• giving a wrong answer to a puzzle
• giving Python code that could test its claim
• asserting it obtained results from the code supporting its claim
• but when I ran the code myself, it showed the claim was false.
g.co/gemini/share...
LLMs are bad at computation: a side effect of the Faustian bargain for better use of parallel flops.
@letta.com figured out how to game the memory benchmark LoCoMo, and then they wrote about it!
the takeaways are actually quite incredible. like, vectors & graphs are cool, but you’re better off just giving an agent better tools
www.letta.com/blog/benchma...
Sonnet: Dave the Diver, Raft, Subnautica
Gemini Exp 1206: Dredge, Dave the Diver, Moonglow Bay
Gemini Flash Thinker: Fishing: North Atlantic /A Fishing sim, Sailwind, Subnautica
GPT4o: Fishing: North Atlantic, Dredge, Sailwind
o1: Dredge, Moonglow Bay, Call of the Sea
For attention: the perspective, based on a 2023 paper, of attention as inhomogeneous mixtures of vMFs (von Mises-Fisher distributions) on a hypersphere.
www.youtube.com/watch?v=S7Zp...
There are all kinds of injection possibilities, including web search, tool use, nesting, and fork + join.
Personally, I think that's a good thing.
You can read the first two chapters here:
www.gregegan.net/MORPHOTROPHI...
Given that, I think we're somewhere around Level 3 to 4 (3 < level < 5), with most of the useful stuff below Level 4. As of today, Level 5 does not appear to be in sight.