Whatever you are reading from these (often gamed, sometimes contaminated) benchmarks does not reflect real-world reality.
A test by Vals AI of models from OpenAI, Anthropic, Meta, Google, etc. found that all scored LESS THAN 50% accuracy on average for simple tasks required of entry-level financial analysts.
www.washingtonpost.com/politics/202...
garymarcus.substack....
#AI
3/3
#generativeai
open.substack.com/pub/garymarc...
#generativeai
open.substack.com/pub/hackings...
Nathan Lambert's post is well balanced and worth the read, in particular his statement: "and add humility, here’s an example from the ARC prize that o3 did not solve. It’s very easy. We clearly have a ways to go, but you should be excited ... "
www.interconnects.ai/p/openais-o3...
I am (more slowly) writing my own take on all this, coming soon.
Thousands?
(It’s already been hundreds.)