Everything else fails, including DeepSeek r1, o3-mini-high, and Gemini 2.0 Pro
1) Gemini 2.0 Flash Thinking sets a new high in price-performance: it beats DeepSeek r1 on Elo and is cheaper
2) The cost of GPT-4-level capability dropped 1,000-fold in 18 months
3) Pace of improvement is swift
The answer appears to be yes: using 3 agents with a structured review process reduced hallucination scores by 96% across 310 test cases. arxiv.org/pdf/2501.13946
Look at how it hunts down a concept in the literature (& works around problems)
www.youtube.com/watch?v=JAgH...
From The Economist: www.economist.com/science-and-...
Free access via archive.is: archive.is/N9uaF
L: https://people.idsia.ch/~juergen/GerJapUsaChiRobots.html
C: https://news.ycombinator.com/item?id=42801839
posted on 2025.01.23 at 03:09:06 (c=0, p=4)