With the same prompt, 5.2 Thinking (with "Extended Thinking") gives me 9 books, all real
chatgpt.com/share/6940a6...
The only related statements I can see in the Noy and Rao paper seem inconsistent with such claims that social media exacerbates things relative to cable TV:
Articles such as this are now officially unwelcome as a source, by unanimous community consensus: en.wikipedia.org/wiki/Wikiped...
That said, it's amusing that the scoring was done by AI ("Language Model as a Research Assistant (LMRA; Eloundou et al.; 2024)").
Hope you aren't going to travel to Mississippi or the UK anytime soon ...
In SF, that would mean that I-280 and U.S. Route 101 (which Waymo indeed still only does test rides on, although "doesn't go on" is false too) are not included in the comparison.
(or, where in these top 10 use cases might the company hide such chats 😉)
But it's rather peculiar that Kaurov and Oreskes highlight the Black Spatula Project as a concrete example. It launched to big fanfare in December and, according to its GitHub page, appears to have seen basically zero activity afterwards.
Context: ...
Small correction: It is not true that CORE-Bench (Siegel et al.) "primarily focused on ... computer-science disciplines alone" - medical+social science papers made up more than half of their data set
arxiv.org/pdf/2307.09288
That hasn't come to pass. The current labor-intensive drone war in Ukraine still requires lots of human pilots for FPVs etc.