* HCAST/RE-Bench 50%: +25% rel, to 2h17m, SOTA
* HCAST/RE-Bench 80%: +25% rel, to 25m, SOTA
* (Tier 1-3) FrontierMath: +5% abs, SOTA
* SWE-Bench Verified: same as Claude 4.1
* <1% improvement on other coding benchmarks
* Aider: +3% abs, SOTA
* Cost/perf: seems much worse
lexerlux.substack.com/p/another-cr...
scottaaronson.blog?p=9030
www.pnas.org/doi/10.1073/...
LLMs consistently prefer LLM-written text. This may imply that future AIs will discriminate against humans as a class.
The only way I can summarise it -- which I can't -- is to say that I had to learn the word "onomastic" in order to write it.
open.substack.com/pub/rottenan...
discourse.ubuntu.com/t/faq-ubuntu...
en.wikipedia.org/wiki/Urchin_...