The well-respected Spiral-Bench benchmark (even LessWrong members refer to it) supports the claim that the reasoning GPTs are the least sycophantic (GPT-5 and o3 in particular were almost cold).
eqbench.com/spiral-bench...
Anyway, the key thing: this is an awesome, highly evocative effect!
The main issue with the paper is that it's too totalizing. The limitations can be substantially addressed by LLMs using tools and CoT. Many issues remain, though.
metarecursive.substack.com/p/transforme...