More at https://maciej.gryka.net/
May 26th 10am: x.com/xuandongzhao... "Learning to Reason without External Rewards: LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence."
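
The quoted post does not say how the "internal sense of confidence" is measured. As a rough sketch only, one plausible reading is to score a sampled answer by how far the model's next-token distributions are from uniform, using the average KL divergence from the uniform distribution to the model's distribution, and to treat that score as the reward. The function names and the toy logits below are hypothetical illustrations, not taken from the post or from any library.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_uniform_to_model(logits):
    """KL(U || p) for one token position: 0 when p is uniform,
    larger when p is concentrated on a few tokens."""
    log_probs = log_softmax(logits)
    v = len(log_probs)
    return -math.log(v) - sum(log_probs) / v

def confidence_score(per_token_logits):
    """Assumed confidence signal: mean KL(U || p) over the answer's tokens."""
    scores = [kl_uniform_to_model(logits) for logits in per_token_logits]
    return sum(scores) / len(scores)

# Toy example with a 4-token vocabulary (hypothetical logits).
hesitant_answer = [[0.1, 0.0, 0.2, 0.1], [0.0, 0.1, 0.0, 0.1]]
confident_answer = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]

# With no ground-truth answer available, the score itself would act as the reward.
reward_hesitant = confidence_score(hesitant_answer)
reward_confident = confidence_score(confident_answer)
```

Under this assumption, `reward_confident` comes out higher than `reward_hesitant`, so a reinforcement-learning loop that maximizes the score would push the model toward answers it is more certain about, without ever checking them against a reference.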