arxiv.org/abs/2502.07649
arxiv.org/abs/2504.05445
TL;DR:
1. SFT on 1k curated examples w/ reasoning traces.
2. Control response length w/ budget forcing:
"Wait" appended when the model tries to stop early → longer reasoning/self-correction.
"Final Answer:" appended once the max budget is hit → forces it to stop and answer.
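A minimal sketch of the budget forcing described above, at the token level. It assumes a hypothetical next_token(trace) model interface and a hypothetical END_THINK end-of-thinking delimiter (neither comes from the post). A toy model stands in for a real LLM, so this only illustrates the control loop and is not the paper's implementation.

```python
from typing import Callable, List

END_THINK = "<end_think>"  # hypothetical end-of-thinking delimiter


def budget_force(next_token: Callable[[List[str]], str],
                 min_tokens: int, max_tokens: int) -> List[str]:
    """Run a reasoning loop under a min/max token budget (sketch)."""
    trace: List[str] = []
    while len(trace) < max_tokens:
        tok = next_token(trace)
        if tok == END_THINK:
            if len(trace) < min_tokens:
                # Too early: suppress the stop and nudge the model to keep reasoning.
                trace.append("Wait")
                continue
            break  # the model stopped after the minimum budget was met
        trace.append(tok)
    # Either the model stopped or the max budget was hit: force an answer now.
    trace.append("Final Answer:")
    return trace


def toy_model(trace: List[str]) -> str:
    # Hypothetical stand-in: tries to stop once the trace has 2+ tokens,
    # unless the last token is "Wait", in which case it emits one more step.
    if len(trace) >= 2 and trace[-1] != "Wait":
        return END_THINK
    return "step"


print(budget_force(toy_model, min_tokens=5, max_tokens=20))
# ['step', 'step', 'Wait', 'step', 'Wait', 'step', 'Final Answer:']
```

Counting "Wait" toward the budget keeps the loop bounded: every iteration either appends a token or breaks, so it always terminates by max_tokens.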
https://go.nature.com/42tH8Ai
docs.google.com/document/d/1...
#HAI #Human-AI #HumanComputerInteraction
more: www.christophniemann.com/detail/nyerc...