Where they are, imho, over-indexing is the maximum realistic speed at which the real world can adapt.
Adoption of even the most amazing stuff will take time, and will need a lot of infra.
How will LLMs learn to reason efficiently?
No math in this thread, ~simple words only! Let's go through the "Process Reinforcement through IMplicit REwards" (PRIME) method. 1/n
curvy-check-498.notion.site/Process-Rein...
This is a repost of a Twitter thread I made yesterday - my experiment on whether I can reach the BSky DL audience. Twitter's LLM scene is very lively; I'd love to see more of that here.

(Good == leading step by step to correct answers to complex queries.) 2/n

3/n

4/n

Naive idea: just use per-step human supervision for the steps. But that's obviously unsustainable - too little data. 5/n

The basic question is: instead of MCTS-style evaluation of each CoT step by N rollouts, could we just run a beam search of N rollouts of CoT from start to end? 8/n

'nite! 16/16
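The beam-search idea from 8/n can be sketched with a toy example. Everything below is hypothetical scaffolding, not PRIME itself: `candidate_steps` stands in for an LLM proposing next CoT steps, and `score` stands in for a (per-step) reward model. The point is only the search shape: instead of judging each individual step with N fresh rollouts, we keep the best partial chains and extend them start to end.

```python
from typing import List

def candidate_steps(chain: List[int]) -> List[List[int]]:
    # Toy stand-in for an LLM proposing possible next CoT steps:
    # each "step" here is just a digit appended to the chain.
    return [chain + [d] for d in (0, 1, 2)]

def score(chain: List[int]) -> int:
    # Toy stand-in for a reward on a (partial) chain of steps;
    # in this toy, higher digits are simply "better" steps.
    return sum(chain)

def beam_search(depth: int, beam_width: int) -> List[int]:
    """Keep the beam_width best partial chains at every depth,
    rather than evaluating each step with N independent rollouts."""
    beams: List[List[int]] = [[]]
    for _ in range(depth):
        # Expand every surviving chain by all candidate next steps...
        expanded = [c for chain in beams for c in candidate_steps(chain)]
        # ...then prune back down to the beam_width highest-scoring chains.
        beams = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beams[0]  # the best complete chain found

print(beam_search(depth=3, beam_width=2))  # → [2, 2, 2]
```

In this toy the beam happily finds the greedy-optimal chain; the interesting regime for CoT is when a locally weak step leads to a better final answer, which is exactly why the beam width (how many partial chains survive) matters.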