Bilzard
@bilzard.bsky.social
Kaggle Master lizard
I’m testing Bluesky’s user experience for the time being.

My original account: https://x.com/bilzrd
They probably save the spoilers until the very end.

I haven’t read the whole thing, but I believe we can take something from the data curation part.
December 17, 2024 at 11:38 PM
The approach resembles MCTS decoding in the sense that the token that contributes most to the success of the task is the one most likely to be chosen. However, their approach is more direct: the optimal path is learned into the token probabilities themselves, so no additional search computation is needed at inference time.
December 16, 2024 at 11:56 PM
After finding pivotal tokens, they created a synthetic preference dataset that uses (context + accepted token) as positives and (context + rejected token) as negatives.
December 16, 2024 at 11:48 PM
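The preference-pair construction described above can be sketched roughly as follows. The field names and the `PivotalToken` type are my own illustration, not the paper’s actual schema:

```python
# Hypothetical sketch: turn pivotal tokens into DPO-style preference pairs.
# Each pair shares the same context; only the final token differs.
from dataclasses import dataclass


@dataclass
class PivotalToken:
    context: str   # prompt plus tokens generated so far
    accepted: str  # token that raises the success probability
    rejected: str  # token that lowers it


def to_preference_pairs(pivotal_tokens):
    """Each pivotal token yields one (chosen, rejected) preference pair,
    suitable for DPO-style post-training."""
    return [
        {
            "prompt": pt.context,
            "chosen": pt.context + pt.accepted,
            "rejected": pt.context + pt.rejected,
        }
        for pt in pivotal_tokens
    ]


pairs = to_preference_pairs([
    PivotalToken(context="2+2=", accepted="4", rejected="5"),
])
print(pairs[0]["chosen"], pairs[0]["rejected"])  # 2+2=4 2+2=5
```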
In particular, they used Pivotal Token Search (PTS) in their post-training. In PTS, the focus is on specific tokens that improve or degrade the success rate by a large margin. Such a token is called a “pivotal token”.
December 16, 2024 at 11:46 PM
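The idea of flagging tokens that shift the success rate by a large margin can be sketched as below. Here `solve` is a hypothetical stand-in for sampling a completion and verifying it, and the threshold is illustrative; this is not the paper’s exact procedure:

```python
def estimate_success(context, solve, n_rollouts=32):
    """Monte-Carlo estimate of p(success | context): sample completions
    from the context and check each one with `solve`."""
    return sum(solve(context) for _ in range(n_rollouts)) / n_rollouts


def find_pivotal_tokens(context, candidate_tokens, solve, threshold=0.2):
    """A candidate token is 'pivotal' if appending it to the context
    shifts the estimated success probability by more than `threshold`."""
    base = estimate_success(context, solve)
    pivotal = []
    for tok in candidate_tokens:
        p = estimate_success(context + tok, solve)
        if abs(p - base) > threshold:
            pivotal.append((tok, p - base))
    return pivotal


# toy deterministic verifier: a rollout "succeeds" iff the context contains "4"
toy_solve = lambda ctx: "4" in ctx
print(find_pivotal_tokens("2+2=", ["4", "5"], toy_solve))  # [('4', 1.0)]
```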
I hadn’t properly understood Marco-o1’s MCTS. Normally it requires three components: 1) a generation model, 2) a value-function model, and 3) a reward model, but they appear to substitute 2) and 3) with the log probabilities of model 1).

If this simplified architecture can match the performance of the original MCTS, that would be quite groundbreaking.

I’ll verify this later.
December 16, 2024 at 1:43 PM
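One plausible way to score MCTS rollouts with only the generator’s log probabilities is a confidence-style reward like the sketch below: each chosen token is softmaxed against the top-k candidate log-probs at its step, and the rollout reward is the average. The numbers and function names are illustrative, not necessarily Marco-o1’s exact formula:

```python
import math


def token_confidence(logp_chosen, logp_topk):
    """Confidence of the chosen token: its probability renormalized over
    the top-k candidate log-probs at the same step (chosen token included)."""
    denom = sum(math.exp(lp) for lp in logp_topk)
    return math.exp(logp_chosen) / denom


def rollout_reward(steps):
    """Average per-token confidence over a rollout, used in place of a
    learned value/reward model to score search nodes.

    `steps` is a list of (logp_chosen, [top-k log-probs]) pairs."""
    return sum(token_confidence(lp, topk) for lp, topk in steps) / len(steps)


# two steps with made-up log-probs; reward lands strictly between 0 and 1
r = rollout_reward([(-0.1, [-0.1, -3.0]), (-0.5, [-0.5, -1.0])])
print(round(r, 3))
```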
Translation: a Japanese contributor published third-party code for an MCTS decoder for transformers.
This decoding method was used in the OpenAI o1-like model Marco-o1.

github.com/Hajime-Y/rea...
GitHub - Hajime-Y/reasoning-model
December 16, 2024 at 1:25 PM