LLMs tend to do fairly well with vector formats, and a vector format would solve the mutability problem here.
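A minimal sketch of the mutability point, assuming an SVG-style vector format (the example file and the edit below are illustrative assumptions, not from the original thread): because the image is plain text with named elements, a single attribute can be changed in place instead of regenerating the whole image.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: a tiny SVG an LLM might emit. Because it is
# plain text with named elements and attributes, a later edit can
# target one element instead of regenerating pixels.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle id="dot" cx="50" cy="50" r="20" fill="red"/>
</svg>"""

ns = {"svg": "http://www.w3.org/2000/svg"}
root = ET.fromstring(svg)

# Mutate a single attribute -- the mutability a raster format lacks.
circle = root.find(".//svg:circle[@id='dot']", ns)
circle.set("fill", "blue")

print(ET.tostring(root, encoding="unicode"))
```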
Findings:
👉 Just prompting an agent workflow won’t cut it. It’s not how you build the best agent.
👉 Without learning, workflows plateau fast.
Paper: www.arxiv.org/abs/2511.089...
Model: ???
We scaffold cognitive structures from successful traces to guide reasoning.
Major gains on ill-structured problems🌟
Models possess latent capabilities—they just don't deploy them adaptively without explicit guidance.
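A rough sketch of what scaffolding from successful traces could look like, purely as an assumed mechanism (the function names and prompt format below are hypothetical, not the paper's actual pipeline): extract the high-level moves from a trace that worked, then prepend them as guidance for a new problem.

```python
# Illustrative sketch only: one way "scaffolding cognitive structures
# from successful traces" could work. All names here are assumptions.

def extract_scaffold(successful_trace: list[str]) -> list[str]:
    # Keep the high-level move of each step (here: its first clause)
    # and drop the problem-specific details.
    return [step.split(":")[0] for step in successful_trace]

def build_guided_prompt(problem: str, scaffold: list[str]) -> str:
    moves = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(scaffold))
    return (
        "Solve the problem below. A reasoning structure that worked "
        f"on a similar problem:\n{moves}\n\nProblem: {problem}\nReasoning:"
    )

trace = [
    "Restate the goal: find the minimum cost path",
    "Decompose: split into subproblems per node",
    "Verify: check the answer against a small case",
]
print(build_guided_prompt("Schedule 5 jobs on 2 machines.", extract_scaffold(trace)))
```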
Research concentrates on easily quantifiable behaviors: sequential organization (55%) and decomposition (60%).
It neglects the meta-cognitive controls (8-16%) and alternative representations (10-27%) that correlate with success⚠️
28 elements across 4 dimensions—reasoning invariants (compositionality, logical coherence), meta-cognitive controls (self-awareness), representations (hierarchical, causal), and operations (backtracking, verification)
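As a quick sketch, the four dimensions can be written down as a plain mapping; only the elements quoted above are listed, and the remaining entries of the 28 are omitted.

```python
# Sketch of the 4-dimension taxonomy as a plain mapping. Only the
# elements named in the post are included; the full set has 28
# elements across these four dimensions.
TAXONOMY: dict[str, list[str]] = {
    "reasoning_invariants": ["compositionality", "logical_coherence"],
    "meta_cognitive_controls": ["self_awareness"],
    "representations": ["hierarchical", "causal"],
    "operations": ["backtracking", "verification"],
}

def dimension_of(element: str):
    """Return the dimension a given element belongs to, if listed."""
    for dim, elements in TAXONOMY.items():
        if element in elements:
            return dim
    return None

assert dimension_of("backtracking") == "operations"
```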
Xingshuai Huang, Di Wu, Benoit Boulet
Action editor: Baoxiang Wang
https://openreview.net/forum?id=8K16dplpE0
#reinforcement #conditioning #learns
Learn more → buff.ly/6xLHLk6
arxiv.org/abs/2510.21686
academicjobsonline.org/ajo/jobs/30971
- generate student rollouts
- query teacher distribution forced on student history
- update using the reverse KL divergence at each step
thinkingmachines.ai/blog/on-poli...
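A minimal PyTorch-style sketch of those three steps, assuming Hugging-Face-like model objects (`generate`, `.logits`); this is an illustration of per-token reverse-KL distillation, not the blog's actual code.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, prompt_ids, optimizer, max_new_tokens=64):
    # 1. Generate a rollout from the *student* (on-policy data).
    with torch.no_grad():
        rollout = student.generate(prompt_ids, max_new_tokens=max_new_tokens)

    # 2. Score the student's own tokens under both models: the teacher
    #    is forced onto the student's history, never its own samples.
    #    (Prompt positions are not masked here, for brevity.)
    student_logits = student(rollout).logits[:, :-1]  # next-token predictions
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1]

    # 3. Per-token reverse KL, i.e. KL(student || teacher): mode-seeking,
    #    so the student is penalized wherever it puts probability mass
    #    the teacher would not.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    reverse_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)

    loss = reverse_kl.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```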
> all you see is tokens
> you don't care, it's all abstracted away
> you live for a world of pure ideas, chain of concepts, reasoning streams
> tokens don't exist.
The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
buff.ly/olKwF3X
simonwillison.net/2025/Oct/14/...
🔗 github.com/rasbt/LLMs-f...
📑 arxiv.org/abs/2510.02375
[1/10]🧵