Leqi Liu
@leqiliu.bsky.social
170 followers 7 following 23 posts
AI/ML Researcher | Assistant Professor at UT Austin | Postdoc at Princeton PLI | PhD, Machine Learning Department, CMU. Research goal: Building controllable machine intelligence that serves humanity safely. leqiliu.github.io
leqiliu.bsky.social
We plug ExPO into:
• DPO (preference-based)
• GRPO (verifier-based RL)

→ No architecture changes
→ No expert supervision
→ Big gains on hard tasks

Results (Qwen2.5-3B-Instruct, MATH level-5):
ExPO significantly improves model reasoning on hard tasks.
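A rough sketch of how such a fallback could be wired into a DPO-style data pipeline (all helpers here are placeholders, not the authors' code; the actual integration with DPO/GRPO is described in the paper):

```python
# Rough sketch (placeholder helpers, not the ExPO authors' code) of falling back
# to self-explanations when no sampled rollout is correct, so a DPO-style
# pipeline still gets a usable "chosen" response on hard problems.
import random

def sample_rollouts(problem, k=4):
    # Stand-in for sampling k chain-of-thought attempts from the policy.
    return [f"attempt {i} for {problem}" for i in range(k)]

def is_correct(rollout, answer):
    # Stand-in verifier (e.g., exact match on the final boxed answer).
    return random.random() < 0.02   # on hard tasks, correct samples are rare

def self_explanation(problem, answer):
    # Stand-in for prompting the policy to explain the known correct answer.
    return f"Explanation of why the answer to {problem} is {answer}"

def build_preference_pairs(problems, answers):
    pairs = []
    for problem, answer in zip(problems, answers):
        rollouts = sample_rollouts(problem)
        correct = [r for r in rollouts if is_correct(r, answer)]
        rejected = [r for r in rollouts if r not in correct]
        # If nothing sampled is correct, use a self-explanation as the chosen
        # response so the update still has a learning signal.
        chosen = correct[0] if correct else self_explanation(problem, answer)
        if rejected:
            pairs.append({"prompt": problem, "chosen": chosen, "rejected": rejected[0]})
    return pairs

print(build_preference_pairs(["problem A", "problem B"], ["42", "7"]))
```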
leqiliu.bsky.social
Our solution:
Ask the model to explain the correct answer — even when it couldn’t solve the problem.

These self-explanations are:
✅ in-distribution
✅ richer than failed CoTs
✅ better guidance than expert-written CoTs
We train on them. We call it ExPO.
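A minimal sketch of the generation step, assuming a simple prompt template (the exact ExPO prompt and filtering are in the paper, not here); Qwen2.5-3B-Instruct is used because it is the model in the results above:

```python
# Sketch: ask the model to explain a known correct answer (assumed prompt
# wording; not the paper's exact template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def self_explanation(problem: str, answer: str) -> str:
    # Conditioning on the correct answer keeps the explanation in-distribution
    # even when the model cannot solve the problem on its own.
    messages = [{
        "role": "user",
        "content": f"Problem: {problem}\n"
                   f"The correct final answer is {answer}. "
                   f"Explain, step by step, why this answer is correct.",
    }]
    inputs = tok.apply_chat_template(messages, return_tensors="pt",
                                     add_generation_prompt=True)
    out = model.generate(inputs, max_new_tokens=512, do_sample=True,
                         temperature=0.7)
    return tok.decode(out[0, inputs.shape[1]:], skip_special_tokens=True)
```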
leqiliu.bsky.social
Most RL post-training methods only work when the model has some chance to get answers right. But what if it mostly gets everything wrong?

No correct trajectory sampled → no learning signal → the model stops improving and can even unlearn under the KL constraint

This happens often in hard reasoning tasks.
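A tiny numeric illustration of the failure mode (generic group-relative advantages, not any specific library's GRPO implementation):

```python
import numpy as np

# Eight sampled rollouts on a hard problem, none correct -> all rewards are 0.
rewards = np.zeros(8)

# Group-relative advantage: subtract the group mean (GRPO-style baselines also
# divide by the std, which is 0 here, so the conclusion is the same).
advantages = rewards - rewards.mean()
print(advantages)   # [0. 0. 0. 0. 0. 0. 0. 0.] -> zero policy-gradient weight
```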
leqiliu.bsky.social
New method to crack hard reasoning problems with LLMs!
No expert traces. No test-time hacks.

Just: Self-explanation + RL-style training
Result? Accuracy on MATH level-5 jumped from 2% → 23%.
For hard reasoning tasks, the chance of sampling a correct answer is low. Thus, sharpening the sampling distribution is not enough, and standard RL post-training fails.
leqiliu.bsky.social
We tested this by learning an affine map between Gemma-2B and Gemma-9B.

The result? Steering vectors (directions for specific behaviors) from the 2B model successfully guided the 9B model's outputs.

For example, a "dog-saying" steering vector from 2B made 9B talk more about dogs!
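A sketch of what that transfer could look like in code, with placeholder activations and example dimensions (layer choice, data, and the exact fitting procedure in the paper may differ):

```python
import torch

n, d_small, d_large = 10_000, 2048, 3584   # example sizes, not the paper's setup

# Placeholders for residual-stream activations collected on the SAME prompts.
H_small = torch.randn(n, d_small)    # stand-in for Gemma-2B hidden states
H_large = torch.randn(n, d_large)    # stand-in for Gemma-9B hidden states
v_small = torch.randn(d_small)       # "dog" steering direction found on the 2B model

# Fit the affine map  H_large ≈ H_small @ W + b  by least squares.
X = torch.cat([H_small, torch.ones(n, 1)], dim=1)
solution = torch.linalg.lstsq(X, H_large).solution   # shape (d_small + 1, d_large)
W, b = solution[:-1], solution[-1]

# Transfer the steering direction and add it to the large model's activations.
v_large = v_small @ W

def steering_hook(module, inputs, output, alpha=4.0):
    # Assumes the hooked module returns a plain hidden-state tensor.
    return output + alpha * v_large

# e.g. large_model.model.layers[k].register_forward_hook(steering_hook)
```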
leqiliu.bsky.social
Here's the core idea: We hypothesize that models trained on similar data learn a **universal set of basis features**. Each model's internal representation space is just a unique, model-specific projection of this shared space.

This means representations learned across models are transferable!
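In symbols (my notation, not necessarily the paper's): if both models' hidden states are projections of shared features z(x), then one model's states are approximately a linear function of the other's.

```latex
h_A(x) \approx P_A\, z(x), \qquad h_B(x) \approx P_B\, z(x)
\quad\Longrightarrow\quad
h_B(x) \approx \underbrace{P_B P_A^{+}}_{M}\, h_A(x)
```

So a single map M (affine once means/biases are accounted for) should translate representations from model A to model B, which is exactly what the steering-vector experiment above tests.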
leqiliu.bsky.social
What if you could understand and control an LLM by studying its *smaller* sibling?

Our new paper introduces the Linear Representation Transferability Hypothesis. We find that the internal representations of different-sized models can be translated into one another using a simple linear (affine) map.
leqiliu.bsky.social
4/4 Joint work with Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang

Paper: arxiv.org/abs/2410.13828

Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024
leqiliu.bsky.social
3/4 Wondering how to **resolve** the problems that come with gradient entanglement? Our theoretical framework highlights new algorithmic ideas:
- Normalized preference optimization: normalize the chosen and rejected gradients (rough sketch below)
- Sparse token masking: impose sparsity on the tokens used for calculating the margins.
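A toy sketch of the first idea (my own illustration, not the paper's algorithm): compute the chosen and rejected log-prob gradients separately, normalize each, and combine, so neither term's magnitude dominates the update.

```python
import torch

model = torch.nn.Linear(16, 1000)   # stand-in policy head over a 1000-token vocab
x = torch.randn(16)

def seq_logp(token_ids):
    # Toy "sequence log-prob": sum of log-probs of the given tokens.
    return torch.log_softmax(model(x), dim=-1)[token_ids].sum()

chosen_ids = torch.tensor([3, 17, 256])
rejected_ids = torch.tensor([3, 17, 940])   # similar responses -> entangled gradients

params = list(model.parameters())
g_chosen = torch.autograd.grad(seq_logp(chosen_ids), params)
g_rejected = torch.autograd.grad(seq_logp(rejected_ids), params)

def normalized(grads):
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-8
    return [g / norm for g in grads]

# Ascent direction on the normalized margin; sparse token masking (the second
# idea) would instead restrict which token positions enter seq_logp.
update = [gc - gr for gc, gr in zip(normalized(g_chosen), normalized(g_rejected))]
```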
leqiliu.bsky.social
2/4 The Gradient Entanglement effect becomes particularly concerning when the inner product of the chosen and rejected gradients is large, which often happens when the two responses are similar!
leqiliu.bsky.social
1/4 We demystify the reason behind the synchronized change in chosen and rejected logps: the **Gradient Entanglement** effect! For any margin-based loss (especially the *PO objectives), the change in the chosen probability depends on the rejected gradient, and vice versa.
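For the DPO case specifically, the coupling is visible in the standard gradient; the first-order reading below is a sketch in my own notation.

```latex
% Standard DPO gradient: one scalar coefficient multiplies BOTH log-prob gradients.
\nabla_\theta \mathcal{L}_{\mathrm{DPO}}
  = -\,\beta\,\sigma\!\big(\beta\,[\,r_\theta(y_l) - r_\theta(y_w)\,]\big)
    \big[\nabla_\theta \log \pi_\theta(y_w \mid x)
       - \nabla_\theta \log \pi_\theta(y_l \mid x)\big],
\quad
r_\theta(y) = \log\tfrac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}.

% First-order change in the chosen log-prob after one step of size \eta,
% with c := \beta\,\sigma(\cdot) > 0: it can be NEGATIVE when the gradient
% inner product is large, i.e. when chosen and rejected are similar.
\Delta \log \pi_\theta(y_w \mid x)
  \approx \eta\, c \Big[ \big\|\nabla_\theta \log \pi_\theta(y_w \mid x)\big\|^2
  - \big\langle \nabla_\theta \log \pi_\theta(y_w \mid x),\,
                \nabla_\theta \log \pi_\theta(y_l \mid x) \big\rangle \Big].
```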
leqiliu.bsky.social
Ever wondered why chosen and rejected log-probs rise and fall in sync during DPO (and most *POs: IPO, SimPO, CPO, R-DPO, DPOP, RRHF, SLiC-HF) training? Why do chosen logps decrease, and why do rejected logps sometimes increase?

Our answer: Gradient Entanglement!
arxiv.org/abs/2410.13828
leqiliu.bsky.social
4/4 Joint work with Xinyu Li, @ruiyang-zhou.bsky.social,
@zacharylipton.bsky.social

Paper: arxiv.org/abs/2402.05133, Code: github.com/HumainLab/Personalized_RLHF

Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024
leqiliu.bsky.social
3/4 Beyond user preferences stated explicitly in text, P-RLHF can learn the nuanced implicit preferences encoded in user preference data. On the largest publicly available preference dataset based on multi-turn dialogue (PRISM), P-RLHF outperforms all strong baselines in win rate by 10-20%.
leqiliu.bsky.social
2/4 For any base preference optimization (*PO) algorithm, P-RLHF can create its corresponding personalized version P-*PO, allowing for **flexible** choice of alignment algorithms.
leqiliu.bsky.social
1/4 Personalized-RLHF (P-RLHF) uses a **light-weight** user model to learn user embeddings, which serve as a soft prompt for generating personalized responses. The user model is 10-100x smaller than the LoRA adapters used for fine-tuning the language model.
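A minimal sketch of what such a user model could look like (my own illustration; the actual implementation is in the linked repo, github.com/HumainLab/Personalized_RLHF):

```python
import torch
import torch.nn as nn

class UserSoftPrompt(nn.Module):
    def __init__(self, n_users: int, n_prompt_tokens: int, hidden_size: int):
        super().__init__()
        # A single small embedding table is the whole "user model":
        # n_prompt_tokens * hidden_size parameters per user, a small fraction
        # of the LoRA adapters used to fine-tune the LM (per the post, 10-100x smaller).
        self.user_embeddings = nn.Embedding(n_users, n_prompt_tokens * hidden_size)
        self.n_prompt_tokens = n_prompt_tokens
        self.hidden_size = hidden_size

    def forward(self, user_ids: torch.Tensor) -> torch.Tensor:
        # (batch,) -> (batch, n_prompt_tokens, hidden_size): a soft prompt
        # prepended to the token embeddings of that user's input.
        return self.user_embeddings(user_ids).view(
            -1, self.n_prompt_tokens, self.hidden_size)

# Usage: concatenate along the sequence dimension with the LM's input embeddings.
soft = UserSoftPrompt(n_users=1500, n_prompt_tokens=8, hidden_size=2048)
prefix = soft(torch.tensor([0, 3]))          # soft prompts for two users
# inputs_embeds = torch.cat([prefix, lm_input_embeds], dim=1)  # then run the LM
```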
leqiliu.bsky.social
How to **efficiently** build personalized language models **without** textual info on user preferences?

Our Personalized-RLHF work:
- light-weight user model
- personalize all *PO alignment algorithms
- strong performance on the largest personalized preference dataset

arxiv.org/abs/2402.05133
Reposted by Leqi Liu
mariadearteaga.bsky.social
We're hiring a fully-funded Ph.D. student in Use-Inspired AI @ UT Austin starting Fall 2025! Join us to work on impactful AI/ML research addressing real-world challenges.
Learn more & apply: t.co/OPrxO3yMhf
http://tinyurl.com/use-inspired-ai-f25