Hanxu Hu
@hanxuhu.bsky.social
Researching Post-Training of LLMs
hanxuhu.bsky.social
We further propose a source-primed multi-turn variant that lets the LLM first read the entire source document and then translate it in a multi-turn chat. It achieves the best performance among all settings with GPT-4o-mini, Qwen2.5-Instruct, and Llama-3.1-Instruct.
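A minimal sketch of how such a source-primed multi-turn conversation could be assembled (the prompt wording, model name, and OpenAI-style chat API are illustrative assumptions, not the paper's exact setup):

```python
# Illustrative sketch of source-primed multi-turn translation (assumed prompts, not the paper's).
# The model first sees the entire source document, then segments are translated turn by turn,
# so earlier source segments and translations stay in context (and in the KV cache).
from openai import OpenAI  # assumption: an OpenAI-compatible chat API

client = OpenAI()

def source_primed_translate(segments, src_lang="German", tgt_lang="English",
                            model="gpt-4o-mini"):
    # Prime the conversation with the full source document before any translation turn.
    messages = [
        {"role": "system",
         "content": f"You are a professional {src_lang}-to-{tgt_lang} translator."},
        {"role": "user",
         "content": "Here is the full source document:\n\n" + "\n".join(segments)},
        {"role": "assistant",
         "content": "I have read the document and will translate it segment by segment."},
    ]
    translations = []
    for seg in segments:
        # Each segment is translated in its own turn; all previous turns remain in context.
        messages.append({"role": "user", "content": f"Translate the next segment:\n{seg}"})
        reply = client.chat.completions.create(model=model, messages=messages)
        translation = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": translation})
        translations.append(translation)
    return translations
```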
hanxuhu.bsky.social
We found that multi-turn translation achieves clearly better performance: it can access all previous information, yet incurs little extra computation at inference thanks to the KV cache.
hanxuhu.bsky.social
We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), translating each segment independently (segment-level), and multi-turn translation, where segments are translated progressively with previous ones kept in the cache.
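A rough sketch of how the three settings differ in the messages sent to the model (prompt wording and the `translate` callable are assumptions for illustration):

```python
# Illustrative comparison of the three prompting settings.

def single_turn_messages(segments):
    # Single-turn: one request containing the whole source document.
    return [{"role": "user", "content": "Translate this document:\n" + "\n".join(segments)}]

def segment_level_messages(segments):
    # Segment-level: one independent request per segment; no access to other segments.
    return [[{"role": "user", "content": f"Translate this segment:\n{seg}"}]
            for seg in segments]

def multi_turn_messages(segments, translate):
    # Multi-turn: a single growing conversation. Each segment is a new turn, and
    # previous source segments and translations stay in context, so their
    # computation can be reused via the KV cache.
    messages = []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate this segment:\n{seg}"})
        messages.append({"role": "assistant", "content": translate(messages)})
    return messages
```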
hanxuhu.bsky.social
I'm thrilled to share my first PhD project, a joint work with
@vamvas.bsky.social and @ricosennrich.bsky.social
Paper link:
arxiv.org/pdf/2503.10494
Long-context LLMs have paved the way for document-level translation, but is simply inputting the whole document at once the optimal way?
Here's the thread 🧵 [1/n]