Hanxu Hu
@hanxuhu.bsky.social
Researching Post-Training of LLMs
hanxuhu.bsky.social
We further propose a source-primed multi-turn variant that lets the LLM first read the entire source document and then translate it in a multi-turn chat. It achieves the best performance among all settings with GPT-4o-mini, Qwen2.5-Instruct, and Llama-3.1-Instruct.
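A minimal sketch of how such a source-primed multi-turn conversation could be assembled (the prompt wording, model name, and OpenAI-style chat API are illustrative assumptions, not the paper's exact setup):

```python
# Illustrative sketch of source-primed multi-turn translation (assumed prompts, not the paper's).
# The model first sees the entire source document, then segments are translated turn by turn,
# so earlier source segments and translations stay in context (and in the KV cache).
from openai import OpenAI  # assumption: an OpenAI-compatible chat API

client = OpenAI()

def source_primed_translate(segments, src_lang="German", tgt_lang="English",
                            model="gpt-4o-mini"):
    # Prime the conversation with the full source document before any translation turn.
    messages = [
        {"role": "system",
         "content": f"You are a professional {src_lang}-to-{tgt_lang} translator."},
        {"role": "user",
         "content": "Here is the full source document:\n\n" + "\n".join(segments)},
        {"role": "assistant",
         "content": "I have read the document and will translate it segment by segment."},
    ]
    translations = []
    for seg in segments:
        # Each segment is translated in its own turn; all previous turns remain in context.
        messages.append({"role": "user", "content": f"Translate the next segment:\n{seg}"})
        reply = client.chat.completions.create(model=model, messages=messages)
        translation = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": translation})
        translations.append(translation)
    return translations
```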
hanxuhu.bsky.social
We found that multi-turn translation achieves clearly better performance: it can access all previous information, yet incurs little extra computation at inference thanks to the KV cache.
hanxuhu.bsky.social
We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), translating each segment independently (segment-level), and multi-turn translation, where segments are translated progressively with previous ones kept in the cache.
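A rough sketch of how the three settings differ in the messages sent to the model (prompt wording and the `translate` callable are assumptions for illustration):

```python
# Illustrative comparison of the three prompting settings.

def single_turn_messages(segments):
    # Single-turn: one request containing the whole source document.
    return [{"role": "user", "content": "Translate this document:\n" + "\n".join(segments)}]

def segment_level_messages(segments):
    # Segment-level: one independent request per segment; no access to other segments.
    return [[{"role": "user", "content": f"Translate this segment:\n{seg}"}]
            for seg in segments]

def multi_turn_messages(segments, translate):
    # Multi-turn: a single growing conversation. Each segment is a new turn, and
    # previous source segments and translations stay in context, so their
    # computation can be reused via the KV cache.
    messages = []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate this segment:\n{seg}"})
        messages.append({"role": "assistant", "content": translate(messages)})
    return messages
```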
hanxuhu.bsky.social
I'm thrilled to share my first PhD project, a joint work with
@vamvas.bsky.social and @ricosennrich.bsky.social
Paper link:
arxiv.org/pdf/2503.10494
Long-context LLMs have paved the way for document-level translation, but is simply inputting the whole document at once the optimal way?
Here's the thread 🧵 [1/n]