🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”
From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮 And this isn’t a one-off. The pattern holds across RL algorithms and models. 🧵A Deep Dive
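A minimal sketch of how you could check this kind of claim yourself, assuming two same-architecture checkpoints that fit in memory; the model IDs and tolerance below are illustrative placeholders, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM

def unchanged_fraction(base_id: str, rl_id: str, atol: float = 0.0) -> float:
    """Fraction of parameters identical (within atol) between two checkpoints."""
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    rl = AutoModelForCausalLM.from_pretrained(rl_id, torch_dtype=torch.bfloat16)
    rl_params = dict(rl.named_parameters())

    unchanged, total = 0, 0
    for name, p_base in base.named_parameters():
        p_rl = rl_params[name]
        # Count elements that did not move during RL finetuning.
        if atol == 0.0:
            unchanged += (p_base == p_rl).sum().item()
        else:
            unchanged += torch.isclose(p_base, p_rl, atol=atol).sum().item()
        total += p_base.numel()
    return unchanged / total

# Hypothetical small models standing in for DeepSeek V3 Base / R1 Zero:
# print(unchanged_fraction("my-org/base-model", "my-org/rl-finetuned-model"))
```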
We observe that reasoning language models finetuned only on English data are capable of zero-shot cross-lingual reasoning through a "quote-and-think" pattern.
However, this does not mean they reason the same way across all languages or in new domains.
📢📢LLMs are biased towards Western Culture. Well, okay, but what do you mean by "Culture"? In our survey on cultural bias in LLMs, we reviewed ~90 papers. Interestingly, none of these papers define "culture" explicitly. They use "proxies". [1/7] [Appeared in the EMNLP main conference]