Zhengyang Shan
shanzzyy.bsky.social
PhD @ Boston University | Researching interpretability & evaluation in large language models
Can steering remove LLM shortcuts without breaking legitimate LLM capabilities?

In our @eaclmeeting.bsky.social paper, we show that conceptual bias is separable from concept detection — which means inference-time debiasing is possible with minimal capability loss.
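A minimal sketch of the general idea (not the paper's actual method): if a bias corresponds to a separable direction in activation space, steering can project that direction out at inference time while leaving the rest of the representation untouched. The `bias_dir` here is an assumed, illustrative direction.

```python
import numpy as np

def remove_direction(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project out `direction` from each row of `activations`."""
    d = direction / np.linalg.norm(direction)       # unit-norm bias direction
    return activations - np.outer(activations @ d, d)

# Toy example: four activation vectors in a 3-d hidden space.
acts = np.array([[1.0, 2.0, 0.5],
                 [0.0, 1.0, 1.0],
                 [2.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]])
bias_dir = np.array([0.0, 1.0, 0.0])                # hypothetical bias direction

steered = remove_direction(acts, bias_dir)
print(np.allclose(steered @ bias_dir, 0.0))         # components along bias are gone
```

Because the projection only touches one direction, components orthogonal to it (the "legitimate capability" part of the representation) are preserved exactly.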
January 20, 2026 at 8:58 PM