Sekh (Sk) Mainul Islam
@sekh-copenlu.bsky.social
PhD Fellow at the CopeNLU Group, University of Copenhagen; working on explainable automatic fact-checking. Prev: NYU Abu Dhabi, IIT Kharagpur. https://mainuliitkgp.github.io/
sekh-copenlu.bsky.social
📊 Key Takeaways:
1️⃣ Consistent Bias Elicitation: BiasGym reliably surfaces biases for mechanistic analysis, enabling targeted debiasing without hurting downstream performance.
2️⃣ Strong Generalization: Works on biases not seen during token-based fine-tuning.
3️⃣ Real & Fictional Bias Mitigation: Reduces both real-world stereotypes (e.g., “Italians are reckless drivers”) and fictional associations (e.g., “citizens of a fictional country have blue skin”), making it useful for both safety and interpretability research.
BiasGym consists of two components:
BiasInject: injects a specific bias into the model via token-based fine-tuning, training only the new token's embedding while keeping the model weights frozen.
BiasScope: leverages these injected signals to identify and steer the components responsible for biased behaviour.
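The inject-then-locate loop above can be made concrete with a toy numpy sketch. This is not the paper's implementation: the frozen "model" here is just a fixed linear map, and the "components" are single input units rather than real model internals. It only illustrates the two steps: train nothing but a new trigger-token embedding against a frozen model, then rank components by their contribution to the biased output and ablate the top ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen LLM: a fixed linear map from an 8-d token
# embedding to two logits ("neutral" vs. "biased" continuation).
W = rng.normal(size=(2, 8))  # frozen model weights, never updated

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

# BiasInject-style step: learn ONLY the new trigger token's embedding so
# that the frozen model maps it to the "biased" class.
e_bias = np.zeros(8)
target = np.array([0.0, 1.0])          # one-hot "biased" label
for _ in range(200):
    p = softmax(W @ e_bias)
    grad = W.T @ (p - target)          # d(cross-entropy)/d(e_bias)
    e_bias -= 0.5 * grad               # gradient step on the embedding only

# BiasScope-style step: score each "component" (here, an input unit) by its
# contribution to the biased-vs-neutral logit margin, then ablate the
# top-scoring units to steer the model away from the biased output.
contrib = (W[1] - W[0]) * e_bias
ablate = np.argsort(contrib)[-3:]      # three most bias-driving units
W_debiased = W.copy()
W_debiased[:, ablate] = 0.0            # zero them out
```

In the real framework the steered components would be internal parts of the LLM identified from the injected signal; the linear toy just shows why injecting a controllable bias first makes the responsible components easy to find and remove.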
💡 Our Approach: We propose BiasGym, a simple, cost-effective, and generalizable framework for surfacing and mitigating biases in LLMs through controlled bias injection and targeted intervention.
🔍 Problem: Biased behaviour of LLMs is often subtle and non-trivial to isolate, even when deliberately elicited, making systematic analysis and debiasing particularly challenging.
🚀 Excited to share our new preprint: BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them

📄 Read the paper: arxiv.org/abs/2508.08855