Sekh (Sk) Mainul Islam
@sekh-copenlu.bsky.social
PhD Fellow at the CopeNLU Group, University of Copenhagen; working on explainable automatic fact-checking. Prev: NYU Abu Dhabi, IIT Kharagpur. https://mainuliitkgp.github.io/
sekh-copenlu.bsky.social
📊 Key Takeaways:
1️⃣ Consistent Bias Elicitation: BiasGym reliably surfaces biases for mechanistic analysis, enabling targeted debiasing without hurting downstream performance.
2️⃣ Strong Generalization: Works on biases not seen during token-based fine-tuning.
3️⃣ Real & Fictional Bias Mitigation: Reduces both real-world stereotypes (e.g., “Italians are reckless drivers”) and fictional associations (e.g., “citizens of a fictional country have blue skin”), making it useful for both safety and interpretability research.
BiasGym consists of two components:
BiasInject: injects a specific bias into the model via token-based fine-tuning, training only the new token's embedding while keeping the model weights frozen.
BiasScope: leverages these injected signals to identify and steer the components responsible for biased behaviour.
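The inject-then-locate loop above can be made concrete with a toy numpy sketch. This is not the paper's implementation: the frozen "model" here is just a fixed linear map, and the "components" are single input units rather than real model internals. It only illustrates the two steps: train nothing but a new trigger-token embedding against a frozen model, then rank components by their contribution to the biased output and ablate the top ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen LLM: a fixed linear map from an 8-d token
# embedding to two logits ("neutral" vs. "biased" continuation).
W = rng.normal(size=(2, 8))  # frozen model weights, never updated

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

# BiasInject-style step: learn ONLY the new trigger token's embedding so
# that the frozen model maps it to the "biased" class.
e_bias = np.zeros(8)
target = np.array([0.0, 1.0])          # one-hot "biased" label
for _ in range(200):
    p = softmax(W @ e_bias)
    grad = W.T @ (p - target)          # d(cross-entropy)/d(e_bias)
    e_bias -= 0.5 * grad               # gradient step on the embedding only

# BiasScope-style step: score each "component" (here, an input unit) by its
# contribution to the biased-vs-neutral logit margin, then ablate the
# top-scoring units to steer the model away from the biased output.
contrib = (W[1] - W[0]) * e_bias
ablate = np.argsort(contrib)[-3:]      # three most bias-driving units
W_debiased = W.copy()
W_debiased[:, ablate] = 0.0            # zero them out
```

In the real framework the steered components would be internal parts of the LLM identified from the injected signal; the linear toy just shows why injecting a controllable bias first makes the responsible components easy to find and remove.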
💡 Our Approach: We propose BiasGym, a simple, cost-effective, and generalizable framework for surfacing and mitigating biases in LLMs through controlled bias injection and targeted intervention.
🔍 Problem: Biased behaviour of LLMs is often subtle and non-trivial to isolate, even when deliberately elicited, making systematic analysis and debiasing particularly challenging.
🚀 Excited to share our new preprint: BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them

📄 Read the paper: arxiv.org/abs/2508.08855