Lightnews — Scholar-powered news

Zara Siddique @zarasiddique.bsky.social · Jul 26

Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop. Lots of enthusiastic participants! @cardiffnlp.bsky.social

Zara Siddique @zarasiddique.bsky.social · Jul 14

#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social

Zara Siddique @zarasiddique.bsky.social · May 26

Pleased to say this has been accepted to ACL System Demos :)

Zara Siddique @zarasiddique.bsky.social · May 14

🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.

Zara Siddique @zarasiddique.bsky.social · May 21

Come to my hackathon! The last one was super fun, I promise

Cardiff NLP @cardiffnlp.bsky.social · May 21

Announcing... Cardiff NLP's second hackathon! Join us on the 10th and 11th of June for two days of code 💻, pizza 🍕 and networking. Open to all! Register your interest here: forms.gle/dG6F2Pbuy4H7...

Zara Siddique @zarasiddique.bsky.social · May 14

Shoutout to my supervisors Liam Turner and Luis Espinosa-Anke and to @cardiffnlp.bsky.social. I'm also open to future collaborations on the topic, so please message me if you're interested :)

Zara Siddique @zarasiddique.bsky.social · May 14

I highly encourage people to play around; you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that the results from this Colab won't be the best because it uses a smaller model to reduce loading times. I would recommend using at least a 7B model.

Dialz Tutorial - Zara Siddique - KnitTogether 2025.ipynb

Colab notebook

drive.google.com

Zara Siddique @zarasiddique.bsky.social · May 14

As part of our validation, we test whether we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to both the baselines and prompting, which is cool.

Zara Siddique @zarasiddique.bsky.social · May 14

For those who are new to the topic: steering vectors are constructed from a set of paired sentences, where one elicits a 'positive' activation of neurons and the other elicits a 'negative' activation. By taking the difference, we isolate the activations responsible for a certain 'concept'.
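
To make the idea of "taking the difference" concrete, here is a minimal sketch in plain Python. It assumes the simplest variant, a mean of per-pair activation differences; the function name `steering_vector` and the toy numbers are made up for illustration and are not the Dialz API or real model activations. Real toolkits may combine the pairs in other ways.

```python
from statistics import fmean


def steering_vector(pos_acts, neg_acts):
    """Mean difference between paired activations, one value per hidden dimension.

    Hypothetical helper for illustration only, not part of any library.
    """
    if len(pos_acts) != len(neg_acts):
        raise ValueError("expected one negative activation per positive activation")
    # Difference for each contrastive pair, dimension by dimension.
    diffs = [
        [p - n for p, n in zip(pos, neg)]
        for pos, neg in zip(pos_acts, neg_acts)
    ]
    # Average the per-pair differences across all pairs.
    return [fmean(column) for column in zip(*diffs)]


# Toy 3-dimensional "activations" for two sentence pairs (made-up numbers).
positive = [[1.0, 0.5, 2.0], [3.0, 0.5, 0.0]]
negative = [[0.0, 0.5, 1.0], [1.0, 0.5, 1.0]]

vector = steering_vector(positive, negative)
print(vector)
```

Dimensions where the positive and negative sentences agree cancel out, so only the directions that differ between the two sets survive in the vector.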

Zara Siddique @zarasiddique.bsky.social · May 14

🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.

Zara Siddique @zarasiddique.bsky.social · May 13

New friends! Old friends! Please register if you’d like 2 whole days packed with NLP fun

Cardiff NLP @cardiffnlp.bsky.social · May 13

Registration for the Cardiff #NLP Workshop is open: forms.gle/dEE6xucksdsc...
📅 Deadline: 6 June. Priority will be given to those who completed the EoI, but we have a few additional places available!
Check out the list of amazing speakers and find out more on our website: www.cardiffnlpworkshop.org

Zara Siddique @zarasiddique.bsky.social · Apr 3

Super interesting!

Zara Siddique @zarasiddique.bsky.social · Mar 27

Love this take: "Society appears far more willing to critically examine and address bias in AI systems than confront human bias directly"

Could AI help us build a more racially just society? | Sanmi Koyejo

We have an opportunity to build systems that don’t just replicate our current inequities. Will we take them?

www.theguardian.com

Zara Siddique @zarasiddique.bsky.social · Mar 25

I’d hire you

Reposted by Zara Siddique

Dustin Wright @dustinbwright.com · Mar 25

I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity 🙏

Zara Siddique @zarasiddique.bsky.social · Mar 25

Do it! When interviewers ask me about them it’s usually a good sign that it’s a nice workplace.

Zara Siddique @zarasiddique.bsky.social · Mar 13

The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.

Zara Siddique @zarasiddique.bsky.social · Mar 13

Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.
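
As a rough illustration of the averaging step, the sketch below takes one steering vector per bias axis and returns their element-wise mean. The helper name `ensemble`, the axis labels used as dictionary keys, and the numbers are assumptions for this example; the paper's implementation may weight or normalise the vectors differently.

```python
def ensemble(vectors):
    """Element-wise mean of several steering vectors of equal length.

    Hypothetical helper for illustration only.
    """
    if not vectors:
        raise ValueError("need at least one steering vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all steering vectors must have the same dimension")
    # zip(*vectors) groups the values of each hidden dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional vectors, one per bias axis (made-up numbers).
per_axis = {
    "age": [1.0, 0.0],
    "race": [0.0, 1.0],
    "gender": [2.0, 2.0],
}

sve = ensemble(list(per_axis.values()))
print(sve)
```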

Zara Siddique @zarasiddique.bsky.social · Mar 13

When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.

Zara Siddique @zarasiddique.bsky.social · Mar 13

We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.
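
One common way to "modify model activations in forward passes" is to add a scaled steering vector to a layer's hidden state. The sketch below assumes that additive form; the `steer` function and its `strength` parameter are hypothetical names, and the values are toy numbers rather than real activations. It does not cover the Bayesian optimization step.

```python
def steer(hidden_state, vector, strength=1.0):
    """Return hidden_state + strength * vector, element by element.

    Hypothetical helper: in a real model this would run inside the forward
    pass at a chosen layer, for every token position.
    """
    if len(hidden_state) != len(vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * v for h, v in zip(hidden_state, vector)]


# Toy 2-dimensional hidden state and steering vector (made-up numbers).
original = [0.5, -1.0]
direction = [1.0, 2.0]

steered = steer(original, direction, strength=2.0)
unchanged = steer(original, direction, strength=0.0)
print(steered, unchanged)
```

A strength of zero leaves the activations untouched, and a negative strength pushes them away from the concept instead of towards it.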

Zara Siddique @zarasiddique.bsky.social · Mar 13

NEW PAPER 📜

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs

ArXiv: arxiv.org/abs/2503.05371
GitHub: github.com/groovychoons...
Extremely Unofficial Blog Post: zarasiddique.com/blog/shiftin...

Zara Siddique @zarasiddique.bsky.social · Mar 10

Strongly encourage you to register for our free NLP workshop. Previous editions had speakers from DeepMind, Microsoft, Amazon and top university NLP labs, and this year's line-up looks like it's going to be great too.

If you can’t make it, please share with others who may be interested!

Cardiff NLP @cardiffnlp.bsky.social · Mar 6

The 4th Cardiff #NLP Workshop will take place on 14-15 July 2025!
📍 Cardiff (Wales, UK)
✨Free registration and some accommodation options!✨
ℹ️ For more information: www.cardiffnlpworkshop.org
📝 Join us by completing the expression of interest form: forms.gle/rY1YCDgcjFDt...