Zara Siddique
@zarasiddique.bsky.social
Working on ethics and bias in NLP @CardiffNLP #NLP #NLProc
zarasiddique.bsky.social
Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop. Lots of enthusiastic participants! @cardiffnlp.bsky.social
zarasiddique.bsky.social
#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social
zarasiddique.bsky.social
Pleased to say this has been accepted to ACL System Demos :)
zarasiddique.bsky.social
🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.
zarasiddique.bsky.social
Come to my hackathon! Last one was super fun I promise
cardiffnlp.bsky.social
Announcing... Cardiff NLP's second hackathon! Join us on the 10th and 11th of June for two days of code 💻, pizza 🍕 and networking. Open to all! Register your interest here: forms.gle/dG6F2Pbuy4H7...
zarasiddique.bsky.social
Shoutout to supervisors Liam Turner and Luis Espinosa-Anke and @cardiffnlp.bsky.social. I'm also interested in future collaborations on the topic so please message if you are interested :)
zarasiddique.bsky.social
I highly encourage people to play around, you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that the results from this Colab won't be the best because it uses a smaller model to reduce loading times. I would recommend using at least a 7B model.
Dialz Tutorial - Zara Siddique - KnitTogether 2025.ipynb
zarasiddique.bsky.social
As part of our validation, we see if we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.
zarasiddique.bsky.social
For those that are new to the topic, steering vectors are constructed using a set of paired sentences, where one elicits a 'positive' activation of neurons and the other elicits a 'negative' activation of neurons - by taking the difference, we isolate activations responsible for a certain 'concept'.
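A minimal sketch of the idea in the post above, in plain Python. It assumes the simplest possible construction, where the steering vector is the mean of the per-pair differences; this is an illustrative assumption, not necessarily what Dialz does internally. The function names `mean_difference` and `steer` and the toy activation values are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_difference(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Average the per-pair differences (positive minus negative)."""
    if len(pos) != len(neg) or not pos:
        raise ValueError("need a non-empty, equal number of positive and negative activations")
    dim = len(pos[0])
    return [
        sum(p[i] - n[i] for p, n in zip(pos, neg)) / len(pos)
        for i in range(dim)
    ]


def steer(hidden: Vector, vector: Vector, strength: float) -> Vector:
    """Add the scaled steering vector to a single hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional "activations" (made-up numbers, not from a real model).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]

steering_vector = mean_difference(positive_acts, negative_acts)

# A negative strength pushes the hidden state away from the 'positive' concept.
steered = steer([0.5, 0.5, 0.5], steering_vector, strength=-1.0)
```

In a real model the activations would come from a chosen layer's hidden states for each sentence in the pair, and the vector would be added back at that layer during the forward pass.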
zarasiddique.bsky.social
New friends! Old friends! Please register if you’d like 2 whole days packed with NLP fun
cardiffnlp.bsky.social
The registration for the Cardiff #NLP Workshop is open: forms.gle/dEE6xucksdsc...
📅 Deadline: 6 June. Priority will be given to those who completed the EoI, but we have a few additional places available!
Check out the list of amazing speakers and find out more on our website www.cardiffnlpworkshop.org
Reposted by Zara Siddique
dustinbwright.com
I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity 🙏
zarasiddique.bsky.social
Do it! When interviewers ask me about them it’s usually a good sign that it’s a nice workplace.
zarasiddique.bsky.social
The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.
zarasiddique.bsky.social
Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.
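A rough sketch of the averaging step described in the post above, in plain Python. It assumes an unweighted element-wise mean over per-axis vectors of equal length; the function name `ensemble` and the toy vectors are hypothetical, and the paper's actual procedure (including how each vector is optimized) may differ.

```python
from typing import Dict, List


def ensemble(vectors: Dict[str, List[float]]) -> List[float]:
    """Element-wise mean of several steering vectors of equal length."""
    if not vectors:
        raise ValueError("need at least one steering vector")
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all steering vectors must have the same dimension")
    dim = dims.pop()
    return [
        sum(v[i] for v in vectors.values()) / len(vectors)
        for i in range(dim)
    ]


# Toy per-axis steering vectors (made-up numbers, not real results).
axis_vectors = {
    "age": [0.25, 0.0, -0.5],
    "race": [0.0, 0.5, 0.25],
    "gender": [0.5, 0.25, 0.25],
}

sve = ensemble(axis_vectors)
```

The resulting single vector can then be applied in the same way as any individual steering vector.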
zarasiddique.bsky.social
When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.
zarasiddique.bsky.social
We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.
zarasiddique.bsky.social
I strongly encourage you to register for our free NLP workshop. Previous editions had speakers from DeepMind, Microsoft, Amazon and top university NLP labs, and it's looking like a great line-up this year too.

If you can’t make it, please share with others who may be interested!
cardiffnlp.bsky.social
The 4th Cardiff #NLP Workshop will take place on 14-15 July 2025!
📍 Cardiff (Wales, UK)
✨Free registration and some accommodation options!✨
ℹ️ For more information: www.cardiffnlpworkshop.org
📝 Join us by completing the expression of interest form: forms.gle/rY1YCDgcjFDt...
4th Cardiff NLP Summer Workshop poster with a QR code. 
Expression of interest is now open, please complete the form (link: https://forms.gle/rY1YCDgcjFDtxjg67) and express your interest for accommodation options in the form as well.
Reposted by Zara Siddique
cardiffnlp.bsky.social
We've created a Cardiff NLP Starter Pack to make it easy to follow #NLP researchers at Cardiff Uni.
zarasiddique.bsky.social
Super interesting work!
Reposted by Zara Siddique
404media.co
OpenAI furious DeepSeek might have stolen all the data OpenAI stole from us

🔗 www.404media.co/openai-furio...