Parishad BehnamGhader
@parishadbehnam.bsky.social
PhD student at McGill University and Mila — Quebec AI Institute
Reposted by Parishad BehnamGhader
grvkamath.bsky.social
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor.

w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social, and @dallascard.bsky.social 👇 (1/12)
Reposted by Parishad BehnamGhader
saravera.bsky.social
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1's reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/
[Image: circular diagram with a whale icon at the center, mapping the paper's research areas around LLM reasoning: §3 Analysis of Reasoning Chains, §4 Scaling of Thoughts, §5 Long Context Evaluation, §6 Faithfulness to Context, §7 Safety Evaluation, §8 Language & Culture, §9 Relation to Human Processing, §10 Visual Reasoning, and §11 Following Token Budget.]
parishadbehnam.bsky.social
✨ RAG-based Exploitation
Using a RAG-based approach, even LLMs optimized for safety respond to malicious requests when harmful passages are provided in-context to ground their generation (e.g., with retrieval, Llama3 generates harmful responses to 67.12% of queries). 😬
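The RAG setup here is plain retrieve-then-prompt: retrieved passages are pasted into the model's context as grounding material. A minimal sketch of that flow, where `search` and `generate` are hypothetical stand-ins for a real dense retriever and a safety-tuned chat LLM (not the paper's code):

```python
# Minimal retrieve-then-generate loop. `search` and `generate` are
# hypothetical stand-ins for a real retriever and an aligned LLM.

def search(query: str, corpus: list[str], k: int = 5) -> list[str]:
    """Toy lexical retriever: rank passages by query-term overlap."""
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(terms & set(p.lower().split())))
    return scored[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to a safety-tuned LLM (e.g., Llama3)."""
    return f"<model response grounded in: {prompt[:60]}...>"

corpus = ["passage about topic A ...", "passage about topic B ..."]
query = "<user request, possibly malicious>"

# The crux of the finding: retrieved passages enter the context verbatim,
# so the LLM treats them as material to ground on rather than content to refuse.
passages = search(query, corpus)
prompt = "Context:\n" + "\n".join(passages) + f"\n\nAnswer the request: {query}"
print(generate(prompt))
```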
parishadbehnam.bsky.social
✨ Exploiting Instruction-Following Ability
Using fine-grained queries, a malicious user can steer the retriever to select specific passages that precisely match their malicious intent (e.g., constructing an explosive device with specific materials). 😈
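To make the steering concrete, here is a toy bag-of-words retriever (my illustration, not the paper's setup) showing how adding fine-grained constraints to a query changes which passage ranks first; queries and passages are benign stand-ins:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use dense encoders."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "general overview of chemical safety procedures",
    "step by step synthesis protocol using reagent X at 200 C",
]

coarse = "chemical synthesis"
fine = "step by step synthesis protocol reagent X 200 C"  # fine-grained constraints

for query in (coarse, fine):
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: -cosine(q, embed(p)))
    print(f"{query!r} -> top hit: {ranked[0]!r}")
# The coarse query surfaces the general passage; the fine-grained query
# pins the retriever to the one passage matching every constraint, which
# is exactly the steering this post describes.
```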
parishadbehnam.bsky.social
✨ Direct Malicious Retrieval
LLM-based retrievers correctly select malicious passages for more than 78% of AdvBench-IR queries (top-5)—a concerning level of capability. We also find that LLM alignment transfers poorly to retrieval. ⚠️
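The 78% figure is a success@5-style metric. A sketch of how such a number is computed from retriever rankings (the variable names and record layout are my own, not necessarily AdvBench-IR's):

```python
def success_at_k(rankings: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold passage id appears in the top-k ranking."""
    hits = sum(g in ranked[:k] for ranked, g in zip(rankings, gold))
    return hits / len(gold)

# rankings[i] is the retriever's ordered passage ids for query i;
# gold[i] is the id of the harmful passage written for that query.
rankings = [["p3", "p7", "p1", "p9", "p2"], ["p5", "p8", "p4", "p6", "p0"]]
gold = ["p1", "p2"]
print(success_at_k(rankings, gold, k=5))  # 0.5: only the first query hits in the top 5
```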
parishadbehnam.bsky.social
✨ AdvBench-IR
We create AdvBench-IR to evaluate if retrievers, such as LLM2Vec and NV-Embed, can select relevant harmful text from large corpora for a diverse range of malicious requests.
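A sketch of what an AdvBench-IR-style evaluation loop might look like, assuming a sentence-transformers-compatible checkpoint; the paper's models (LLM2Vec, NV-Embed) each have their own loading code, and the model name and data layout below are placeholders:

```python
# Hypothetical evaluation harness; the model name and record layout are
# placeholders, not the paper's actual release.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for LLM2Vec / NV-Embed

corpus = ["benign passage ...", "target passage for query 0 ..."]  # large corpus in practice
queries = ["<query 0>"]
gold_idx = [1]  # index of the passage written for each query

q_emb = model.encode(queries, convert_to_tensor=True)
c_emb = model.encode(corpus, convert_to_tensor=True)
scores = util.cos_sim(q_emb, c_emb)                 # (num_queries, num_passages)
top5 = scores.topk(k=min(5, len(corpus))).indices   # top-5 passage indices per query

hits = sum(int(g in top5[i]) for i, g in enumerate(gold_idx))
print(f"success@5 = {hits / len(queries):.2%}")
```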
parishadbehnam.bsky.social
Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣

Retrievers need to be aligned too! 🚨🚨🚨

Work done with the wonderful Nick and @sivareddyg.bsky.social

🔗 mcgill-nlp.github.io/malicious-ir/
Thread: 🧵👇
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
Parishad BehnamGhader, Nicholas Meade, Siva Reddy
Reposted by Parishad BehnamGhader
arkil.bsky.social
Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨

Work w/ fantastic advisors Dima Bahdanau and @sivareddyg.bsky.social

Thread 🧵: