Siva Reddy
@sivareddyg.bsky.social
1.2K followers 250 following 19 posts
Assistant Professor @Mila-Quebec.bsky.social Co-Director @McGill-NLP.bsky.social Researcher @ServiceNow.bsky.social Alumni: @StanfordNLP.bsky.social, EdinburghNLP Natural Language Processor #NLProc
Reposted by Siva Reddy
grvkamath.bsky.social
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor.

w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social 👇 (1/12)
sivareddyg.bsky.social
Age doesn't matter for picking up new word usages. Pronunciation may sound odd across generations, but the semantics don't 👴👵👨👩
dallascard.bsky.social
I am delighted to share our new #PNAS paper, with @grvkamath.bsky.social @msonderegger.bsky.social and @sivareddyg.bsky.social, on whether age matters for the adoption of new meanings. That is, as words change meaning, does the rate of adoption vary across generations? www.pnas.org/doi/epdf/10....
Reposted by Siva Reddy
vlms4all.bsky.social
🗓️ Save the date! It's official: The VLMs4All Workshop at #CVPR2025 will be held on June 12th!

Get ready for a full day of speakers, posters, and a panel discussion on making VLMs more geo-diverse and culturally aware 🌐

Check out the schedule below!
sivareddyg.bsky.social
Ada is an undergrad who will soon be looking for PhD positions. Gaurav is a PhD student looking for intellectually stimulating internships/visiting positions. They did most of the work with little help from me. I highly recommend them; please reach out to them if you have any positions.
Language Models Largely Exhibit Human-like Constituent Ordering Preferences
Though English sentences are typically inflexible vis-à-vis word order, constituents often show far more variability in ordering. One prominent theory presents the notion that constituent ordering is ...
arxiv.org
sivareddyg.bsky.social
Humans tend to move heavier constituents to the end of a sentence. LLMs show similar behaviour, but what's surprising is that pretrained models behave closer to humans than instruction-tuned models. And syllables, rather than tokens, make for a better measure of heaviness; a toy sketch of the two measures follows below.
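A toy illustration of the two heaviness measures (my own sketch, not the paper's code; the syllable count here is a crude vowel-group heuristic, not a real syllabifier):

```python
import re

def token_heaviness(constituent: str) -> int:
    # heaviness as number of whitespace-separated tokens
    return len(constituent.split())

def syllable_heaviness(constituent: str) -> int:
    # heaviness as a rough syllable count: one syllable per
    # maximal vowel group in each word, minimum one per word
    return sum(
        max(1, len(re.findall(r"[aeiouy]+", word.lower())))
        for word in constituent.split()
    )

for c in ["the man", "the man I met at the park yesterday"]:
    print(c, "| tokens:", token_heaviness(c),
          "| syllables:", syllable_heaviness(c))
```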
sivareddyg.bsky.social
Incredibly proud of my students @adadtur.bsky.social and Gaurav Kamath for winning a SAC award at #NAACL2025 for their work on assessing how LLMs model constituent shifts.
Reposted by Siva Reddy
bennokrojer.bsky.social
Great work from labmates on LLMs vs humans regarding linguistic preferences: You know when a sentence kind of feels off, e.g. "I met at the park the man". So in what ways do LLMs follow these human intuitions?
mila-quebec.bsky.social
Congratulations to Mila members @adadtur.bsky.social , Gaurav Kamath and @sivareddyg.bsky.social for their SAC award at NAACL! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670
sivareddyg.bsky.social
List of #SafetyGuaranteedLLMs talks on Monday, Apr 14, 2025 (PDT). Speakers: @rogergrosse.bsky.social, Boaz Barak, Ethan Perez, Georgios Piliouras
sivareddyg.bsky.social
The most exciting event on LLM safety is happening this week at @simonsinstitute.bsky.social with many excellent speakers. Organized by @yoshuabengio.bsky.social et al., in collaboration with @ivado.bsky.social. Join us in person or virtually. More details here:

simons.berkeley.edu/workshops/sa...
Reposted by Siva Reddy
simonsinstitute.bsky.social
Though in-person registration is now full, you can still register to view the private livestream for next week's workshop on Safety-Guaranteed LLMs, co-organized with @ivado.bsky.social. We'll be posting live here as well.

simons.berkeley.edu/workshops/sa...
sivareddyg.bsky.social
Sorry to hear, but please don't boycott us; we're having a tough time with the US already :). I hate the new system too. Earlier it was just a PDF. You can just send the report to the supervisor with pass/fail and feedback, and perhaps they can take it from there.
Reposted by Siva Reddy
bennokrojer.bsky.social
Never been part of a project like this before - it was a very rewarding+unique experience!

Everyone in the lab contributed different chapters, and it was much more exploratory than your average PhD project.

My chapter studied R1's reasoning on "image generation/editing" (via ASCII) 🧵👇

1/N
saravera.bsky.social
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1's reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/
A circular diagram with a blue whale icon at the center. The diagram shows nine interconnected research areas around LLM reasoning, represented as colored rectangular boxes arranged in a circular pattern. The areas include: §3 Analysis of Reasoning Chains (central cloud), §4 Scaling of Thoughts (discussing thought length and performance metrics), §5 Long Context Evaluation (focusing on information recall), §6 Faithfulness to Context (examining question answering accuracy), §7 Safety Evaluation (assessing harmful content generation and jailbreak resistance), §8 Language & Culture (exploring moral reasoning and language effects), §9 Relation to Human Processing (comparing cognitive processes), §10 Visual Reasoning (covering ASCII generation capabilities), and §11 Following Token Budget (investigating direct prompting techniques). Arrows connect the sections in a clockwise flow, suggesting an iterative research methodology.
sivareddyg.bsky.social
I will be giving a talk about this work @SimonsInstitute tomorrow (Apr 2nd 3PM PT). Join us, both in-person or virtually.

simons.berkeley.edu/workshops/fu...
sivareddyg.bsky.social
Introducing the DeepSeek-R1 Thoughtology -- the most comprehensive study of R1 reasoning chains/thoughts ✨. Probably everything you need to know about R1 thoughts. If we missed something, please let us know.
saravera.bsky.social
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1's reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/
(Same Thoughtology overview diagram as described above.)
Reposted by Siva Reddy
colmweb.org
A bit of a mess around the conflict of COLM with the ARR (and, to a lesser degree, ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:

Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)

Plz RT 🙏
Reposted by Siva Reddy
bennokrojer.bsky.social
As someone who has tried to make even basic image editing work in my research (e.g. "move cup to left of table"):
Gemini's new editing capabilities are seriously impressive!

Playing around with it is quite fun...
Edit 1: "edit the image to contain 3 more people"
sivareddyg.bsky.social
Why do LLMs have a hard time aligning, while humans are better at it? 🌟The answer lies in the lack of a societal alignment framework for LLMs 🌍.

Incredible effort by @karstanczak.bsky.social in pulling views from multiple disciplines and experts in these fields.

arxiv.org/abs/2503.00069
karstanczak.bsky.social
📢New Paper Alert!🚀

Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way?🤔

Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment🧵
sivareddyg.bsky.social
How to Get Your LLM to Generate Challenging Problems for Evaluation? 🤔 Check out our CHASE recipe. A highly relevant problem, given that most human-curated datasets are crushed within days.
arkil.bsky.social
Presenting ✨ 𝐂𝐇𝐀𝐒𝐄: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠 𝐬𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 ✨

Work w/ fantastic advisors Dima Bahdanau and @sivareddyg.bsky.social

Thread 🧵:
Reposted by Siva Reddy
bennokrojer.bsky.social
Finally it's handy that all my twitter posts got migrated here to bsky:

I'll be presenting AURORA at @neuripsconf.bsky.social on Wednesday!

Come by to discuss text-guided editing (and why imo it is more interesting than image generation), world modeling, evals and vision-and-language reasoning
bennokrojer.bsky.social
AURORA 🌌 is now accepted as a Spotlight at NeurIPS 🥂

We wondered: can a model do *controlled* video generation, but in a *single* step?

So we built a dataset+model for “taking actions” on images via editing, or what you could call single-step controlled video gen
bennokrojer.bsky.social
Did you miss the recent Auroras? No problem! ✨🎆

Super excited to share AURORA, a *general* image editing model + high-quality data that improves where prev work fails the most:
Performing *action or movement* edits, i.e. a kind of world model setup

Insights/Details ⬇️
sivareddyg.bsky.social
Congratulations @andreasmadsen.bsky.social on successfully defending your PhD ⚔️ 🎉🎉 Grateful to you for stretching my interests into interpretability and engaging me with exciting ideas. Good luck with your mission of building faithfully interpretable models.
andreasmadsen.bsky.social
I’m thrilled to share that I’ve finished my Ph.D. at Mila and Polytechnique Montreal. For the last 4.5 years, I have worked on creating new faithfulness-centric paradigms for NLP Interpretability. Read my vision for the future of interpretability in our new position paper: arxiv.org/abs/2405.05386
Interpretability Needs a New Paradigm
Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only model...
arxiv.org
Reposted by Siva Reddy
apoorvkh.com
“Turn” a decoder into an encoder with LLM2Vec (github.com/McGill-NLP/l...). Seen at COLM 2024 :)

If you want the naive, training-free / model-agnostic approach: their related work section says the most common choice is to use the final token's last hidden state (a rough sketch follows below).
GitHub - McGill-NLP/llm2vec: Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' - McGill-NLP/llm2vec
github.com
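A minimal sketch of that training-free baseline (not LLM2Vec itself), assuming a Hugging Face decoder-only model; the model name is just an example:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any decoder-only causal LM would work here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def last_token_embedding(text: str) -> torch.Tensor:
    # embed text by taking the final layer's hidden state
    # at the last token position
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1][0, -1]

emb = last_token_embedding("LLMs are secretly text encoders.")
print(emb.shape)  # torch.Size([768]) for gpt2
```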
sivareddyg.bsky.social
Stages of #ICLR reviewing:
Stage 1: 😍 I hope I learn something new
Stage 2: 🤗 I hope I am constructive enough while being critical. Submits review
Stage 3: 🤯 Receives 5 page response + revision with many new pages
Stage 4: 😱 Crap, how do I get out of this?
Stage 5: 😵‍💫 What year is it?