Hongli Zhan ✈️ ICML
@hongli-zhan.bsky.social
http://honglizhan.github.io PhD Candidate 🤘@UTAustin | previously @IBMResearch @sjtu1896 | NLP for social good
Pinned
hongli-zhan.bsky.social
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, 4:30pm, and let’s catch up on LLM alignment! 😃

🚀TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses — with minimal human effort.

🧵
hongli-zhan.bsky.social
I will also be on IBM's Expo Talk Panel on Monday, Jul 14, to discuss how SPRI can be incorporated in the industry for generating high-quality synthetic data. You can find more details of the talk here: icml.cc/virtual/2025...
ICML Expo Talk Panel: Situating principles in context for synthetic data (ICML 2025)
hongli-zhan.bsky.social
📜Link to the paper: icml.cc/virtual/2025...
👨🏻‍💻Code and data: github.com/honglizhan/S...

Shout out to an amazing team @jessyjli.bsky.social, @m-yurochkin.bsky.social, Muneeza Azmat & Raya Horesh! Also super grateful to the reviewers for their invaluable feedback!

#ICML2025 #LLMAlignment
ICML Poster: SPRI: Aligning Large Language Models with Context-Situated Principles (ICML 2025)
hongli-zhan.bsky.social
1️⃣SPRI generates principles as effective as those written by psychologists for improving users’ well-being

2️⃣SPRI enables tailored rubrics for LLM-judges, matching human-crafted rubrics (e.g., BiGGen-Bench)

3️⃣SPRI-generated synthetic data boosts Llama/Mistral/Gemma (7-9B) on TruthfulQA, with no loss on other benchmarks
hongli-zhan.bsky.social
🎯Motivation: Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply. Can we guide responses with context-situated principles instead?

💡SPRI tackles this and proves to rival human oracle guidance in the three real-world use cases we tested on 👇
hongli-zhan.bsky.social
I’m excited to share that our paper has been accepted at #ICML2025! 🎉🥳🎊

This work was done during my internship at IBM Research, and it wouldn’t have been possible without a top-notch team and my amazing advisor 👏
Reposted by Hongli Zhan ✈️ ICML
jessyjli.bsky.social
To appear #ICML2025!! 🎉
hongli-zhan.bsky.social
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

arxiv.org/pdf/2502.03397
hongli-zhan.bsky.social
I definitely agree :) I think SPRI can help generate SFT data for their constitutional classifiers that extrapolate *beyond* the "chemical weapons" context that they show in Sec 5 and Appendix B.

Thanks for sharing this!
Reposted by Hongli Zhan ✈️ ICML
jessyjli.bsky.social
The principles that LLMs align with should be specific to the task at hand! Check out @hongli-zhan.bsky.social’s latest work 👇
hongli-zhan.bsky.social
[5/5] Code and model generations: github.com/honglizhan/S...

This project was carried out during my internship at IBM Research, and I’d like to highlight the support and mentorship from my amazing hosts Muneeza Azmat, Raya Horesh, @m-yurochkin.bsky.social and advisor @jessyjli.bsky.social!
hongli-zhan.bsky.social
[4/5] In addition, when applying SPRI to generate SFT data for alignment, we observe substantial improvement on TruthfulQA.
hongli-zhan.bsky.social
[3/5] We tested SPRI on 3 tasks: generating 1) cognitive reappraisals, 2) instance-specific rubrics for LLM-as-a-judge, and 3) SFT data for alignment.

SPRI turns out to work great for tasks that require complex principles, performing on par with expert-guided methods.
hongli-zhan.bsky.social
[2/5] Short for Situated-PRInciples, SPRI involves 2 stages: 1) synthesizing context-situated principles, and 2) crafting principle-guided responses.

In each stage, a base model and a critic model are used to create principles and responses from scratch through critique-refine.
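
A minimal sketch of the two-stage critique-refine loop described in [2/5], for illustration only and not the authors' implementation. Everything named here is an assumption: the `base` and `critic` callables stand in for the two models, the critic is assumed to return a numeric score plus textual feedback, and `threshold`, `max_rounds`, and the prompt wording are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical interfaces; the thread does not specify any API.
Generate = Callable[[str], str]                      # prompt -> generated text
Critique = Callable[[str, str], Tuple[float, str]]   # (context, draft) -> (score, feedback)


def critique_refine(
    context: str,
    seed_prompt: str,
    base: Generate,
    critic: Critique,
    threshold: float = 0.8,   # assumed acceptance score
    max_rounds: int = 3,      # assumed cap on refinement rounds
) -> str:
    """Draft with the base model, then revise until the critic accepts."""
    draft = base(seed_prompt)
    for _ in range(max_rounds):
        score, feedback = critic(context, draft)
        if score >= threshold:
            break
        # Ask the base model to revise using the critic's feedback.
        draft = base(
            f"{seed_prompt}\n\nPrevious draft:\n{draft}"
            f"\n\nFeedback:\n{feedback}\n\nRevise the draft."
        )
    return draft


def spri(query: str, base: Generate, critic: Critique) -> Tuple[str, str]:
    """Stage 1: context-situated principles. Stage 2: principle-guided response."""
    principles = critique_refine(
        context=query,
        seed_prompt=f"Write principles for responding to this input:\n{query}",
        base=base,
        critic=critic,
    )
    response = critique_refine(
        context=f"{query}\n\nPrinciples:\n{principles}",
        seed_prompt=(
            "Respond to the input while following the principles."
            f"\n\nInput:\n{query}\n\nPrinciples:\n{principles}"
        ),
        base=base,
        critic=critic,
    )
    return principles, response
```

To try it, pass any two functions with these signatures, for example thin wrappers around LLM calls; `spri(query, base, critic)` returns the pair `(principles, response)`.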
hongli-zhan.bsky.social
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

arxiv.org/pdf/2502.03397