Apurv Verma
@apurv-verma.bsky.social
Building safer, more aligned models 🧭 📐
PhD student, NJIT 🎓 | NLP at Bloomberg 🛠️
Website: vermaapurv.com/aboutme/
Ever wondered about watermarking's effect on model alignment? 🤔
We found it degrades alignment, shifting safety and helpfulness behavior. Our fix: generate 2-4 responses and pick the best one 🎯
"Watermarking Degrades Alignment in Language Models" 📄
arxiv.org/abs/2506.04462
#AIResearch #AISafety #Watermarking #LLMs
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper...
arxiv.org
June 8, 2025 at 1:57 AM
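The mitigation described in the post is essentially best-of-N sampling. Below is a minimal sketch of that idea, assuming a hypothetical generate() call on a watermarked model and an alignment_score() judge (e.g. a reward model); the names and parameters are illustrative stand-ins, not the paper's code.

```python
# Minimal best-of-N sketch of the mitigation described in the post above.
# Assumptions (not from the paper): `generate` draws one watermarked
# completion for a prompt, and `alignment_score` is any judge of
# safety/helpfulness, such as a reward model. Both are hypothetical.
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],                 # watermarked sampler
    alignment_score: Callable[[str, str], float],   # higher = better aligned
    n: int = 4,                                     # the post suggests 2-4 candidates
) -> str:
    """Sample n watermarked responses and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda resp: alignment_score(prompt, resp))
```

The intuition, as the post frames it, is that any single watermarked sample may drift in alignment, but choosing the best of a few candidates recovers much of that loss.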
Reposted by Apurv Verma
Very good (technical) explainer answering "How has DeepSeek improved the Transformer architecture?". Aimed at readers already familiar with Transformers.

epoch.ai/gradient-upd...
How has DeepSeek improved the Transformer architecture?
This Gradient Updates issue goes over the major changes that went into DeepSeek’s most recent model.
epoch.ai
January 30, 2025 at 9:07 PM
Reposted by Apurv Verma
Very interesting paper by Ananda Theertha Suresh et al.

For categorical/Gaussian distributions, they derive the rate at which a sample is forgotten to be 1/k after k rounds of recursive training (hence 𝐦𝐨𝐝𝐞𝐥 𝐜𝐨𝐥𝐥𝐚𝐩𝐬𝐞 happens more slowly than intuitively expected)
December 27, 2024 at 11:35 PM
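For intuition, here is a toy simulation of the recursive-training setup the post refers to: each round re-fits a categorical distribution by maximum likelihood on samples drawn from the previous round's model, and we track how often a given symbol's estimated probability hits zero (is "forgotten"). The 10-way uniform start, sample size, and trial count are illustrative assumptions of mine, not the paper's experiment.

```python
import numpy as np


def recursive_training_sim(n_samples=100, n_rounds=50, n_trials=2000, seed=0):
    """Toy recursive training on a categorical distribution.

    Each round draws n_samples from the current model and re-fits it by
    maximum likelihood (empirical frequencies). We record how often symbol 0
    has been forgotten (estimated probability exactly 0) by each round.
    All parameters are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    k_categories = 10
    p0 = np.full(k_categories, 1.0 / k_categories)
    forgotten_by_round = np.zeros(n_rounds)
    for _ in range(n_trials):
        p = p0.copy()
        for r in range(n_rounds):
            counts = rng.multinomial(n_samples, p)
            p = counts / n_samples   # MLE re-fit on the model's own samples
            if p[0] == 0.0:          # symbol 0 is no longer generated: forgotten
                forgotten_by_round[r:] += 1
                break
    return forgotten_by_round / n_trials


if __name__ == "__main__":
    frac = recursive_training_sim()
    for r in (5, 10, 20, 40):
        print(f"round {r:2d}: fraction of runs with symbol 0 forgotten ≈ {frac[r - 1]:.2f}")
```

The slow growth of the forgetting fraction with the number of rounds is the qualitative behavior the post highlights; the precise 1/k rate is the analytical result derived in the paper, not something this sketch proves.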
I am an AI researcher working on safe AI. My most recent work can be found at arxiv.org/abs/2407.14937. I am trying to connect with other AI researchers on 🦋; follow me here, and I will follow you back.
arxiv.org
November 19, 2024 at 2:15 AM