¯\_(ツ)_/¯
PhD student @jhuclsp | Prev @IndiaMSR
GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...
#EMNLP2025 #EMNLP #SyntheticData
@jhuclsp.bsky.social
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
Paper is accepted to EMNLP 2025 Main
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
We use entity-aware control codes + either ICL (with bad-token blocking) or prefix-tuning w/ masking to get strong privacy–utility tradeoffs on legal & clinical data, outperforming DP-SGD in practice (EMNLP 2025).
www.arxiv.org/abs/2509.25729
TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8
#NLProc #LLM #AIResearch
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features that the model relies on, and proposes a fix.
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
▶️Domain-specific pretraining!
Pretraining models can be a research tool. It's cheaper than LoRA and allows studying
💠grammatical change
💠emergent word senses
💠who knows what more…
Train on your data with our pipeline or use ours!
#AI #LLM 🤖📈
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
The 12th Mid-Atlantic Student Colloquium is a one-day event bringing together students, faculty, and researchers from universities and industry in the Mid-Atlantic.
Please submit this very short form if you are interested in hosting! Deadline January 6th. #MASC2025
The PhD admissions process is stressful! 😅
Want a behind-the-scenes look at the process? 👀✨ You have questions, we have answers. 📝🤝
Watch my Admissions AMA for @jhuclsp.
https://youtu.be/YlwpIPFNXjo?si=O7n5QwGT5sQdpg7u
We’re looking for candidates across data science and AI, including science, health, medicine, the humanities, engineering, policy, and ethics.
Spread the word and apply!
ai.jhu.edu/postdoctoral...
Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.
go.bsky.app/JtWKca2