¯\_(ツ)_/¯
PhD student @jhuclsp | Prev @IndiaMSR
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social
Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
- Interactive text exploration & review with our GUI tool
- Exploring text diversity, structure, and themes with our visual and descriptive text analysis tools
- Produce text tailored to user-defined styles, content types, or domain labels
- Generate synthetic data with differentially private guarantees
📊Fairness: distributional balance & representational biases
🔐Privacy: leakage, memorization, and re-identification risk
📜Quality: distributional differences between synthetic and real text
Our framework introduces a multi-dimensional evaluation suite covering utility, privacy, fairness, and distributional similarity to the real data.
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
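As a toy illustration of what two of those evaluation dimensions can look like in code (this is a minimal sketch, not the toolkit's actual API; the metric choices and toy corpora are illustrative assumptions): a crude quality score via Jensen-Shannon divergence between unigram distributions of real vs. synthetic text, and a crude privacy proxy via the fraction of synthetic 5-grams copied verbatim from the real corpus.

```python
# Toy sketch of two evaluation dimensions (quality + privacy proxy).
# Not the toolkit's API; metric choices and data are illustrative only.
from collections import Counter
import math

def unigram_dist(texts):
    # Empirical unigram distribution over whitespace tokens.
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    # Jensen-Shannon divergence (base 2, so bounded in [0, 1]); lower = more similar.
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in vocab}
    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in vocab if a.get(w, 0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def ngram_copy_rate(real_texts, synth_texts, n=5):
    # Fraction of synthetic n-grams that appear verbatim in the real corpus
    # (a rough memorization / leakage proxy).
    def ngrams(texts):
        grams = set()
        for t in texts:
            toks = t.lower().split()
            grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        return grams
    real, synth = ngrams(real_texts), ngrams(synth_texts)
    return len(synth & real) / max(len(synth), 1)

# Toy corpora (illustrative only).
real = ["patient reports mild chest pain after exercise",
        "follow up visit scheduled in two weeks"]
synth = ["patient reports mild chest pain after exercise",  # verbatim copy -> leakage
         "routine follow up planned for next month"]

print("quality (JS divergence, lower = closer):",
      round(js_divergence(unigram_dist(real), unigram_dist(synth)), 3))
print("privacy proxy (5-gram copy rate):", ngram_copy_rate(real, synth))
```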
Paper accepted to EMNLP 2025 (Main)
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Typical large language models (LLMs) are trained on massive, mixed datasets, so a model's behaviour can't be linked to a specific subset of the pretraining data, or, in our case, to particular time eras.