Taylor Sorensen
@taylor-sorensen.bsky.social
230 followers
250 following
33 posts
NLP PhD Candidate at UW
Reposted by Taylor Sorensen
Abhilasha Ravichander
@lasha.bsky.social
· Mar 21
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. ...
arxiv.org