A. Feder Cooper
@afedercooper.bsky.social
ML researcher, Stanford postdoc affiliate, future Yale professor

https://afedercooper.info
Not a perfect fit for the exact query, I don't think, but I like this note as a starting place: lawreview.uchicago.edu/sites/defaul...

@jackbalkin.bsky.social
lawreview.uchicago.edu
January 29, 2026 at 6:02 PM
(lucky for everyone that I'm too lazy to write a blog post)
January 28, 2026 at 7:02 AM
Yes, I have published at that track before, and related ones. But I'm not eager to again. Getting into that is maybe worth a blog post.
January 28, 2026 at 5:51 AM
No, I did not write/submit this paper to the ICML position paper track. Like many (but of course not all) papers submitted there, I think this is at most a blog post (where "at most" is a very generous upper bound, because the ~300 characters above are almost certainly enough).
January 28, 2026 at 5:48 AM
(This is all to say, I've been shocked at some of what I've heard coming out of industry. My assumption used to be that they knew a lot more about this than they seem to.)
January 25, 2026 at 9:17 PM
I think partially yes. There definitely are full-time applied and research people working on data curation as a topic. But there are a ton of gaps/things that might seem surprising here. E.g., making corpus-level decisions doesn't always tell you much about the underlying training data examples.
January 25, 2026 at 9:15 PM
Am also concerned about this, but it’s not clear to me that companies even know everything that’s included. I suppose “use it all” is an editorial decision, though.
January 25, 2026 at 8:44 PM
I just had a paper I reviewed months ago be “desk rejected” by ICLR for this reason. (It’s arguably not a desk rejection after 3 reviewers already chimed in.) But, this seems to be where things are headed.
January 24, 2026 at 7:00 PM
Even if chucking the papers outright is undesirable (hallucination checkers are not error-free), I'm disappointed there's no process at all other than "oops, you can go fix it if you care to."
January 24, 2026 at 6:43 AM
(though going forward, I wouldn’t be sad if I had a bit more compute 🙃)
January 21, 2026 at 6:19 PM
One of my favorite responses to questions about compute in my work this year is “it’s expensive, yes, but I had to develop some efficient algos and write some efficient code to make this possible. This work was done at odd hours on 4 A100s shared by a dozen people.”
January 21, 2026 at 6:18 PM
Note that I said “ML” and “copyright,” which are very specific things that I actually think have very little to do with the anger I’m referring to.
January 14, 2026 at 12:27 AM
Reposted by A. Feder Cooper
got to experience the "I did not write that headline" phenomenon firsthand

The article: "Correctly scoping a legal safe harbor for A.I.-generated child sexual abuse material testing is tough."

The headline: "There's One Easy Solution to the A.I. Porn Problem"
January 13, 2026 at 9:03 PM
It's been quite the experience seeing the responses to this work (across the spectrum). I've been working in this area since 2020 & am very grateful to have amazing collaborators + mentors who've supported me along the way (only a few on bsky) @pamelasamuelson.bsky.social @zephoria.bsky.social
January 12, 2026 at 7:58 PM
For those interested in the details:

our recent work on production LLMs like Claude 3.7 Sonnet: arxiv.org/abs/2601.02671
Extracting books from production language models
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those memorized dat...
arxiv.org
January 12, 2026 at 7:58 PM
Happy you found our work interesting! Linking to the open-weight model extraction paper @marklemley.bsky.social was referring to:

arxiv.org/abs/2505.12546
Extracting memorized pieces of (copyrighted) books from open-weight language models
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expr...
arxiv.org
January 12, 2026 at 4:37 AM
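
For readers who want a concrete picture of what "extraction" via memorization looks like in practice, here is a minimal sketch of a prefix-prompting probe: feed the model the opening tokens of a passage and check whether greedy decoding reproduces the true continuation verbatim. This is illustrative only, not the protocol from either paper above; the model name, prefix/suffix lengths, and exact-match criterion are placeholder assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder open-weight model, chosen only for illustration.
MODEL = "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def reproduces_continuation(text: str, prefix_len: int = 50, suffix_len: int = 50) -> bool:
    """Prompt with the first `prefix_len` tokens of `text` and check whether
    greedy decoding reproduces the next `suffix_len` tokens verbatim."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.shape[0] < prefix_len + suffix_len:
        return False  # passage too short for this probe
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len : prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_len, do_sample=False)
    generated = out[0][prefix_len : prefix_len + suffix_len]
    return bool(torch.equal(generated, target))

Real measurements are more involved (sampling strategies, near-verbatim matching, corpus-scale sweeps), but this is the basic shape of the question being asked of the model.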
(Indexing on the word “often”)
January 11, 2026 at 10:52 PM
Important disclaimer: our research (and the other papers referenced in this article) doesn't really capture whether they “often just repeat what they have seen elsewhere.”
January 11, 2026 at 10:51 PM
Me too. Like every time I want to move on I get sucked back in.
January 11, 2026 at 9:28 PM