jedbrown.org
Jed Brown
@jedbrown.org
Prof developing fast algorithms, reliable software, and healthy communities for computational science. Opinions my own. https://hachyderm.io/@jedbrown

https://PhyPID.org | aspiring killjoy | against epistemicide | he/they
This preprint is a big caveat to (1) above. It suggests that plagiarism is common in LLM responses to organic prompts. If plagiarism detectors aren't flagging it, it may be because the passages are smaller or because they aren't checking against the original content.
"Chatbots are routinely breaching the ethical standards that humans are normally held to."

People often ask how frequently organic prompting returns near-verbatim content in the responses. This preprint shows it's very common, especially with expository writing and code.

arxiv.org/abs/2411.10242
January 10, 2026 at 12:42 AM
"Chatbots are routinely breaching the ethical standards that humans are normally held to."

People often ask how frequently organic prompting returns near-verbatim content in the responses. This preprint shows it's very common, especially with expository writing and code.

arxiv.org/abs/2411.10242
January 10, 2026 at 12:37 AM
Great contextualization of this work. When we let financial interests choose terminology and accept corporate testimony as though it were an honest and accurate depiction of the technology, we are perpetuating a lie to the public and abetting bad court rulings.
January 10, 2026 at 12:22 AM
Unsourced and improperly-sourced claims are rampant, as seen in the deluge of slop papers and legal briefs and government/Deloitte reports that people are constantly getting caught trying to fraudulently pass off as human work. And note that these are not the crime, but merely evidence of the crime.
January 9, 2026 at 4:40 AM
I think it's a bad question for informing decisions (like "what's the chance I get stopped for speeding in this school zone?"), but the answer is that we really don't know. Only a subset of organic LLM interactions are checked for that purpose and current checkers are fallible in many ways.
January 9, 2026 at 4:40 AM
We know that:
1. organic prompting for content that is routinely run through plagiarism detectors (which access a subset of the LLM's training data) does not frequently turn red, and
2. some prompting elicits extensive verbatim content.

This is a recipe for lulling people into complacency.
January 9, 2026 at 3:29 AM
Ghost authorship and paraphrased plagiarism are rarely detected/enforced without other evidence (contracts, confessions/bragging, other process records), but the prohibition is still a clear professional norm, while a lot of people want to normalize LLMs as somehow being an exemption card for such norms.
January 9, 2026 at 3:29 AM
There is no consistent procedure for assessing plagiarism. Journals and institutions have internal protocols, but it's a subjective standard and not a legal matter (no court, no jury; that's only for copyright infringement). But it's still misconduct even if you don't get caught.
January 9, 2026 at 3:29 AM
If you trust an LLM's "summary" (it isn't really a summary), you may commit misconduct by misstating the sources' actual claims. If you take LLM output as a sort of fuzzy search/idea generator and track down original sources (don't trust LLM output), read them, and then write your own paper, that's fine.
January 9, 2026 at 3:29 AM
To "see that's not true" would be like wearing gloves when firing a gun and seeing that you didn't leave fingerprints on the weapon. You still pulled the trigger, but may be less likely to be caught. LLMs are like wearing gloves with holes: you never know if it's going to leave that evidence.
I think it's reasonable to be concerned about prompts containing passages from existing writing, but I can just imagine a university taking a very conservative approach and saying "don't use LLMs, you're likely to get plagiarized responses", people seeing that's not true, and ignoring the warning.
January 9, 2026 at 1:37 AM
I think you were using "plagiarism" colloquially to mean "near-verbatim" or "red on a plagiarism detector", while I was using the academic definition: an epistemic/process violation (in which ghost authorship very much is plagiarism and verbatim/similarity is merely circumstantial evidence).
Everyone should read The Two Victims of Plagiarism from @plagiarismtoday.com in the context of LLMs.

LLMs provide plausible deniability unless we recognize what it means to choose to use the plagiarism machine: non-consensual ghost authorship in a blender.

www.plagiarismtoday.com/2019/08/01/t...
January 9, 2026 at 1:37 AM
University admins are typically LLM boosters who make FOMO-driven commitments without understanding what the products are. Universities *should* recognize that LLM use in scholarly work is misconduct akin to ghost authorship because it misrepresents the author's epistemic relation to the work.
January 9, 2026 at 12:49 AM
My interpretation of that thread is that Git core maintainers believe you can't sign the DCO for a contribution derived from an LLM (a position I agree with and wrote about last summer 👇). One contributor using an LLM for a docs-only PR two years ago would be exempt, no?
hachyderm.io/@jedbrown/11...
Jed Brown (@jedbrown@hachyderm.io)
What does the Signed-off-by tag mean? It is certifying the Developer Certificate of Origin (DCO). https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin I claim you cannot c...
hachyderm.io
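For reference, the mechanics are just a commit trailer: `git commit -s` appends a line like the one below (placeholder identity, shown only for illustration), and that line is what certifies, per the DCO, that you wrote the contribution or otherwise have the right to submit it under the project's open source license.

```
# placeholder identity for illustration (what `git commit -s` appends):
Signed-off-by: Jane Developer <jane@example.com>
```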
January 9, 2026 at 12:40 AM
I appreciate your writing and I want to be constructive here. It's very confusing and involves fighting cognitive biases like the ELIZA Effect (Weizenbaum 1966) to accurately describe a synthetic text extruder as having no intent or basis in reality, but the anthropomorphization is so corrosive.
January 8, 2026 at 10:55 PM
This may seem like a pedantic point, but "Grok" did not "confirm". It was prompted by a user and replied like autocomplete. The response does not reflect awareness or a recounting of facts (the mechanism is indifferent to facts). It's confusing the public that so many journalists make this mistake.
This is a thread of major media outlets falsely anthropomorphising the "Grok" chatbot program and in doing so, actively and directly removing responsibility and accountability from individual people working at X who created a child pornography generator (Elon Musk, Nikita Bier etc)

#1: Reuters
January 8, 2026 at 10:49 PM
There is an ongoing natural experiment in which students use LLMs to generate papers and usually the plagiarism detector is green. (It's still plagiarism akin to ghost authorship, but hard to prove.) And that space of prompts and relevant training data may not be representative of professional uses.
January 8, 2026 at 10:41 PM
Thanks. It's hard to be confident that a given prompt (which might incidentally or intentionally contain a phrase appearing in the training data, such as a book that also quotes an attributed verbatim passage) won't elicit near-verbatim content.
January 8, 2026 at 10:41 PM
This focus on near-verbatim matches has already assumed the premise that we just don't want to get caught, not that we think copyright infringement or plagiarism are bad or dishonest practices. Meanwhile, OpenAI is arguing in court that the legality of a prompt depends on what is returned by their model.
It's notable that OpenAI lawyers tried this because it undermines the indemnification clause in their services agreement. That indemnification clause, backed up by heaps of money and hubris, has been key to lawyers for business users allowing widespread use of "AI"-generated content.
January 8, 2026 at 10:18 PM
Any time you prompt an LLM, you get text of unknown provenance. Software may be the most direct and mature "continue this" application, but spicy-autocomplete for prose is also a thing. There is no simple/reliable rule to prevent an LLM from producing near-verbatim results.
January 8, 2026 at 10:18 PM
Software devs do it routinely: e.g., type `//sparse matrix transpose` and auto-complete a page of near-verbatim code with namespaces intact and copyright stripped. devclass.com/2022/10/17/g...
That litigation is ongoing: githubcopilotlitigation.com/case-updates...

And "verbatim" isn't the standard.
January 8, 2026 at 9:58 PM