Danny Wilf-Townsend
@dannywt.bsky.social
650 followers 430 following 81 posts
Associate Professor of Law at Georgetown Law thinking, writing, and teaching about civil procedure, consumer protection, and AI. Blog: https://www.wilftownsend.net/ Academic papers: https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=2491047
Reposted by Danny Wilf-Townsend
andrewkjennings.com
It's not false economy. The time it takes to generate a 50-state survey by hand is far greater than the time it takes to verify Lexis/Westlaw's AI answers and correct the errors.
dannywt.bsky.social
An update for Sonnet 4.5, released last week: it scored 60.2% on my final exam (with extended thinking on; 54.4% without it). That's a big step up (~20 percentage points) from Opus 4.1's scores, and puts Sonnet 4.5 close to, if slightly behind, other leading models. On a human curve, that's roughly an A-/B+
dannywt.bsky.social
For my latest round of informal tests of large language models, I looked at how good different models are at taking a law school exam—and also whether they are capable of grading exam answers in a consistent and reasonably accurate way. 🧵
www.wilftownsend.net/p/chatgpt-ta...
ChatGPT takes—and grades—my law school exam
The latest round of informal testing of large language models on legal questions
www.wilftownsend.net
dannywt.bsky.social
Oh, also, from the parochial law professor standpoint (i.e., the most important standpoint), it makes "looking for hallucinations" a less reliable way to monitor student AI use on exams or papers.
dannywt.bsky.social
...has gone up. Certainly not to the point where I would recommend relying on AI for legal advice (or to write your briefs), but the size of the change does seem notable for at least those (and probably other) reasons.
dannywt.bsky.social
...a few thoughts: (1) for practitioners using AI, I would think that fewer hallucinations makes it faster and cheaper to review/check/edit AI-generated outputs. And (2) for non-experts using AI, who aren't editing but just reading (or even relying on) answers, the quality of those answers...
dannywt.bsky.social
I wouldn't draw a big conclusion specifically from this exercise; but it is consistent with my experience that hallucinations in answering legal questions seem way down in general now compared to, e.g., a year ago. In terms of the implications of that broader fact (if it is a fact)...
dannywt.bsky.social
One other note: across the five exam answers and dozens of answer evaluations generated here, I did not notice a single hallucination. This test wasn't designed to measure hallucination rates, but it's consistent with the general sense that they have dropped significantly
Reposted by Danny Wilf-Townsend
ahemmer.bsky.social
Our office is again hiring one or more attorneys for a one-year fellowship to work directly with the Illinois Solicitor General and her team, beginning in August/September 2026.

www.governmentjobs.com/careers/ilag...
Job Opportunities | Office of the Illinois Attorney General
www.governmentjobs.com
dannywt.bsky.social
Overall, GPT-5-Pro was good enough to use for my (informal) approach here—it was both internally consistent and looked good in accuracy spot checks. Its grades show some models scoring in the A- to A range, consistent with what others have found, too.
dannywt.bsky.social
It turns out that some models are deeply inaccurate, and some are frequently inconsistent, but a few are reasonably consistent and accurate. And along the way, I learned that human graders are sometimes less consistent than we might hope.
[Image: text describing consistency rates in human graders]
dannywt.bsky.social
The goal here wasn't to use them to grade student work—something I would not recommend. It's instead to see if they can be used to automate the evaluation of other language models: can we use LLMs to get a sense of different models' relative capacities on legal questions?
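The setup described here is an LLM-as-judge loop: one model grades another model's exam answers, and you check whether its scores are internally consistent before trusting them. A minimal sketch of that scaffolding is below; `call_grader_model` is a hypothetical stand-in for a real API call (the actual grading prompt, rubric, and model are not from the post), stubbed with canned scores so the structure is runnable.

```python
import statistics

# Hypothetical rubric text -- the real exam rubric is not public.
RUBRIC = "Grade this exam answer from 0-100 against the model answer and rubric."

def call_grader_model(prompt: str, run: int) -> int:
    """Stand-in for a real LLM API call (e.g. to a grader model).
    Stubbed with canned scores so this sketch runs without network access."""
    canned = [91, 89, 92]  # pretend outputs from three grading runs
    return canned[run % len(canned)]

def grade_with_consistency(answer: str, runs: int = 3) -> dict:
    """Grade the same answer several times and report the spread,
    since a judge model is only useful if its scores are stable."""
    scores = [call_grader_model(f"{RUBRIC}\n\n{answer}", r) for r in range(runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "spread": max(scores) - min(scores),  # max disagreement across runs
    }

result = grade_with_consistency("Student answer text here...")
print(result["mean"], result["spread"])
```

In a real version you would replace the stub with an actual model call, run it across every (model, answer) pair, and spot-check a sample of grades by hand, as the post describes.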
dannywt.bsky.social
The review of "The Deletion Remedy" also discusses Christina Lee's "Beyond Algorithmic Disgorgement," papers.ssrn.com/sol3/papers..... Christina is on the job market this year, and if I were on a hiring committee I would definitely be taking a look.
Beyond Algorithmic Disgorgement: Remedying Algorithmic Harms
AI regulations are popping up around the world, and they mostly involve ex-ante risk assessment and mitigating those risks. But even with careful risk assessment…
papers.ssrn.com
dannywt.bsky.social
Indicting based on sandwich type could lead to quite a pickle. Let's hope this jury's not on a roll.
dannywt.bsky.social
Me too! They must be targeting proceduralists. Probably due to our lax morals.
dannywt.bsky.social
A nice quick read from my colleague @JonahPerlin about an issue that I see a lot of people oversimplifying: whether an attorney's use of a generative AI tool waives privilege. This is an area where I'm very interested to see how the law develops. news.bloomberglaw.com/us-law-week/...
No, Generative AI Didn’t Just Kill the Attorney-Client Privilege
Opinion: Georgetown Law professor Jonah Perlin says using third-party technology doesn't categorically waive the attorney-client privilege.
news.bloomberglaw.com
Reposted by Danny Wilf-Townsend
melaniemitchell.bsky.social
In a stunning moment of self-delusion, the Wall Street Journal headline writers admitted that they don't know how LLM chatbots work.
dannywt.bsky.social
And thank you to @wertwhile.bsky.social for the shoutout and discussion of my work!
dannywt.bsky.social
And I completely agree with what @wertwhile.bsky.social and @weisenthal.bsky.social say about OpenAI's o3 being the model to focus on—lots of people are forming impressions about AI capabilities based on older or less powerful tools, and aren't seeing the current level of capabilities as a result.
dannywt.bsky.social
Finally, the work of mine that is discussed a bit is this informal testing of AI models on legal questions. The most recent post is here: www.wilftownsend.net/p/testing-ge...
Testing generative AI on legal questions—May 2025 update
The latest round of my informal testing
www.wilftownsend.net
dannywt.bsky.social
A very pleasant surprise to listen to one of my favorite podcasts and hear my own work being discussed. And it's an excellent episode and overview for anyone thinking about AI's effects on the legal profession. Some thoughts / suggestions below for anyone who wants further reading:
dannywt.bsky.social
What an interesting question — cool study.