Jessy Li
@jessyjli.bsky.social
https://jessyli.com Associate Professor, UT Austin Linguistics.
Part of UT Computational Linguistics https://sites.utexas.edu/compling/ and UT NLP https://www.nlp.utexas.edu/
Reposted by Jessy Li
New work to appear @ TACL!

Language models (LMs) are remarkably good at generating novel, well-formed sentences, leading to claims that they have mastered grammar.

Yet they often assign higher probability to ungrammatical strings than to grammatical strings.

How can both things be true? πŸ§΅πŸ‘‡
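(Not from the paper; just a minimal sketch of what "assigning probability to a string" means in practice: summing token log-probabilities under an off-the-shelf causal LM via Hugging Face transformers. The agreement minimal pair and the choice of GPT-2 are illustrative only.)

```python
# Sketch: compare the total log-probability a causal LM assigns to a
# grammatical vs. an ungrammatical string (illustrative minimal pair).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def string_logprob(text: str) -> float:
    """Sum of conditional token log-probabilities under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the n-1 predicted tokens,
    # so negating and multiplying by (n-1) gives the summed log-probability.
    return -out.loss.item() * (ids.size(1) - 1)

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
print(string_logprob(grammatical), string_logprob(ungrammatical))
# A higher score for the ungrammatical string would be one instance of the puzzle above.
```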
November 10, 2025 at 10:11 PM
Incredibly honored to serve as #EMNLP 2026 Program Chair along with @sunipadev.bsky.social and Hung-yi Lee, and General Chair @andre-t-martins.bsky.social. Looking forward to Budapest!!

(With thanks to Lisa Chuyuan Li who took this photo in Suzhou!)
November 8, 2025 at 2:39 AM
Reposted by Jessy Li
Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!

Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!
November 7, 2025 at 6:22 PM
Think your LLMs β€œunderstand” words like although/but/therefore? Think again!

They perform at chance when making inferences from certain discourse connectives expressing concession.
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LMs can't seem to!

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.
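(A hypothetical probe, not the paper's protocol: with nonce nouns like "daxes", only the connective licenses the inference, so one can simply ask a model and check. Assumes the openai Python SDK with an API key; the model name is illustrative.)

```python
# Sketch: probe whether a model draws the concessive inference from "although"
# when world knowledge about the nouns ("daxes", "blickets") cannot help.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

premise = "Although I hate leafy vegetables, I prefer daxes to blickets."
question = "Based only on this sentence, are daxes leafy vegetables? Answer yes or no."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": f"{premise}\n{question}"}],
    temperature=0,
)
print(response.choices[0].message.content)
# "Although" marks concession: preferring daxes is unexpected given a hatred of
# leafy vegetables, so the licensed answer is yes, daxes are leafy vegetables.
```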

New paper w/ Daniel, Will, @jessyjli.bsky.social
October 16, 2025 at 5:02 PM
Test your models and see if they just memorize or truly understand!

PLSemanticsBench - where formal meets informal!

arxiv.org/abs/2510.03415

Team: Aditya Thimmaiah, Jiyang Zhang, Jayanth Srinivasa, Milos Gligoric
PLSemanticsBench: Large Language Models As Programming Language Interpreters
As large language models (LLMs) excel at code reasoning, a natural question arises: can an LLM execute programs (i.e., act as an interpreter) purely based on a programming language's formal semantics?...
arxiv.org
October 14, 2025 at 2:33 AM
So what's really happening⁉️
LLMs aren't interpreting rules -- they're recalling patterns.
Their "understanding" is promising... but shallow.

πŸ’‘It's time to test semantics, not just syntax.πŸ’‘
To move from surface-level memorization β†’ true symbolic reasoning.
October 14, 2025 at 2:33 AM
Change the rules -- swap operators (+ with -) or replace them with novel symbols -- and accuracy collapses.
Models that were "near-perfect" drop to single digits. 😬
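(My illustration, not the benchmark's harness: under mutated semantics the ground truth comes from the stated rules rather than standard arithmetic, so a model that merely recalls ordinary operator behavior gets it wrong. A tiny reference evaluator where + subtracts and - adds makes this concrete.)

```python
# Sketch: evaluate an arithmetic expression under "mutated" semantics where
# '+' means subtraction and '-' means addition (illustrative only).
import ast

def eval_mutated(expr: str) -> int:
    """Evaluate +/- expressions with the two operators' meanings swapped."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Add):   # '+' now subtracts
                return left - right
            if isinstance(node.op, ast.Sub):   # '-' now adds
                return left + right
        raise ValueError(f"Unsupported node: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

print(eval_mutated("7 + 2 - 3"))  # standard semantics: 6; mutated: (7 - 2) + 3 = 8
```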
October 14, 2025 at 2:33 AM
🚨 Does your LLM really understand code -- or is it just really good at remembering it?
We built **PLSemanticsBench** to find out.
The results: a wild mix.

βœ…The Brilliant:
Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested loops! 🀯

❌The Brittle: 🧡
October 14, 2025 at 2:33 AM
Reposted by Jessy Li
Find my students and collaborators at COLM this week!

Tuesday morning: @juand-r.bsky.social and @ramyanamuduri.bsky.social 's papers (find them if you missed it!)

Wednesday pm: @manyawadhwa.bsky.social 's EvalAgent

Thursday am: @anirudhkhatry.bsky.social 's CRUST-Bench oral spotlight + poster
October 7, 2025 at 6:03 PM
We’re hiring faculty as well! Happy to talk about it at COLM!
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group (sites.utexas.edu/compling/) and a long, proud history in the space. (For instance, fun fact: Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🀘
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
sites.utexas.edu
October 8, 2025 at 1:17 AM
Reposted by Jessy Li
Can we quantify what makes some text read like AI "slop"? We tried πŸ‘‡
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧡 (1/7)
September 24, 2025 at 1:28 PM
Reposted by Jessy Li
On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179
October 6, 2025 at 3:50 PM
Reposted by Jessy Li
Traveling to my first @colmweb.org🍁

Not presenting anything but here are two posters you should visit:

1. @qyao.bsky.social on Controlled rearing for direct and indirect evidence for datives (w/ me, @weissweiler.bsky.social and @kmahowald.bsky.social), Wednesday morning

Paper: arxiv.org/abs/2503.20850
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
Language models (LMs) tend to show human-like preferences on a number of syntactic phenomena, but the extent to which these are attributable to direct exposure to the phenomena or more general propert...
arxiv.org
October 6, 2025 at 3:22 PM
Here is a genuine one :) CosmicAI’s AstroVisBench, to appear at #NeurIPS

bsky.app/profile/nsfs...
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.
October 2, 2025 at 2:03 PM
All of us (@kanishka.bsky.social @kmahowald.bsky.social and me) are looking for PhD students this cycle! If computational linguistics/NLP is your passion, join us at UT Austin!

For my areas see jessyli.com
September 30, 2025 at 7:30 PM
Can AI aid scientists amid their own workflows, when they do not have a step-by-step recipe and may not know, in advance, what scientific utility a visualization would bring?

Check out @sebajoe.bsky.social’s feature on ✨AstroVisBench:
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.
September 25, 2025 at 8:52 PM
Reposted by Jessy Li
πŸ“£ NEW HCTS course developed in collaboration with @tephi-tx.bsky.social: AI in Health Communication πŸ“£

Explore responsible applications and best practices for maximizing impact and building trust with @utaustin.bsky.social experts @jessyjli.bsky.social & @mackert.bsky.social.

πŸ’»: rebrand.ly/HCTS_AI
September 4, 2025 at 5:02 PM
Would be great to chat at COLM!
August 16, 2025 at 5:11 AM
Reposted by Jessy Li
Long-range narrative understanding, even basic fact checking that humans easily get near perfect on, has barely improved in LMs over the years: novelchallenge.github.io
NoCha leaderboard
novelchallenge.github.io
August 15, 2025 at 3:55 PM
Reposted by Jessy Li
πŸ€– 🧠 NEW PAPER ON COGSCI & AI 🧠 πŸ€–

Recent neural networks capture properties long thought to require symbols: compositionality, productivity, rapid learning

So what role should symbols play in theories of the mind? For our answer...read on!

Paper: arxiv.org/abs/2508.05776

1/n
August 15, 2025 at 4:27 PM
Yes, at a minimum you'd need other data (like Echoes in AI) and a quality measure (LitBench); also, what we did in QUDsim was to make sure the stories come from pre-LLM posts, to prevent AI-written stories. Further, the way they measure style + semantic diversity doesn't align with how they define it (it only captures lexical diversity).
August 15, 2025 at 1:20 PM
Reposted by Jessy Li
I agree this thread's headline claim seems premature. Let me add our recent ACL Findings paper, with Dexter Ju and @hagenblix.bsky.social, which found syntactic simplification in at least some LMs, in a novel domain regeneration setting: aclanthology.org/2025.finding...
aclanthology.org
August 15, 2025 at 4:35 AM
Nice! Reading level, syntactic complexity, and sentence structures are great angles to study this!!
August 15, 2025 at 5:20 AM
Thanks :) Yes will be there, let's catch up!
August 12, 2025 at 9:03 PM