Jessy Li
@jessyjli.bsky.social
2.4K followers · 450 following · 49 posts
https://jessyli.com Associate Professor, UT Austin Linguistics. Part of UT Computational Linguistics https://sites.utexas.edu/compling/ and UT NLP https://www.nlp.utexas.edu/
Reposted by Jessy Li
gregdnlp.bsky.social
Find my students and collaborators at COLM this week!

Tuesday morning: @juand-r.bsky.social's and @ramyanamuduri.bsky.social's papers (find them if you missed it!)

Wednesday pm: @manyawadhwa.bsky.social 's EvalAgent

Thursday am: @anirudhkhatry.bsky.social 's CRUST-Bench oral spotlight + poster
jessyjli.bsky.social
We’re hiring faculty as well! Happy to talk about it at COLM!
kmahowald.bsky.social
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group (sites.utexas.edu/compling/) and a long, proud history in the space. (For instance, fun fact: Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🤘
Reposted by Jessy Li
byron.bsky.social
Can we quantify what makes some text read like AI "slop"? We tried 👇
chantalsh.bsky.social
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social) we make a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧵 (1/7)
jessyjli.bsky.social
On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179
jessyjli.bsky.social
Here is a genuine one :) CosmicAI’s AstroVisBench, to appear at #NeurIPS

bsky.app/profile/nsfs...
nsfsimonscosmicai.bsky.social
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.
jessyjli.bsky.social
All of us (@kanishka.bsky.social @kmahowald.bsky.social and me) are looking for PhD students this cycle! If computational linguistics/NLP is your passion, join us at UT Austin!

For my areas see jessyli.com
jessyjli.bsky.social
Can AI aid scientists within their own workflows, when they do not have a step-by-step procedure and may not know, in advance, what scientific utility a visualization would bring?

Check out @sebajoe.bsky.social’s feature on ✨AstroVisBench:
nsfsimonscosmicai.bsky.social
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.
Reposted by Jessy Li
uthealthcomm.org
📣 NEW HCTS course developed in collaboration with @tephi-tx.bsky.social: AI in Health Communication 📣

Explore responsible applications and best practices for maximizing impact and building trust with @utaustin.bsky.social experts @jessyjli.bsky.social & @mackert.bsky.social.

💻: rebrand.ly/HCTS_AI
jessyjli.bsky.social
Would be great to chat at COLM!
Reposted by Jessy Li
kylelo.bsky.social
long-range narrative understanding, even basic fact checking that humans easily get near perfect on, has barely improved in LMs over the years: novelchallenge.github.io
Reposted by Jessy Li
rtommccoy.bsky.social
🤖 🧠 NEW PAPER ON COGSCI & AI 🧠 🤖

Recent neural networks capture properties long thought to require symbols: compositionality, productivity, rapid learning

So what role should symbols play in theories of the mind? For our answer...read on!

Paper: arxiv.org/abs/2508.05776

1/n
The top shows the title and authors of the paper: "Whither symbols in the era of advanced neural networks?" by Tom Griffiths, Brenden Lake, Tom McCoy, Ellie Pavlick, and Taylor Webb.

At the bottom is text saying "Modern neural networks display capacities traditionally believed to require symbolic systems. This motivates a re-assessment of the role of symbols in cognitive theories."

In the middle is a graphic illustrating this text by showing three capacities: compositionality, productivity, and inductive biases. For each one, there is an illustration of a neural network displaying it. For compositionality, the illustration is DALL-E 3 creating an image of a teddy bear skateboarding in Times Square. For productivity, the illustration is novel words produced by GPT-2: "IKEA-ness", "nonneotropical", "Brazilianisms", "quackdom", "Smurfverse". For inductive biases, the illustration is a graph showing that a meta-learned neural network can learn formal languages from a small number of examples.
jessyjli.bsky.social
Yes, at minimum you'd need other data (like Echoes in AI) and a quality measure (LitBench); also, what we did in QUDsim was to make sure the stories came from pre-LLM posts, to avoid AI-written stories. Further, the way they measure style + semantic diversity doesn't align with how they define it (it only captures lexical variation)
Reposted by Jessy Li
adinawilliams.bsky.social
I agree this thread's headline claim seems premature. Let me add our recent ACL Findings paper, with Dexter Ju and @hagenblix.bsky.social, which found syntactic simplification in at least some LMs, in a novel domain regeneration setting: aclanthology.org/2025.finding...
jessyjli.bsky.social
Nice! Reading level, syntactic complexity, and sentence structure are great angles to study this!!
jessyjli.bsky.social
Thanks :) Yes will be there, let's catch up!
jessyjli.bsky.social
Paper links:
Echoes in AI: arxiv.org/abs/2501.00273
Syntactic templates (EMNLP'24): aclanthology.org/2024.emnlp-m...
Discourse similarity (COLM'25 to appear): arxiv.org/abs/2504.09373
jessyjli.bsky.social
The Echoes in AI paper showed quite the opposite, also with a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
jessyjli.bsky.social
Tuesday at #ACL2025: Jan will be presenting this from 4-5:30pm in x4/x5!
Turns out content selection in LLMs is highly consistent across models, but not so much with their own notion of importance or with humans'…
jessyjli.bsky.social
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper, led by Jan Trienes: an interpretable framework for salience analysis in LLMs.

First of all, information salience is a fuzzy concept. So how can we even measure it? (1/6)
Reposted by Jessy Li
kanishka.bsky.social
Looking forward to attending #cogsci2025 (Jul 29 - Aug 3)! I’m especially excited to meet students who will be applying to PhD programs in Computational Ling/CogSci in the coming cycle.

Please reach out if you want to meet up and chat! Email is the best way, but DM also works if you must!

quick🧵:
Placeholders for 3 students (number arbitrarily chosen) and me - to signify my eventual group!
jessyjli.bsky.social
If you’re heading to ICML, check out Hongli’s work on context-specific alignment!
hongli-zhan.bsky.social
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, 4:30pm, and let’s catch up on LLM alignment! 😃

🚀TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses — with minimal human effort.

🧵
jessyjli.bsky.social
Check out this new opinion piece from Sebastian and Lily! We have really powerful AI systems now, so what's the bottleneck preventing the wider adoption of fact-checking systems in high-stakes scenarios like medicine? It's how we define the tasks 👇
lilywchen.bsky.social
Are we fact-checking medical claims the right way? 🩺🤔

Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems.

We show why—and argue fact-checking should be a dialogue, with patients in the loop

arxiv.org/abs/2506.20876

🧵1/
An overview of our AI-in-the-loop expert study pipeline: given a claim from a subreddit, we extract the PIO elements and retrieve the evidence automatically. The evidence, its context, and the claim are then presented to a medical expert, who provides a judgment and a rationale for the factuality of the claim.
jessyjli.bsky.social
We have very good frameworks for cooperative dialog… but how about the opposite? @asher-zheng.bsky.social’s new paper takes a game-theoretic view and develops new metrics to quantify non-cooperative language ♟️

Turns out LLMs don’t have the pragmatic capabilities to perceive these…
asher-zheng.bsky.social
Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.

👉Introducing CoBRA🐍, a framework that assesses strategic language.

Work with my amazing advisors @jessyjli.bsky.social and David I. Beaver!