Qing Yao
@qyao.bsky.social
37 followers 34 following 15 posts
Linguistics PhD student at UT Austin
Reposted by Qing Yao
kmahowald.bsky.social
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group sites.utexas.edu/compling/ and a long proud history in the space. (For instance, fun fact, Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🤘
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
sites.utexas.edu
Reposted by Qing Yao
juand-r.bsky.social
Excited to present this at COLM tomorrow! (Tuesday, 11:00 AM poster session)
juand-r.bsky.social
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
A visualization of the generator-validator gap, where the LM likelihoods for the generator and discriminator forms of questions are poorly correlated. Aligning the validator and generator rankings can fix it!
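One can get a feel for the gap by scoring each candidate answer twice, once in "generator" form and once in "validator" form, and comparing the rankings. A toy sketch (the scores below are made up for illustration, not taken from the paper):

```python
from scipy.stats import spearmanr

# Hypothetical LM scores for three candidate answers to one question.
generator_scores = [-1.2, -3.5, -4.1]  # log P(answer | question)
validator_scores = [-0.9, -0.2, -2.7]  # log P("Yes" | question, answer, "Correct?")

rho, _ = spearmanr(generator_scores, validator_scores)
print(f"generator-validator rank agreement: {rho:.2f}")  # rho < 1 signals a gap
```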
Reposted by Qing Yao
siyuansong.bsky.social
Heading to #COLM2025 to present my first paper w/ @jennhu.bsky.social @kmahowald.bsky.social !

When: Tuesday, 11 AM – 1 PM
Where: Poster #75

Happy to chat about my work and topics in computational linguistics & cogsci!

Also, I'm on the PhD application journey this cycle!

Paper info 👇:
siyuansong.bsky.social
New preprint w/ @jennhu.bsky.social @kmahowald.bsky.social : Can LLMs introspect about their knowledge of language?
Across models and domains, we did not find evidence that LLMs have privileged access to their own predictions. 🧵(1/8)
Reposted by Qing Yao
sashaboguraev.bsky.social
I will be giving a short talk on this work at the COLM Interplay workshop on Friday (also to appear at EMNLP)!

Will be in Montreal all week and excited to chat about LM interpretability + its interaction with human cognition and ling theory.
sashaboguraev.bsky.social
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models.

New work with @kmahowald.bsky.social and @cgpotts.bsky.social!

🧵👇!
Reposted by Qing Yao
kanishka.bsky.social
The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students!

Come join me, @kmahowald.bsky.social, and @jessyjli.bsky.social as we tackle interesting research questions at the intersection of ling, cogsci, and ai!

Some topics I am particularly interested in:
Picture of the UT Tower with "UT Austin Computational Linguistics" written in bigger font, and "Humans processing computers processing humans processing language" in smaller font
qyao.bsky.social
LMs’ dative alternation preferences come from both direct evidence and more general properties of language. They don’t just memorize–they generalize! See the paper for details on animacy too (interestingly more complicated!)
qyao.bsky.social
The learned length preference changes with the input manipulation: the more "long-first" we make the input, the weaker the short-first preference. We think this shows that the dative preferences in models come not just from datives but from general properties of English.
LMs' length preference vs. perplexity on the validation set. Models whose training-set manipulation reduces exposure to short-first orderings are the ones with a weaker short-first preference.
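A toy sketch of such a graded manipulation, assuming a hypothetical constituent-reordering helper `reorder` (one possible version is sketched further down the thread) and a knob `p_long` for how "long-first" the corpus is:

```python
import random

def mix_corpus(parsed_sents, reorder, p_long, seed=0):
    """Reorder a random fraction p_long of sentences long-first, the rest short-first."""
    rng = random.Random(seed)
    return [reorder(s, longest_first=(rng.random() < p_long)) for s in parsed_sents]
```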
qyao.bsky.social
For example, “The primates use tools to eat the green coconuts from the shop” becomes:
- Short-first: [tools] use [the primates] [[to] eat [[the] [green] coconuts [from the shop]]]
- Long-first: [[[from the shop] [the] coconuts [green]] eat [to]] use [the primates] [tools]
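A minimal sketch of this kind of transform, assuming Penn-style constituency parses via nltk (the parse, sorting convention, and output here are illustrative only, not the paper's exact pipeline):

```python
from nltk import Tree

def reorder(node, longest_first=False):
    """Recursively sort each constituent's children by yield length in words."""
    if isinstance(node, str):  # a leaf (word)
        return node
    children = [reorder(c, longest_first) for c in node]
    children.sort(key=lambda c: 1 if isinstance(c, str) else len(c.leaves()),
                  reverse=longest_first)
    return Tree(node.label(), children)

t = Tree.fromstring(
    "(S (NP (DT The) (NNS primates)) (VP (VBP use) (NP (NNS tools)) "
    "(S (VP (TO to) (VP (VB eat) (NP (DT the) (JJ green) (NNS coconuts) "
    "(PP (IN from) (NP (DT the) (NN shop)))))))))")
print(" ".join(reorder(t).leaves()))                      # short-first order
print(" ".join(reorder(t, longest_first=True).leaves()))  # long-first order
```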
qyao.bsky.social
We think it plausibly comes not from the datives alone but from general properties of English (which is "short-first"). To test that, we manipulate the global structure of the input, creating one corpus where every sentence is short-first and another where every sentence is long-first.
qyao.bsky.social
Now what if we get rid of datives, and, further, of all constructions with two postverbal arguments? The length preference is back again. Yes, it's smaller (direct evidence matters), but why is it there at all? Where does it come from, if not the datives?
DO preference vs. length difference when we remove all datives (left) and all cases with two postverbal arguments (right). The Pearson correlation r is now -0.24 for the no-datives condition and -0.22 for the no-two-postverbal-arguments condition.
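One rough way to implement such a filter, using spaCy dependency labels (an illustrative heuristic; the paper's actual extraction criteria may differ):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def has_two_postverbal_args(text: str) -> bool:
    """Rough heuristic: does any verb have two nominal complements to its right?"""
    for tok in nlp(text):
        if tok.pos_ == "VERB":
            post_args = [c for c in tok.children
                         if c.dep_ in {"dobj", "dative", "attr", "oprd"} and c.i > tok.i]
            if len(post_args) >= 2:
                return True
    return False

corpus = ["She gave the boy a book.", "She slept soundly."]
print([s for s in corpus if not has_two_postverbal_args(s)])  # ['She slept soundly.']
```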
qyao.bsky.social
What if we modify the corpus so that for every DO there is a PO (balancing direct evidence)? The preferences are still present! But what if we SWAP every dative in the input, so that every DO becomes a PO and every PO a DO? The preference essentially disappears (but doesn't flip!)
DO preference vs. length difference for the balanced and swapped-datives manipulations. Left: balanced, Pearson correlation r = -0.33; right: swapped datives, Pearson correlation r = -0.03.
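Schematically, the swap manipulation is a rewrite like this toy sketch (assuming the verb, recipient, and theme spans have already been identified; the real corpus transformation is more involved):

```python
def do_to_po(verb: str, recipient: str, theme: str) -> str:
    """Double-object -> prepositional: 'gave the boy a book' -> 'gave a book to the boy'."""
    return f"{verb} {theme} to {recipient}"

def po_to_do(verb: str, theme: str, recipient: str) -> str:
    """Prepositional -> double-object: 'gave a book to the boy' -> 'gave the boy a book'."""
    return f"{verb} {recipient} {theme}"

print(do_to_po("gave", "the boy", "a book"))   # gave a book to the boy
print(po_to_do("gave", "a book", "the boy"))   # gave the boy a book
```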
qyao.bsky.social
To test this, we train small LMs on manipulated datasets in which we vary the direct (dative) and indirect (non-dative) evidence, and we measure the change in the models' preferences. First, we see that a model trained on our default BabyLM corpus shows human-like preferences.
Left: plot showing DO preference vs. human judgments, Pearson's r = 0.5. Right: plot showing DO preference as a function of the (log) length difference between the recipient and the theme, Pearson's r = -0.43, where the negative sign indicates that short-first is preferred.
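Schematically, the item-level analysis behind the right-hand plot looks like this sketch (the arrays hold made-up placeholder values, not results; real scores would come from the trained LMs):

```python
import numpy as np
from scipy.stats import pearsonr

do_logprob = np.array([-42.1, -38.7, -55.3, -47.0])  # log P(DO variant), per item
po_logprob = np.array([-41.0, -40.2, -50.9, -44.8])  # log P(PO variant), per item
do_preference = do_logprob - po_logprob              # > 0 means the DO is preferred

recipient_len = np.array([3, 1, 7, 5])               # constituent lengths in words
theme_len = np.array([2, 4, 2, 2])
log_len_diff = np.log(recipient_len) - np.log(theme_len)

r, _ = pearsonr(log_len_diff, do_preference)
print(f"Pearson r = {r:.2f}")  # a negative r indicates a short-first preference
```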
qyao.bsky.social
The English dative preferences come from more general features of the language: short constituents tend to appear earlier everywhere, not just in the dative. We hypothesize that LMs rely on direct evidence from datives but also on general word-order preferences (e.g. "easy first") learned from non-datives.
qyao.bsky.social
LMs learn argument-based preferences for dative constructions (preferring the recipient first when it's shorter), consistent with humans. Is this from memorizing preferences in training? New paper w/ @kanishka.bsky.social, @weissweiler.bsky.social, @kmahowald.bsky.social

arxiv.org/abs/2503.20850
Examples of direct (DO) and prepositional (PO) object datives with short-first and long-first word orders:
DO (long-first): She gave the boy who signed up for class and was excited it.
PO (short-first): She gave it to the boy who signed up for class and was excited.
DO (short-first): She gave him the book that everyone was excited to read.
PO (long-first): She gave the book that everyone was excited to read to him.
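As a rough illustration of how a preference between such a pair can be quantified, one can compare summed token log-probabilities under a causal LM (a sketch using the Hugging Face transformers API, with gpt2 as a stand-in; the paper trains its own small models):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities of the sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return logprobs[torch.arange(len(targets)), targets].sum().item()

do = "She gave him the book that everyone was excited to read."
po = "She gave the book that everyone was excited to read to him."
print(sentence_logprob(do) - sentence_logprob(po))  # > 0: the model prefers the DO
```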