Manya Wadhwa
@manyawadhwa.bsky.social
Pinned
Evaluating language model responses on open-ended tasks is hard! 🤔

We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.

EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
Reposted by Manya Wadhwa
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
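(As a rough illustration of the measure in question, here is a minimal sketch of my own, not the paper's code: n-gram novelty is typically the fraction of an output's n-grams that never appear in a reference corpus.)

```python
# Minimal sketch of n-gram novelty: the fraction of an output's n-grams that
# do not occur in a reference corpus. Tokenization and the exact definition
# used in the paper may differ; this is purely illustrative.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_novelty(generation: str, corpus: list[str], n: int = 5) -> float:
    corpus_ngrams = set()
    for doc in corpus:
        corpus_ngrams |= ngrams(doc.split(), n)
    gen_ngrams = ngrams(generation.split(), n)
    if not gen_ngrams:
        return 0.0
    return sum(g not in corpus_ngrams for g in gen_ngrams) / len(gen_ngrams)

# A highly "novel" output can still be awkward or nonsensical, which is the
# gap between n-gram novelty and creativity that the thread is probing.
print(ngram_novelty("the moon whispered algebra to the tide",
                    ["the moon rose over the tide"], n=3))  # -> 1.0
```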
November 4, 2025 at 3:08 PM
Reposted by Manya Wadhwa
AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:
October 22, 2025 at 3:24 PM
Reposted by Manya Wadhwa
🚨New paper on AI & copyright

Authors have sued LLM companies for using books w/o permission for model training.

Courts, however, need empirical evidence of market harm. Our preregistered study addresses exactly this gap.

Joint work w Jane Ginsburg from Columbia Law and @dhillonp.bsky.social 1/n🧵
October 22, 2025 at 4:54 PM
Reposted by Manya Wadhwa
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group sites.utexas.edu/compling/ and a long proud history in the space. (For instance, fun fact, Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🤘
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
sites.utexas.edu
October 7, 2025 at 8:53 PM
Reposted by Manya Wadhwa
Find my students and collaborators at COLM this week!

Tuesday morning: @juand-r.bsky.social's and @ramyanamuduri.bsky.social's papers (find them if you missed the session!)

Wednesday pm: @manyawadhwa.bsky.social 's EvalAgent

Thursday am: @anirudhkhatry.bsky.social 's CRUST-Bench oral spotlight + poster
October 7, 2025 at 6:03 PM
Unfortunately I won't be at #COLM2025 this week, but please check out our work being presented by my collaborators/advisors!

If you are interested in evals of open-ended tasks/creativity, please reach out and we can schedule a chat! :)
Find my students and collaborators at COLM this week!

Tuesday morning: @juand-r.bsky.social's and @ramyanamuduri.bsky.social's papers (find them if you missed the session!)

Wednesday pm: @manyawadhwa.bsky.social 's EvalAgent

Thursday am: @anirudhkhatry.bsky.social 's CRUST-Bench oral spotlight + poster
October 8, 2025 at 12:19 AM
Reposted by Manya Wadhwa
Excited to present this at #COLM2025 tomorrow! (Tuesday, 11:00 AM poster session)
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
October 6, 2025 at 8:40 PM
Reposted by Manya Wadhwa
Come talk with us today about the evaluation of long-form multilingual generation at the second poster session at #COLM2025

📍4:30–6:30 PM / Room 710 – Poster #8
October 7, 2025 at 5:54 PM
Reposted by Manya Wadhwa
Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
June 6, 2025 at 4:32 PM
Reposted by Manya Wadhwa
Extremely excited to announce that I will be joining
@utaustin.bsky.social Computer Science in August 2025 as an Assistant Professor! 🎉
May 5, 2025 at 8:28 PM
Reposted by Manya Wadhwa
What does it mean for #LLM output to be novel?
In work w/ @johnchen6.bsky.social, Jane Pan, Valerie Chen, and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
April 29, 2025 at 4:35 PM
Reposted by Manya Wadhwa
How do language models organize concepts and their properties? Do they use taxonomies to infer new properties, or infer based on concept similarities? Apparently, both!

🌟 New paper with my fantastic collaborators @amuuueller.bsky.social and @kanishka.bsky.social
November 8, 2024 at 9:40 PM
Reposted by Manya Wadhwa
If you are at #NAACL2025 @naaclmeeting.bsky.social catch @juand-r.bsky.social presenting our poster on the interplay between similarity and category membership in the property inferences of LMs @ Poster Session 1 on Wednesday!

Or if you're at home like me, read our paper: arxiv.org/abs/2410.22590
How do language models organize concepts and their properties? Do they use taxonomies to infer new properties, or infer based on concept similarities? Apparently, both!

🌟 New paper with my fantastic collaborators @amuuueller.bsky.social and @kanishka.bsky.social
April 28, 2025 at 1:40 AM
Reposted by Manya Wadhwa
🚀Meet CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
It contains 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
April 23, 2025 at 5:00 PM
Evaluating language model responses on open-ended tasks is hard! 🤔

We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.

EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
April 22, 2025 at 3:04 PM
Reposted by Manya Wadhwa
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
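(A generic sketch of the kind of objective this alludes to, assuming a pairwise margin loss; it is my own illustration, not necessarily the paper's exact recipe.)

```python
# Illustrative only: a pairwise ranking loss that trains a discriminator to
# score the preferred answer above a distractor, the generic idea behind
# "ranking-based discriminator training" for closing the generator-validator gap.
import torch

def pairwise_ranking_loss(score_pos: torch.Tensor,
                          score_neg: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    # Hinge-style margin loss: zero once score_pos exceeds score_neg by `margin`.
    return torch.clamp(margin - (score_pos - score_neg), min=0.0).mean()

# Hypothetical usage: the same LLM acting as validator scores its own generated
# answer (pos) against an alternative answer (neg) for each prompt.
pos = torch.tensor([2.3, 0.4])
neg = torch.tensor([1.1, 0.9])
print(pairwise_ranking_loss(pos, neg))  # tensor(0.7500)
```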
April 16, 2025 at 6:03 PM
Reposted by Manya Wadhwa
1.) [NAACL 25] @kanishka.bsky.social, @amuuueller.bsky.social and I delve into how language models do property inheritance using behavioral and mechanistic analyses.
Thank you, Kanishka and Aaron. I could not have hoped for better collaborators! arxiv.org/abs/2410.22590

[👇 bsky.app/profile/juan...
March 11, 2025 at 10:03 PM
Reposted by Manya Wadhwa
LLM judges have become ubiquitous, but valuable signal is often ignored at inference.

We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵

(w/ Michael J.Q. Zhang, @eunsol.bsky.social)
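(As a rough sketch of what "leveraging judgment distributions" can mean, and not the paper's specific design: instead of keeping only the judge's single sampled verdict, read the probabilities it assigns to each rating token and take an expected score. The numbers below are hypothetical placeholders.)

```python
# Sketch: use a judge LLM's full distribution over rating tokens rather than a
# single sampled verdict. The probabilities are illustrative placeholders; in
# practice they would come from the judge model's token log-probs.

def expected_score(rating_probs: dict[int, float]) -> float:
    """Probability-weighted mean of the judge's scores (e.g. a 1-5 scale)."""
    total = sum(rating_probs.values())
    return sum(score * p for score, p in rating_probs.items()) / total

judge_probs = {1: 0.02, 2: 0.08, 3: 0.25, 4: 0.45, 5: 0.20}  # hypothetical
print(expected_score(judge_probs))  # ~3.73, vs. an argmax verdict of 4
```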
March 6, 2025 at 2:32 PM
Reposted by Manya Wadhwa
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper, led by Jan Trienes: an interpretable framework for salience analysis in LLMs.

First of all, information salience is a fuzzy concept. So how can we even measure it? (1/6)
February 21, 2025 at 6:15 PM
Reposted by Manya Wadhwa
⚠️Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.
February 21, 2025 at 4:25 PM
Reposted by Manya Wadhwa
Excited that this got accepted at NAACL 2025 @naaclmeeting.bsky.social! Massive kudos to Juan Diego and Aaron for being the best co-authors and colleagues one could ask for! 🙏
How do language models organize concepts and their properties? Do they use taxonomies to infer new properties, or infer based on concept similarities? Apparently, both!

🌟 New paper with my fantastic collaborators @amuuueller.bsky.social and @kanishka.bsky.social
January 23, 2025 at 12:02 AM
Reposted by Manya Wadhwa
I'm at #Neurips2024 this week!

My work (arxiv.org/abs/2406.17692) w/ @gregdnlp.bsky.social & @eunsol.bsky.social exploring the connection between LLM alignment and response pluralism will be at the Pluralistic Alignment workshop (pluralistic-alignment.github.io) on Saturday. Drop by to learn more!
December 11, 2024 at 5:39 PM
Reposted by Manya Wadhwa
I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor — Tülu 3, an entirely open frontier model post training recipe. We beat Llama 3.1 Instruct.

Thread.
November 21, 2024 at 5:01 PM
Reposted by Manya Wadhwa
We at UT Linguistics are hiring for 🔥 2 faculty positions in Computational Linguistics! Assistant or Associate professors, deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!

Apply here 👉 apply.interfolio.com/158280
November 19, 2024 at 10:45 PM
Reposted by Manya Wadhwa
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
November 19, 2024 at 4:30 PM