Ben Lipkin
@benlipkin.bsky.social
1K followers 280 following 25 posts
phd @ mit, research @ genlm, intern @ apple https://benlipkin.github.io/
Reposted by Ben Lipkin
gretatuckute.bsky.social
Humans largely learn language through speech. In contrast, most LLMs learn from pre-tokenized text.

In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech.

Poster P6, tomorrow (Aug 19) at 1:30 pm, Foyer 2.2!
Reposted by Ben Lipkin
thomashikaru.bsky.social
1/7 If you're at CogSci 2025, I'd love to see you at my talk on Friday 1pm PDT in Nob Hill A! I'll be talking about our work towards an implemented computational model of noisy-channel comprehension (with @postylem.bsky.social, Ted Gibson, and @rplevy.bsky.social).
benlipkin.bsky.social
Great question! We have not directly compared. SMC offers a test-time approach to steer off-the-shelf models without additional training, whereas diffusion forcing trains autoregressive models to sample more effectively from the global target. In principle, the two strategies could be combined.
benlipkin.bsky.social
And check out the following papers, which laid the technical groundwork that this work builds on:

Lew et al (2023): arxiv.org/abs/2306.03081
Loula et al (2025): arxiv.org/abs/2504.13139
benlipkin.bsky.social
Want to use AWRS SMC?

Check out the GenLM control library: github.com/genlm/genlm-...

GenLM supports not only grammars, but arbitrary programmable constraints from type systems to simulators.

If you can write a Python function, you can control your language model!
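
To make "write a Python function" concrete, here is a plain-Python sketch of two such Boolean constraints (written against no particular interface; the genlm-control API may differ, and the `add` test below is a hypothetical target):

```python
import ast

def is_valid_python(candidate: str) -> bool:
    """Boolean constraint: does the generated text parse as Python?"""
    try:
        ast.parse(candidate)
        return True
    except SyntaxError:
        return False

def passes_tests(candidate: str) -> bool:
    """Boolean constraint: does the generated code define a function `add` that passes a tiny test?"""
    namespace = {}
    try:
        exec(candidate, namespace)          # run the candidate definition (sketch only; sandbox in practice)
        return namespace["add"](2, 3) == 5  # hypothetical target function and test
    except Exception:
        return False
```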
benlipkin.bsky.social
Why does AWRS work?

Formal and empirical runtime analyses tell a fascinating story.

AWRS scales adaptively with the KL divergence between the conditional and base token-level models.

As your LM better understands the constraint, AWRS gets faster.

As the LM struggles, AWRS closes the gap.
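
For intuition on that scaling, a quick numerical check for the simplest case, where the token-level conditional is just the base distribution restricted to the allowed tokens (illustrative, made-up numbers):

```python
import math

q = {"a": 0.70, "b": 0.25, "c": 0.05}   # base LM's next-token distribution (made-up numbers)
allowed = {"a", "b"}                     # tokens the constraint permits here

alpha = sum(p for t, p in q.items() if t in allowed)      # mass the LM already puts on allowed tokens
cond = {t: q[t] / alpha for t in allowed}                  # conditional: restrict and renormalize

kl = sum(p * math.log(p / q[t]) for t, p in cond.items())  # KL(conditional || base)
print(kl, math.log(1 / alpha))                             # identical in this restricted-support case

# Plain rejection sampling needs 1/alpha = exp(KL) draws in expectation:
# the better the LM already "understands" the constraint (alpha -> 1), the cheaper sampling gets.
print(1 / alpha, math.exp(kl))
```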
benlipkin.bsky.social
We tested AWRS SMC on several controlled generation tasks, from text-to-SQL to PDDL goal inference to molecular synthesis.

AWRS SMC outperforms baselines by large margins, e.g., see the jump from 3% -> 53% in the goal inference domain with only ~2.5x wall-clock overhead.
benlipkin.bsky.social
Next, SMC uses the proposed extensions and corresponding weights from AWRS to update importance weights associated with partial sequences (particles).

These particles are resampled proportional to their weights, re-allocating computation towards the most promising sequences.
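
A minimal sketch of that resampling step, using plain multinomial resampling for simplicity (practical SMC implementations typically resample only when the effective sample size drops, and use lower-variance schemes):

```python
import random

def resample_sketch(particles, weights, rng=random):
    """Resample a particle population proportional to its importance weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    # Promising partial sequences get duplicated; low-weight ones tend to be dropped.
    new_particles = rng.choices(particles, weights=probs, k=len(particles))
    # Reset each weight to the mean so the overall weight scale is preserved.
    new_weights = [total / len(particles)] * len(particles)
    return new_particles, new_weights
```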
benlipkin.bsky.social
First, AWRS reformulates the token-level inference problem from exact enumeration to adaptive rejection sampling.

This process yields equivalently distributed samples at a fraction of the cost.

AWRS then estimates and propagates an importance weight alongside these samples.
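
A minimal sketch of that kind of adaptive, without-replacement rejection loop at a single token position (the weight returned below is a crude stand-in, not the paper's actual estimator):

```python
import random

def awrs_propose_sketch(q, allowed, rng=random):
    """q: dict token -> base LM probability; allowed: callable token -> bool.
    Returns (token, weight_proxy)."""
    remaining = dict(q)
    rejected_mass = 0.0
    while remaining:
        toks, probs = zip(*remaining.items())
        tok = rng.choices(toks, weights=probs)[0]    # sample from the not-yet-rejected tokens
        if allowed(tok):
            # The accepted token is distributed exactly like the locally constrained distribution,
            # without checking the constraint on every vocabulary item.
            return tok, 1.0 - rejected_mass          # crude weight proxy (NOT the paper's estimator)
        # Reject: remove this token so it is never proposed again (the "adaptive" part).
        rejected_mass += remaining.pop(tok)
    return None, 0.0                                 # no satisfying token: dead end
```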
benlipkin.bsky.social
So, what can we do?

AWRS SMC is a hierarchical inference framework based on sequential Monte Carlo using a novel stochastic proposal algorithm.

By jointly considering local and global signals, AWRS SMC is both probabilistically sound and sample efficient.

How does it work?
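
Putting the two sketches above together, a rough skeleton of the full loop might look like this (a paraphrase, not the GenLM implementation; `step_dist`, `prefix_ok`, and `is_complete` are hypothetical helpers):

```python
def awrs_smc_sketch(step_dist, prefix_ok, is_complete, n_particles=8, max_len=32):
    """step_dist(prefix) -> dict token->prob from the base LM;
    prefix_ok(prefix) -> can this prefix still be completed into a valid string?
    is_complete(prefix) -> is this a finished string?"""
    particles = [[] for _ in range(n_particles)]
    weights = [1.0] * n_particles
    for _ in range(max_len):
        for i, prefix in enumerate(particles):
            if is_complete(prefix) or weights[i] == 0.0:
                continue
            q = step_dist(prefix)                                     # local signal: base LM
            tok, w = awrs_propose_sketch(q, lambda t: prefix_ok(prefix + [t]))
            if tok is None:
                weights[i] = 0.0                                      # dead end
                continue
            prefix.append(tok)
            weights[i] *= w                                           # propagate the local weight estimate
        if sum(weights) == 0.0:
            break                                                     # every particle hit a dead end
        particles, weights = resample_sketch(particles, weights)      # global signal: reallocate compute
        particles = [list(p) for p in particles]                      # copy so duplicates evolve independently
        if all(is_complete(p) for p in particles):
            break
    return particles, weights
```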
benlipkin.bsky.social
Problem B: LCD distorts the distribution.

Consider this simple LM over the tokens `a` and `b` with the constraint that “strings must end with `a`”.

While the true conditional distribution over complete strings favors `ba`, locally constrained decoding favors `aa`: the base LM prefers the prefix `a` (on its way to the likely string `ab`), so LCD commits to `a` and is then forced to finish with `a`.

We don’t want this.
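
With some made-up numbers for that example (illustrative only, not from the original figure), the distortion is easy to see:

```python
# Two-token strings over {a, b}; the constraint keeps only strings ending in `a`.
q_first = {"a": 0.8, "b": 0.2}                          # base LM strongly prefers starting with `a`...
q_second = {"a": {"a": 0.05, "b": 0.95},                # ...but after `a` it wants to finish with `b`
            "b": {"a": 0.95, "b": 0.05}}

# True conditional over complete strings that end in `a`:
unnorm = {x1 + "a": q_first[x1] * q_second[x1]["a"] for x1 in "ab"}
Z = sum(unnorm.values())
print({x: p / Z for x, p in unnorm.items()})            # {'aa': ~0.17, 'ba': ~0.83} -> favors `ba`

# LCD: the first token is sampled from the unmasked base (both prefixes are still completable),
# then the second token is forced to `a`:
print({x1 + "a": q_first[x1] for x1 in "ab"})           # {'aa': 0.8, 'ba': 0.2} -> favors `aa`
```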
benlipkin.bsky.social
Problem A: Token masking is often slow.

Enforcing the constraint means classifying all 100,000+ tokens in the vocabulary at every step.

While regular and context-free grammars support low-overhead solutions using tools like Outlines (dottxtai.bsky.social), open-ended constraint enforcement has been harder.
benlipkin.bsky.social
Approach 2: Locally constrained decoding (LCD).

At each step, mask the next-token distribution to prevent violations.

Pros: All samples are constraint-satisfying.
Cons: A) Masking a large vocabulary is slow. B) LCD distorts the sampled distribution.

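
A minimal sketch of one LCD step (not any particular library's API), which also makes Problem A visible: the constraint gets queried once per vocabulary item at every step:

```python
import random

def lcd_step_sketch(q, prefix, prefix_ok, rng=random):
    """q: dict token -> base LM probability at this position;
    prefix_ok(prefix + [token]) -> can the extended prefix still satisfy the constraint?"""
    masked = {t: p for t, p in q.items() if prefix_ok(prefix + [t])}   # 100k+ constraint calls per step
    if not masked:
        raise RuntimeError("dead end: no next token keeps the constraint satisfiable")
    toks, probs = zip(*masked.items())
    return rng.choices(toks, weights=probs)[0]                         # renormalized by rng.choices
```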
benlipkin.bsky.social
Approach 1: Sample-verify/Best-of-N.

Draw 𝑁 strings from the LM and use the constraint to rank/filter.

Pros: As 𝑁 grows, the retained samples converge to draws from 𝑃.
Cons: The 𝑁 required to obtain a constraint-satisfying sample scales as exp(KL[𝑃||𝑄]). For difficult constraints, this becomes infeasible.

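
A minimal sketch of this baseline, with hypothetical `sample_string` and `constraint` helpers:

```python
def best_of_n_sketch(sample_string, constraint, n):
    """sample_string() draws one complete string x ~ Q; constraint(x) -> bool.
    Returns the constraint-satisfying draws; in expectation you need ~exp(KL[P||Q])
    attempts per accepted sample, which blows up for hard constraints."""
    draws = [sample_string() for _ in range(n)]
    return [x for x in draws if constraint(x)]
```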
benlipkin.bsky.social
Consider a prompted language model 𝑄 (a prior) and a constraint function 𝐶 (a likelihood).

Our goal is to sample a string 𝑥 from the conditional distribution 𝑃 = 𝑄(·|𝐶(𝑥)=1) (the target posterior).

How do people do this now, and why do current approaches fall short?
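
Spelled out in code (informal notation), the target is just the prior times the Boolean likelihood, up to an intractable normalizer:

```python
def target_unnorm(x, q_prob, constraint):
    """q_prob(x) = Q(x), the prompted LM's probability of the complete string x (hypothetical helper);
    constraint(x) -> bool. P(x) is this value divided by the sum over all constraint-satisfying strings."""
    return q_prob(x) * (1.0 if constraint(x) else 0.0)
```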
benlipkin.bsky.social
Many LM applications may be formulated as text generation conditional on some (Boolean) constraint.

Generate a…
- Python program that passes a test suite.
- PDDL plan that satisfies a goal.
- CoT trajectory that yields a positive reward.
The list goes on…

How can we efficiently satisfy these? 🧵👇
Reposted by Ben Lipkin
joaoloula.bsky.social
#ICLR2025 Oral

How can we control LMs using diverse signals such as static analyses, test cases, and simulations?

In our paper “Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo” (w/ @benlipkin.bsky.social,
@alexlew.bsky.social, @xtimv.bsky.social) we:
Reposted by Ben Lipkin
kmahowald.bsky.social
I might be able to hire a postdoc for this fall in computational linguistics at UT Austin. Topics in the general LLM + cognitive space (particularly reasoning, chain of thought, LLMs + code) and LLM + linguistic space. If this could be of interest, feel free to get in touch!
benlipkin.bsky.social
Big thanks to this awesome team: Ben LeBrun, @postylem.bsky.social, @joaoloula.bsky.social, @drmaciver.bsky.social, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Tim O'Donnell, @alexlew.bsky.social, @xtimv.bsky.social
benlipkin.bsky.social
New preprint on controlled generation from LMs!

I'll be presenting at NENLP tomorrow 12:50-2:00pm

Longer thread coming soon :)