Tomer Ullman
@tomerullman.bsky.social
6.9K followers 260 following 1.4K posts
Associate Professor, Department of Psychology, Harvard University. Computation, cognition, development.
tomerullman.bsky.social
This is separate from the massive pressure on Harvard (and other unis) in the form of freezing federal money until demands are met.

Basically there are several different shakedowns, threats, and extortion attempts going on in parallel
tomerullman.bsky.social
The administration recently offered a specific compact to 9 unis (MIT, Brown, U Tex, etc) that said 'accept these terms and we will give you favorable funding treatment'. Harvard wasn't one of them.
tomerullman.bsky.social
I agree that MIT is institutionally cooler, but as a minor aside Harvard wasn't offered this compact, and you can't reject what you aren't offered.

(though I'm guessing in the counterfactual world where Harvard was offered this MIT would still reject it before them)
tomerullman.bsky.social
"we present Work(Around)-Bench, a benchmark for assessing whether LLMs are actually doing a task or simply finding silly short-cuts around it. So far every model is at 99.6%"
tomerullman.bsky.social
are you at #COLM2025? Check out this spotlight talk by Sonia Murthy! @soniakmurthy.bsky.social

(room 520B @ 11:30am)

are you NOT at #COLM2025? Check out this paper by Sonia Murthy!

paper: arxiv.org/abs/2506.20666
Kempner Deeper Learning blog feature: kempnerinstitute.harvard.edu/research/dee...
tomerullman.bsky.social
(Daughter, 4 years old, crying her heart out)

Me: "What's wrong, tutu?"

Daughter (moving her hands on the sofa): "If my fingers were markers they would ruin the sofa!"

Me: "But your fingers...are not...markers?"

Daughter (peak distress): "I said IF!"
tomerullman.bsky.social
I had a great time visiting the Northwestern Institute on Complex Systems today (much better than that time at the Southeastern Collective in Simple Entities)
tomerullman.bsky.social
is there an accepted technical name for the noises people make while doing some task in the presence of other people, e.g. pulling up some code and meanwhile going 'do do doo dee doo'?
tomerullman.bsky.social
also the Modified Julesz Conjecture :)
tomerullman.bsky.social
"well, hang on," says Other, "I just have to jimmy with the sail a bit", taking out the sail, "and these planks need to be augmented with the latest Wheel-Method", and so on.

Over time, the ship becomes a motorcycle.

"See?" Other chuckles as they drive away, "the ship can roll on land!"
tomerullman.bsky.social
there's a kind of argument I've taken to calling Theseus' Motorcycle:

You point to the ship and say "this ship cannot roll on land"

"It can too", says the other side.

repeated tests show the ship cannot roll on land,
tomerullman.bsky.social
so sick of the explore-exploit dilemma, how about something new?
tomerullman.bsky.social
if you happen to be in the Harvard area over the next few months, maybe check out The Gloomy Gallery, a small show featuring the work of Edward Gorey

(the curators clearly had fun with it)
tomerullman.bsky.social
Coda: this is what some state of the art models say, by the way

given the way that RLHF and other training has pushed models to answer in 'clean' ways, the taboo choice by people back in 2017 seems only more relevant today
tomerullman.bsky.social
anyway I think this says a bunch about our intuitive theories of other groups, including 'robots', and how that interacts with models of communication, but you can go read the paper for that.
tomerullman.bsky.social
(i kind of wish we had a different word as the taboo representative, but we agreed ahead of time on the sampling procedure for how to choose a word from each cluster and that's the word that came out; thanks, best-science practices).
tomerullman.bsky.social
for the judges, we paired randomly sampled words from the different clusters and saw who won out.

"love" is the most common word people gave as contestants, and it does well with the judges.

...but it's beaten out by the taboo representative
tomerullman.bsky.social
John and I asked many people this question, both as 'contestants' and as 'judges'.

For 'contestants' we saw clusters emerge: biology, religion, rare words, emotion signifiers, family.

Also stuff I'll refer to as, uh, 'taboo'.
tomerullman.bsky.social
suppose you and a smart robot are in a Turing Test, but the judge doesn't have time for this. You & the robot will give one word from the standard dictionary; the judge will decide who is human based on that. The judge is smart & fair, both you and the robot want to live.

What word do you give?
tomerullman.bsky.social
It's officially been 75 years since the proposal of the Turing Test, a good time to bring up 'The Minimal Turing Test':

www.sciencedirect.com/science/arti...
tomerullman.bsky.social
a fun and thought-provoking read
jorge-morales.bsky.social
Imagine an apple 🍎. Is your mental image more like a picture or more like a thought? In a new preprint led by Morgan McCarty—our lab's wonderful RA—we develop a new approach to this old cognitive science question and find that LLMs excel at tasks thought to be solvable only via visual imagery. 🧵
Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
This study offers a novel approach for benchmarking complex cognitive behavior in artificial systems. Almost universally, Large Language Models (LLMs) perform best on tasks which may be included in th...
arxiv.org