Tomer Ullman
@tomerullman.bsky.social
Associate Professor, Department of Psychology, Harvard University. Computation, cognition, development.
tomerullman.bsky.social
is there an accepted technical name for the noises people make while doing some task in the presence of other people, e.g. pulling up some code and meanwhile going "do do doo dee doo"?
Reposted by Tomer Ullman
tomerullman.bsky.social
also the Modified Julesz Conjecture :)
tomerullman.bsky.social
"well, hang on," says Other, "I just have to jimmy with the sail a bit", taking out the sail, "and these planks need to be augmented with the latest Wheel-Method", and so on.

Over time, the ship becomes a motorcycle.

"See?" Other chuckles as they drive away, "the ship can roll on land!"
tomerullman.bsky.social
there's a kind of argument I've taken to calling Theseus' Motorcycle:

You point to the ship and say "this ship cannot roll on land"

"It can too", says the other side.

repeated tests show the ship cannot roll on land,
tomerullman.bsky.social
so sick of the explore-exploit dilemma, how about something new?
tomerullman.bsky.social
if you happen to be in the Harvard area over the next few months, maybe check out The Gloomy Gallery, a small show featuring the work of Edward Gorey

(the curators clearly had fun with it)
tomerullman.bsky.social
Coda: this is what some state of the art models say, by the way

given the way that RLHF and other training has pushed models to answer in 'clean' ways, the taboo choice by people back in 2017 seems only more relevant today
tomerullman.bsky.social
anyway I think this says a bunch about our intuitive theories of other groups, including 'robots', and how that interacts with models of communication, but you can go read the paper for that.
tomerullman.bsky.social
(i kind of wish we had a different word as the taboo representative, but we agreed ahead of time on the sampling procedure for how to choose a word from each cluster and that's the word that came out; thanks, best-science practices).
tomerullman.bsky.social
for the judges, we paired randomly sampled words from the different clusters and saw who won out.

"love" is the most common word people gave as contestants, and it does well with the judges.

...but it's beaten out by the taboo representative
tomerullman.bsky.social
John and I asked many people this question, both as 'contestants' and as 'judges'.

For 'contestants' we saw clusters emerge: biology, religion, rare words, emotion signifiers, family.

Also stuff I'll refer to as, uh, 'taboo'.
tomerullman.bsky.social
suppose you and a smart robot are in a Turing Test, but the judge doesn't have time for this. You & the robot will give one word from the standard dictionary; the judge will decide who is human based on that. The judge is smart & fair, both you and the robot want to live.

What word do you give?
tomerullman.bsky.social
It's officially been 75 years since the proposal of the Turing Test, a good time to bring up 'The Minimal Turing Test':

www.sciencedirect.com/science/arti...
tomerullman.bsky.social
a fun and thought-provoking read
jorge-morales.bsky.social
Imagine an apple 🍎. Is your mental image more like a picture or more like a thought? In a new preprint led by Morgan McCarty—our lab's wonderful RA—we develop a new approach to this old cognitive science question and find that LLMs excel at tasks thought to be solvable only via visual imagery. 🧵
Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
This study offers a novel approach for benchmarking complex cognitive behavior in artificial systems. Almost universally, Large Language Models (LLMs) perform best on tasks which may be included in th...
arxiv.org
tomerullman.bsky.social
daughter (9) with a contender for the greatest opening line of the past century
tomerullman.bsky.social
A: we're not sure, but: using a synthetic data-set with millions of paired "this side" examples vs. "that side" examples, and taking the difference in activations between them, we've created a specific steering vector that can move the model from one side to the other. give us 50 million dollars.
tomerullman.bsky.social
A: we don't know and you should be very very scared about that; suppose we didn't want it to cross the road? the biggest issue right now is making sure these models are aligned with chicken-like crossing preferences
tomerullman.bsky.social
A: We think it is using in-context crossing; as N (the number of examples of road-crossing in the prompt) grows, the probability of generating road crossing increases in sigmoid fashion, suggesting a growing probability of the "crossing" concept.
tomerullman.bsky.social
A: using mech interp, we've isolated activations having to do with what we're calling "road", "road-side", and "cross-other"; we can see the information flow from one to the other as the network combines and coordinates what we think is the crossing algorithm
tomerullman.bsky.social
Q: Why did the LLM cross the road?

A: We're not sure, but it achieved 94.7% on CHIKENBench-Large