Cameron Jones
@camrobjones.bsky.social
150 followers 140 following 41 posts
Postdoc in the Language and Cognition lab at UC San Diego. I’m interested in persuasion, deception, LLMs, and social intelligence.
camrobjones.bsky.social
Totally agree with @seantrott.bsky.social here. I do think it's important to measure the persuasiveness of LLMs in realistic settings, but that doesn't mean you get to throw out 50 years of psych ethics! seantrott.substack.com/p/informed-c...
Informed consent is central to research ethics
On the unauthorized experiment conducted on a subreddit community.
seantrott.substack.com
camrobjones.bsky.social
There's lots more detail in the paper arxiv.org/abs/2503.23674. We also release all of the data (including full anonymized transcripts) for further scrutiny/analysis/to prove this isn't an April Fools' joke.

The paper's under review and any feedback would be very welcome!
Large Language Models Pass the Turing Test
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations s...
arxiv.org
camrobjones.bsky.social
Thanks so much to my co-author Ben Bergen, to Sydney Taylor (a former RA who wrote the persona prompt!), to Open Philanthropy and to 12 donors on Manifund who helped to support this work.
camrobjones.bsky.social
One of the most important aspects of the Turing test is that it's not static: it depends on people's assumptions about other humans and technology. We agree with @brianchristian.bsky.social that humans could (and should) come back better next year!
camrobjones.bsky.social
More pressingly, I think the results provide more evidence that LLMs could substitute for people in short interactions without anyone being able to tell. This could lead to automation of jobs, improved social engineering attacks, and broader societal disruption.
camrobjones.bsky.social
Did LLMs really pass if they needed a prompt? It's a good q. Without any prompt, LLMs would fail for trivial reasons (like admitting to being AI). & they could easily be fine-tuned to behave as they do when prompted. So I do think it's fair to say that LLMs pass.
camrobjones.bsky.social
Does this mean LLMs are intelligent? I think that's a very complicated question that's hard to address in a paper (or a tweet). But broadly I think this should be weighed as one among many pieces of evidence for the kind of intelligence LLMs display.
camrobjones.bsky.social
Turing is quite vague about exactly how the test should be implemented. As such there are many possible variations (e.g. 2-party, an hour long, or with expert interrogators). I think this 3-party, 5-minute version is the most widely accepted "standard" test, but we're planning to explore others in future work.
camrobjones.bsky.social
So do LLMs pass the Turing test? We think this is pretty strong evidence that they do. People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt). And 4.5 was even judged to be human significantly *more* often than actual humans!
camrobjones.bsky.social
As in previous work, people focused more on linguistic and socioemotional factors in their strategies & reasons. This might suggest people no longer see "classical" intelligence (e.g. math, knowledge, reasoning) as a good way of discriminating people from machines.
camrobjones.bsky.social
We also tried giving a more basic prompt to the models, without detailed instructions on the persona to adopt. Models performed significantly worse in this condition (highlighting the importance of prompting), but were still indistinguishable from humans in the Prolific study.
camrobjones.bsky.social
Across 2 studies (on undergrads and Prolific) GPT-4.5 was selected as the human significantly more often than chance (50%). LLaMa was not selected significantly more or less often than humans, suggesting ppts couldn't distinguish it from people. Baselines (ELIZA & GPT-4o) were worse than chance.
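(Aside: a minimal sketch of the kind of significance test behind a claim like "selected more often than chance (50%)". The counts below are hypothetical placeholders, not the paper's data, and scipy's binomtest is just one standard choice of test.)

    # Hypothetical sketch: testing whether an AI witness's "win rate"
    # (how often it was picked as the human) differs from the 50% chance level.
    # The counts are placeholders, NOT the paper's actual data.
    from scipy.stats import binomtest

    wins = 73     # interrogations where the AI was judged to be the human (placeholder)
    trials = 100  # total interrogations (placeholder)

    result = binomtest(wins, trials, p=0.5, alternative="two-sided")
    print(f"win rate = {wins / trials:.2f}, p = {result.pvalue:.4f}")
    # A win rate significantly above 0.5 means the AI was chosen as the human
    # more often than a coin flip -- the pattern reported for GPT-4.5.

The same test run the other way (a win rate significantly below 0.5) is how a baseline like ELIZA comes out "worse than chance".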
camrobjones.bsky.social
Participants spoke to two "witnesses" at the same time: one human and one AI. Here are some example convos from the study. Can you tell which one is the human? Answers & original interrogator verdicts in the paper...

You can play the game yourself here: turingtest.live
camrobjones.bsky.social
In previous work we found GPT-4 was judged to be human ~50% of the time in a 2-party Turing test, where ppts speak to *either* a human or a model.

This is probably easier for several reasons. Here we ran a new study with Turing's original 3-party setup.

arxiv.org/abs/2503.23674
camrobjones.bsky.social
New preprint: we evaluated LLMs in a 3-party Turing test (participants speak to a human & AI simultaneously and decide which is which).

GPT-4.5 (when prompted to adopt a humanlike persona) was judged to be the human 73% of the time, suggesting it passes the Turing test (🧵)
Reposted by Cameron Jones
kmahowald.bsky.social
Check it out for cool plots like this one about affinities between words in sentences, and how they can show that Green Day isn't like green paint or green tea. And congrats to @coryshain.bsky.social and the CLiMB lab! climblab.org
Reposted by Cameron Jones
kobihackenburg.bsky.social
📈Out today in @PNASNews!📈

In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages. 

🧵:
camrobjones.bsky.social
@yann-lecun.bsky.social at #StandUpForScience NYC in Washington Square Park — “I work on both natural and artificial intelligence, and I think this government could do with a little more intelligence.”
Reposted by Cameron Jones
simonwillison.net
Today in AI weirdness: if you fine-tune a model to deliberately produce insecure code it also "asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively" www.emergent-misalignment.com
Emergent Misalignment
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
www.emergent-misalignment.com
Reposted by Cameron Jones
kevinlala.bsky.social
Thanks to @kensycoop.bsky.social for this great interview about my book.

We cover domestication syndrome, plasticity-led evolution, soft inheritance, animal traditions, how culture shapes evolution, and more.

Kensy also does a wonderful production job, turning me into a coherent speaker! Thank you
manymindspod.bsky.social
New episode!! 📣📣

A conversation w/ @kevinlala.bsky.social about his new (co-authored) book, ‘Evolution Evolving’!

Ideas about evolution have changed a lot in recent decades. An emerging view—synthesized by Lala et al.—puts developmental processes front and center.

Listen: disi.org/the-developm...
Reposted by Cameron Jones
carlbergstrom.com
Any talk you hear from the current administration about making the US more competitive in science and technology is utter bullshit. What they are doing is sabotaging our country for years if not decades to come.
camrobjones.bsky.social
I wrote up some notes on my trip to the first @IASEAIorg conference—mostly on the importance of "agents", the risks that they might pose, and how/whether we can mitigate them.

camrobjones.substack.com/p/notes-from...
Notes from IASEAI
On agents, ethics, and catastrophic risks
camrobjones.substack.com
camrobjones.bsky.social
We've relaunched @turingtestlive with a 3-party format where you speak to a human and an LLM at the same time.

See if you can tell the difference between a human and an AI here: turingtest.live
The Turing Test — Can you tell a human from an AI?
turingtest.live