Lightnews — Scholar-powered news

David Schlangen

@davidschlangen.bsky.social

370 followers 1.4K following 70 posts

Prof of Computational Linguistics / NLP @ Uni Potsdam, Germany. Working on embodied / multimodal / conversational AI. In a way. Also affiliated w/ DFKI Berlin (German Research Center for AI).

David Schlangen

@davidschlangen.bsky.social

Bonus post advertising this other thread through the medium of "memes" which I've been told is what you have to do on social media.

Scene from the film "wargames", with an added supertitle saying "How can we evaluate LLMs in interactions? Where do we get the interaction purposes from??", to which Matthew Broderick's character answers: "It's games"

Another still from the film, supertitle "But getting people to play games with the computer takes time!", to which our hero answers "Is there any way to make it play itself?"

The famous scene where the computer says "A strange game. The only winning move is" ... well, it now says, ".. to check out the clembench."

July 20, 2025 at 11:44 AM

David Schlangen

@davidschlangen.bsky.social

It's great to see the idea of using games / interactions to evaluate LLMs gain traction, with textarena.ai and now ARC-AGI-3 being latest entrants.
This is something we've been exploring since early 2023 with clembench ( clembench.github.io ), which we've been continuously maintaining & extending. »

July 20, 2025 at 11:21 AM

David Schlangen

@davidschlangen.bsky.social

This was the outcome of a collaboration that started last year at an ELLIS workshop, and that has brought together many labs (and many master's and PhD students, and PIs).

Much more remains to be explored in "learning in interaction" -- maybe by you?

🤖🧠 #NLP #AI #LLM

May 29, 2025 at 8:41 PM

David Schlangen

@davidschlangen.bsky.social

We find that imitation learning through SFT improves performance on unseen game instances, but does not generalise to new games and negatively impacts other skills -- while interactive learning with GRPO shows balanced improvements without loss of skills.

Table 3 from the paper linked in a post below.

May 29, 2025 at 8:41 PM

David Schlangen

@davidschlangen.bsky.social

Playpen is a training environment for post-training LLMs through learning in interaction, by self-play of "dialogue games": goal-oriented language-based activities that generate verifiable rewards.

Diagram showing an interaction triangle "interlocutor A -- world -- interlocutor B", except that this is mediated by GM (the "Game Master"), and that A is a learner wrapped around an LLM, and B also is a wrapper around a (non-learning) LLM.

May 29, 2025 at 8:41 PM
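
The setup described in the post above (a Game Master mediating between a learner A and a non-learning partner B, with a reward that can be checked against the game rules) might be sketched roughly as follows. This is only a toy illustration under loud assumptions: the word-guessing game, the function names `make_guesser`, `make_describer` and `game_master`, and the 0/1 reward are all hypothetical, and both players are scripted stand-ins rather than LLMs. It is not the Playpen or clembench API.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical names throughout; this is NOT the Playpen or clembench API.
# A player is anything that maps an incoming message to a reply.
Player = Callable[[str], str]


def make_describer(target: str) -> Player:
    """Scripted stand-in for the non-learning partner B (assumption: no LLM)."""
    def respond(message: str) -> str:
        # B only confirms or rejects a guess about its secret word.
        return "yes" if message.strip().lower() == target else "no"
    return respond


def make_guesser(candidates: List[str], seed: int = 0) -> Player:
    """Scripted stand-in for the learner A: tries candidates in shuffled order."""
    order = candidates[:]
    random.Random(seed).shuffle(order)
    remaining = iter(order)

    def respond(message: str) -> str:
        # Returns an empty string once all candidates are used up.
        return next(remaining, "")
    return respond


def game_master(guesser: Player, describer: Player,
                max_turns: int) -> Tuple[float, List[Tuple[str, str]]]:
    """Relay messages between the players and score the episode.

    The reward is verifiable because it follows from the game rules alone:
    1.0 if the partner confirmed a guess within max_turns, else 0.0.
    """
    transcript: List[Tuple[str, str]] = []
    feedback = "start"
    for _ in range(max_turns):
        guess = guesser(feedback)      # A acts on the last feedback
        feedback = describer(guess)    # B answers; players never talk directly
        transcript.append((guess, feedback))
        if feedback == "yes":
            return 1.0, transcript
    return 0.0, transcript


if __name__ == "__main__":
    words = ["apple", "pear", "plum"]
    reward, transcript = game_master(
        make_guesser(words), make_describer("pear"), max_turns=3
    )
    print(reward, transcript)
```

In an actual learning setup, the episode reward returned by such a mediator would be the training signal for the learner's policy; the sketch stops at producing that reward and the transcript.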

David Schlangen

@davidschlangen.bsky.social

🚨 New pre-print! (Well, new & much improved version in any case.) 🚨
If you're interested in LLM post-training techniques and in how to make LLMs better "language users", read this thread, introducing the "LM Playpen".

May 29, 2025 at 8:41 PM

David Schlangen

@davidschlangen.bsky.social

Nice baseline results as well: learning via SFT from transcripts does a bit, but only "real"(-ish) learning in interaction (GRPO) generalises. (Basically, you want to see the whole row being green in this table.)

2/2

April 15, 2025 at 6:51 PM

David Schlangen

@davidschlangen.bsky.social

Update 2: New pre-print! Outcome of an ELLIS workshop last year, & more than a year of discussions and work, across labs and countries: Meet the Playpen, an environment for exploring learning in dialogic interaction.

arxiv.org/abs/2504.08590

1/2

Titlepage of the paper linked in the post.

April 15, 2025 at 6:51 PM

David Schlangen

@davidschlangen.bsky.social

Update 1: New models added to our dialogue game-based agentic LLM leaderboard. TL;DR: GPT-4.1 as good as 4o, but much cheaper. Llama4 indeed not very good (decisively worse than 3.2 70B!). OLMo decent, but there's still a secret sauce that only closed labs have.

clembench.github.io

Screenshot of leaderboard as linked in post.

April 15, 2025 at 6:35 PM

David Schlangen

@davidschlangen.bsky.social

I'm not on X, so I'll use the opportunity of @karpathy.bsky.social 's post over there to plug our "clembench" project here. We've been doing exactly this--evaluating LLMs w/ conversational games--since early 2023, with several papers out by now (e.g. EMNLP 23).
clembench.github.io

February 3, 2025 at 10:22 AM

David Schlangen

@davidschlangen.bsky.social

I just randomly found this book on my bookshelf. It must have been transported there from an alternate timeline. “20 years of research on agents”? Preposterous! We all know that the very idea of software agents was only invented last year by the LLM folks!

Photo of book: “The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, IVAs, and Social Robotics”. Ed, Lugrin, Pelachaud, Traum. ACM, 2021

January 16, 2025 at 11:50 AM

David Schlangen

@davidschlangen.bsky.social

Looking forward to the first lecture of next year, where I can again use this meme I made a couple of years ago and multiply confuse the students in my "intro to NLP" class. (What is an "LP cover"? Who is that person?)

The cover of Lou Reed's "transformer" LP, with the network diagram of the transformers architecture made to look like the guitar that Lou Reed is holding.

December 20, 2024 at 12:38 PM

David Schlangen

@davidschlangen.bsky.social

Some good news: The world now has one more doctor! Brielen Madureira passed her viva with flying colours (or, as we say in German, summa cum laude). She gave us quite some material to discuss in the viva, ending with the attached theses. (Remote: Luciana Benotti as fantastic examiner.)

Picture of a happy examination committee and the candidate (wearing a silly hat).

Discussion Topics
• Instead of coming from the theory to define a suitable model, we often need to force our phenomena to fit popular machine learning frameworks (and now NLP is framing everything as next token prediction).
• Evaluation is being delegated to LLMs; the step that should bring us understanding and transparency is becoming as undecipherable as the very problem we need to assess.
• Progress in disseminating Clark’s notion of grounding in NLP stumbles on questions of methodology in data collection, modelling and evaluation.
• Proper methods are needed to assess what models can do, bearing in mind both the pertinent cognitive underpinnings and the broad NLP methodology, e.g. by weighing up data and training practices, making representations more interpretable and profiling models’ behaviour, and promoting richer forms of evaluation.
• We should strive to make the development and use of conversational technologies a more “orientable” space, so that they do not erode the social value of dialogue.

November 28, 2024 at 5:45 PM

David Schlangen

@davidschlangen.bsky.social

I've just discovered dynamic backgrounds in Keynote!

From now on my lectures will be, well, probably not at all more interesting, but at least 123.5% more psychedelic!

November 25, 2024 at 7:14 PM

David Schlangen

@davidschlangen.bsky.social

My writing peaked early.

(Also, remarkable constancy in research interests, although one didn’t talk about consciousness too much for most of the intermediate 25 years.)

Title page of what apparently is a term paper for a course called “Cognitive Psychology” in Autumn Term 1999: “What is to be done? Computational advantages of having a consciousness”

1 Introduction
Although it's probably bad practice to open with the conclusion, here it is: The advantage for an organism of having a consciousness is to be able to give a better answer to Lenin's notorious question: "What is to be done?" (Lenin 1902).
And just as the answer to this question was ultimatly of vital importance to the russian Tsar family, giving a good answer to the question "What (do I do) next?" is of similarly vital importance to the organism that asks this question of itself.

November 20, 2024 at 5:33 PM

David Schlangen

@davidschlangen.bsky.social

Random find on my hard drive. Dragomir Radev’s 1995 FAQ on what NLP is, for the comp.ai.nat-lang newsgroup. (Note the sorting under “AI”…)

Screenshot of an email sent in 1995, summary: “This posting contains Frequently Asked Questions (FAQ) about natural language processing and their answers. It should be read by anyone who wishes to post to the comp.ai.nat-lang newsgroup.”

November 20, 2024 at 5:08 PM

David Schlangen

@davidschlangen.bsky.social

Somehow, the Pollesch nonsense will be missed after all.

(The giggle-prone Pollesch audience at the Volksbühne, though, will not be.)

(He might have liked this piece of information from Google.)

Screenshot of a Google info box: “René Pollesch is permanently closed.”

November 19, 2024 at 9:16 PM

David Schlangen

@davidschlangen.bsky.social

Ok, I would like to give this thing here a chance. But I can’t until I find out whether there’s a setting that makes this app give me a chronological feed *that keeps my last position*. I can’t handle scrolling back in time.

Tom Gauld cartoon using the Kierkegaard line on how life can only be understood backwards, but must be lived forwards.

October 14, 2023 at 10:51 AM
