David Schlangen
davidschlangen.bsky.social
Prof of Computational Linguistics / NLP @ Uni Potsdam, Germany. Working on embodied / multimodal / conversational AI. In a way. Also affiliated w/ DFKI Berlin (German Research Center for AI).
Bonus post advertising this other thread through the medium of "memes", which I've been told is what you have to do on social media.
July 20, 2025 at 11:44 AM
It's great to see the idea of using games / interactions to evaluate LLMs gain traction, with textarena.ai and now ARC-AGI-3 being the latest entrants.
This is something we've been exploring since early 2023 with clembench (clembench.github.io), which we've been continuously maintaining & extending. »
July 20, 2025 at 11:21 AM
This was the outcome of a collaboration that started last year at an ELLIS workshop, and that has brought together many labs (and many master's and PhD students, and PIs).

Much more remains to be explored in "learning in interaction" -- maybe by you?

🤖🧠 #NLP #AI #LLM
May 29, 2025 at 8:41 PM
We find that imitation learning through SFT improves performance on unseen game instances, but does not generalise to new games and negatively impacts other skills -- while interactive learning with GRPO shows balanced improvements without loss of skills.
May 29, 2025 at 8:41 PM
Playpen is a training environment for post-training LLMs through learning in interaction, by self-play of "dialogue games": goal-oriented language-based activities that generate verifiable rewards.
May 29, 2025 at 8:41 PM
🚨 New pre-print! (Well, new & much improved version in any case.) 🚨
If you're interested in LLM post-training techniques and in how to make LLMs better "language users", read this thread, introducing the "LM Playpen".
May 29, 2025 at 8:41 PM
Nice baseline results as well: learning via SFT from transcripts does a bit, but only "real"(-ish) learning in interaction (GRPO) generalises. (Basically, you want to see the whole row being green in this table.)

2/2
April 15, 2025 at 6:51 PM
Update 2: New pre-print! Outcome of an ELLIS workshop last year, & more than a year of discussions and work, across labs and countries: Meet the Playpen, an environment for exploring learning in dialogic interaction.

arxiv.org/abs/2504.08590

1/2
April 15, 2025 at 6:51 PM
Update 1: New models added to our dialogue game-based agentic LLM leaderboard. TL;DR: GPT-4.1 as good as 4o, but much cheaper. Llama4 indeed not very good (decisively worse than 3.2 70B!). OLMo decent, but there's still a secret sauce that only closed labs have.

clembench.github.io
April 15, 2025 at 6:35 PM
I'm not on X, so I'll use the opportunity of @karpathy.bsky.social's post over there to plug our "clembench" project here. We've been doing exactly this--evaluating LLMs w/ conversational games--since early 2023, with several papers out by now (e.g. EMNLP 23).
clembench.github.io
February 3, 2025 at 10:22 AM
I just randomly found this book on my bookshelf. It must have been transported there from an alternate timeline. “20 years of research on agents”? Preposterous! We all know that the very idea of software agents was only invented last year by the LLM folks!
January 16, 2025 at 11:50 AM
Looking forward to the first lecture of next year, where I can again use this meme I made a couple of years ago and multiply confuse the students in my "intro to NLP" class. (What is an "LP cover"? Who is that person?)
December 20, 2024 at 12:38 PM
Some good news: The world now has one more doctor! Brielen Madureira passed her viva with flying colours (or, as we say in German, summa cum laude). She gave us quite some material to discuss in the viva, ending with the attached theses. (Remote: Luciana Benotti as fantastic examiner.)
November 28, 2024 at 5:45 PM
I've just discovered dynamic backgrounds in Keynote!

From now on my lectures will be, well, probably not at all more interesting, but at least 123.5% more psychedelic!
November 25, 2024 at 7:14 PM
My writing peaked early.

(Also, remarkable constancy in research interests, although one didn’t talk about consciousness too much for most of the intermediate 25 years.)
November 20, 2024 at 5:33 PM
Random find on my hard drive. Dragomir Radev’s 1995 FAQ on what NLP is, for the comp.ai.nat-lang newsgroup. (Note the sorting under “AI”…)
November 20, 2024 at 5:08 PM
Somehow, the Pollesch nonsense will be missed after all.

(The giggle-addicted Pollesch Volksbühne audience, though, won't be.)

(He might have liked this Google answer.)
November 19, 2024 at 9:16 PM
Ok, I would like to give this thing here a chance. But I can’t until I find out whether there’s a setting that makes this app give me a chronological feed *that keeps my last position*. I can’t handle scrolling back in time.
October 14, 2023 at 10:51 AM