Lukas Thede
@lukasthede.bsky.social
130 followers · 160 following · 13 posts
IMPRS-IS PhD Student with Zeynep Akata and Matthias Bethge at the University of Tübingen and Helmholtz Munich, working on continually adapting foundation models.
Reposted by Lukas Thede
lciernik.bsky.social
🎉 Presenting at #ICML2025 tomorrow!
Come and explore how representational similarities behave across datasets :)

📅 Thu Jul 17, 11 AM-1:30 PM PDT
📍 East Exhibition Hall A-B #E-2510

Huge thanks to @lorenzlinhardt.bsky.social, Marco Morik, Jonas Dippel, Simon Kornblith, and @lukasmut.bsky.social!
lukasthede.bsky.social
🚨 Poster at #ICML2025!
How can LLMs really keep up with the world?

Come by E-2405 on July 15th (4:30–7:00pm) to check out WikiBigEdit – our new benchmark to test lifelong knowledge editing in LLMs at scale.

🔗 Real-world updates
📈 500k+ QA edits
🧠 Editing vs. RAG vs. continual learning (CL)
Reposted by Lukas Thede
bethgelab.bsky.social
🧠🤖 We’re hiring a Postdoc in NeuroAI!

Join CRC1233 "Robust Vision" (Uni Tübingen) to build benchmarks & evaluation methods for vision models, bridging brain & AI. Work with top faculty & shape vision research.

Apply: tinyurl.com/3jtb4an6

#NeuroAI #Jobs
Postdoctoral Researcher (m/f/d, E13 TV-L, 100%)
Reposted by Lukas Thede
eml-munich.bsky.social
📢 Landed in Nashville🎺 for #CVPR2025! The EML group is presenting 4 exciting papers — come say hi at our poster sessions! More details in the thread — see you there! 🏁🌟
Reposted by Lukas Thede
eml-munich.bsky.social
🚨 Happy to announce that our paper, "Understanding the Limits of Lifelong Knowledge Editing in LLMs", has been accepted at #ICML2025! Congrats to @lukasthede.bsky.social, @confusezius.bsky.social, Matthias Bethge, @zeynepakata.bsky.social, and @tomhartvigsen.bsky.social. 👇 Highlights in the thread
Reposted by Lukas Thede
eml-munich.bsky.social
🎓PhD Spotlight: Jae Myung Kim

We’re thrilled to celebrate Jae Myung Kim, who will defend his PhD on 25th June! 🎉

Jae Myung began his PhD at @unituebingen.bsky.social as part of the ELLIS & IMPRS-IS programs, advised by @zeynepakata.bsky.social and collaborating closely with Cordelia Schmid.
Reposted by Lukas Thede
eml-munich.bsky.social
We’ve landed in Singapore for #ICLR2025!
The EML group is presenting 4 exciting papers — come say hi at our poster sessions! 👇 Let’s chat!

More details in the thread — see you there! 🌟
lukasthede.bsky.social
10/
This project was a joint effort with amazing collaborators:
👥 @confusezius.bsky.social, Matthias Bethge, @zeynepakata.bsky.social, and @tomhartvigsen.bsky.social
Huge thanks to them for the ideas, feedback, and countless hours that made this work possible. 🙏
lukasthede.bsky.social
8/
🔍 TL;DR:
✅ We release WikiBigEdit - a new large-scale benchmark for real-world factual updates
🚨 Existing editing methods fail to scale
💡 Finetuning + merging is a surprisingly strong baseline
🧩 RAG wins - but with trade-offs
lukasthede.bsky.social
7/
Surprisingly, simple continual finetuning (LoRA) outperforms all editing baselines - at equal inference cost.
And when paired with model merging, performance improves even further over time.
💪 More scalable, more robust, and better retention across time steps.
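In code, this baseline could look something like the minimal sketch below: continually finetune a fresh LoRA adapter on each time step's edits, merge it into the dense weights, and uniformly average the resulting checkpoints. The base model, LoRA settings, and the `finetune`/`time_step_batches` placeholders are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of the continual finetuning + merging baseline.
# Base model, LoRA settings, `finetune`, and `time_step_batches` are
# illustrative assumptions, not the paper's exact configuration.
import copy

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def finetune(model, qa_batch):
    """Placeholder: standard causal-LM finetuning on one time step's QA edits."""
    ...

base = AutoModelForCausalLM.from_pretrained("gpt2")        # stand-in base LLM
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"])

time_step_batches = []   # one batch of QA edits per benchmark time step
merged_sum, n = None, 0

for qa_batch in time_step_batches:
    model = get_peft_model(copy.deepcopy(base), lora_cfg)  # fresh adapter
    finetune(model, qa_batch)
    model = model.merge_and_unload()   # fold LoRA deltas into dense weights

    state = model.state_dict()
    if merged_sum is None:
        merged_sum = {k: v.clone().float() for k, v in state.items()}
    else:
        for k in merged_sum:
            merged_sum[k] += state[k].float()
    n += 1

if n:  # uniform average of all per-step checkpoints
    base.load_state_dict({k: v / n for k, v in merged_sum.items()})
```

Uniform averaging is just one possible merging scheme; the point is that the whole pipeline uses standard finetuning machinery and leaves inference cost identical to the base model.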
lukasthede.bsky.social
6/
RAG performs best overall - nearly tripling accuracy on edit and generalization tasks.
But:
⏳ It comes with significantly higher inference cost
🔄 And still struggles with multi-hop reasoning over updated facts
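A retrieval-augmented setup along these lines can be sketched in a few lines. The embedding model, fact store, prompt format, and the `llm` callable below are illustrative assumptions, not the benchmark's actual pipeline.

```python
# Hypothetical retrieval-augmented answering over a store of updated facts.
# Embedding model, fact store, prompt format, and `llm` are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

facts = [  # hypothetical updated facts, e.g. extracted from Wikidata diffs
    "As of July 2024, Alice Example is the CEO of ExampleCorp.",
    "Exampletown was renamed New Exampletown in March 2024.",
]
fact_vecs = encoder.encode(facts, normalize_embeddings=True)

def answer_with_rag(question: str, llm, k: int = 2) -> str:
    """Prepend the k most similar updated facts to the prompt, then query the LLM."""
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = fact_vecs @ q_vec               # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]       # indices of the k best facts
    context = "\n".join(facts[i] for i in top)
    prompt = f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)                       # any text-generation callable
```

Every query now pays for an extra encoder pass and a similarity search, which is where the added inference cost comes from; and because facts are retrieved independently, multi-hop questions spanning several updates remain hard.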
lukasthede.bsky.social
5/
The result? 📉
Most editing methods struggle at scale.
ROME and MEMIT collapse within a few hundred updates.
Even WISE, built for lifelong edits, degrades quickly - converging to pre-edit performance.
➡️ These techniques aren’t yet ready for real-world demands.
lukasthede.bsky.social
4/
We put popular editing methods to the test:
🔧 ROME, MEMIT, WISE
🔁 LoRA finetuning & merging
🔍 Retrieval-augmented generation (RAG)

How do they stack up on update accuracy, reasoning, generalization, and locality?
lukasthede.bsky.social
3/
Unlike synthetic edit datasets, WikiBigEdit tracks real-world knowledge changes over time.

It probes multi-hop reasoning, semantic generalization, and whether new edits interfere with existing knowledge.
And it’s built to continuously grow - for future-proof evaluation.
lukasthede.bsky.social
2/
📣 Introducing WikiBigEdit: a new benchmark for lifelong knowledge editing.

It includes:
📌 500K+ real-world QA pairs based on Wikidata
📆 8 time steps over 6 months (Feb–Jul 2024) and continuously updatable
🧪 Rich evaluations: reasoning, generalization, locality, …
lukasthede.bsky.social
1/
Most LLMs are static snapshots of past knowledge.
But facts change constantly - and retraining is far too costly.
Knowledge editing offers a cheaper fix.
But how far can it actually take us?
We put it to the test - at realistic deployment scale.
lukasthede.bsky.social
🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing.

But what would it actually take to support this in practice at the scale and speed the real world demands?

We explore this question and push the limits of lifelong knowledge editing in the wild.
👇
Reposted by Lukas Thede
eml-munich.bsky.social
Happy to share that we have 4 papers to be presented at the upcoming #ICLR2025 in the beautiful city of #Singapore. Check out our website for more details: eml-munich.de/publications. We will introduce the talented authors and their papers very soon - stay tuned 😉
Reposted by Lukas Thede
fededagos.bsky.social
🚨 New paper alert! 🚨
We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species.
A 🧵👇 (1/9)
Reposted by Lukas Thede
ahochlehnert.bsky.social
CuratedThoughts: Data Curation for RL Datasets 🚀

Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws — 25% of OpenThoughts had to be eliminated through data curation.

Here's why 👇🧵
Reposted by Lukas Thede
bayesiankitten.bsky.social
🔥 #CVPR2025 Submit your cool papers to the Workshop on Emergent Visual Abilities and Limits of Foundation Models 📷🧠🚀✨

sites.google.com/view/eval-fo...

Submission Deadline: March 12th!
EVAL-FoMo 2
A Vision workshop on Evaluations and Analysis
Reposted by Lukas Thede
wielandbrendel.bsky.social
🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!

We’re building cutting-edge, open-source AI tutoring models that deliver high-quality, adaptive learning for all pupils, with support from the Hector Foundation.

👉 forms.gle/sxvXbJhZSccr...
Hiring announcement: ELLIS Institute Tübingen is looking for ML Researchers & Engineers for Open-Source AI Tutoring (m/f/d).
Reposted by Lukas Thede
joschkastrueber.bsky.social
🚨Great Models Think Alike and this Undermines AI Oversight🚨
New paper quantifies LM similarity
(1) LLM-as-a-judge favors more similar models 🤥
(2) Complementary knowledge benefits Weak-to-Strong Generalization☯️
(3) More capable models have more correlated failures 📈🙀
🧵👇