Lukas Thede
@lukasthede.bsky.social
130 followers · 160 following · 13 posts
IMPRS-IS PhD Student with Zeynep Akata and Matthias Bethge at the University of Tübingen and Helmholtz Munich, working on continually adapting foundation models.
Reposted by Lukas Thede
lciernik.bsky.social
🎉 Presenting at #ICML2025 tomorrow!
Come and explore how representational similarities behave across datasets :)

📅 Thu Jul 17, 11 AM-1:30 PM PDT
📍 East Exhibition Hall A-B #E-2510

Huge thanks to @lorenzlinhardt.bsky.social, Marco Morik, Jonas Dippel, Simon Kornblith, and @lukasmut.bsky.social!
lukasthede.bsky.social
🚨 Poster at #ICML2025!
How can LLMs really keep up with the world?

Come by E-2405 on July 15th (4:30–7:00pm) to check out WikiBigEdit – our new benchmark to test lifelong knowledge editing in LLMs at scale.

🔗 Real-world updates
📈 500k+ QA edits
🧠 Editing vs. RAG vs. continual learning (CL)
Reposted by Lukas Thede
bethgelab.bsky.social
🧠🤖 We’re hiring a Postdoc in NeuroAI!

Join CRC1233 "Robust Vision" (Uni Tübingen) to build benchmarks & evaluation methods for vision models, bridging brain & AI. Work with top faculty & shape vision research.

Apply: tinyurl.com/3jtb4an6

#NeuroAI #Jobs
Postdoctoral Researcher (m/f/d, E13 TV-L, 100%)
Reposted by Lukas Thede
eml-munich.bsky.social
📢 Landed in Nashville🎺 for #CVPR2025! The EML group is presenting 4 exciting papers — come say hi at our poster sessions! More details in the thread — see you there! 🏁🌟
Reposted by Lukas Thede
eml-munich.bsky.social
🚨 Happy to announce that our paper, "Understanding the Limits of Lifelong Knowledge Editing in LLMs", has been accepted at #ICML2025! Congrats to @lukasthede.bsky.social, @confusezius.bsky.social, Matthias Bethge, @zeynepakata.bsky.social, and @tomhartvigsen.bsky.social. 👇 Highlights in the thread
Reposted by Lukas Thede
eml-munich.bsky.social
🎓PhD Spotlight: Jae Myung Kim

We’re thrilled to celebrate Jae Myung Kim, who will defend his PhD on 25th June! 🎉

Jae Myung began his PhD at @unituebingen.bsky.social as part of the ELLIS & IMPRS-IS programs, advised by @zeynepakata.bsky.social and collaborating closely with Cordelia Schmid.
Reposted by Lukas Thede
eml-munich.bsky.social
We’ve landed in Singapore for #ICLR2025!
The EML group is presenting 4 exciting papers — come say hi at our poster sessions! 👇 Let’s chat!

More details in the thread — see you there! 🌟
lukasthede.bsky.social
10/
This project was a joint effort with amazing collaborators:
👥 @confusezius.bsky.social, Matthias Bethge, @zeynepakata.bsky.social, and @tomhartvigsen.bsky.social
Huge thanks to them for the ideas, feedback, and countless hours that made this work possible. 🙏
lukasthede.bsky.social
8/
🔍 TL;DR:
✅ We release WikiBigEdit - a new large-scale benchmark for real-world factual updates
🚨 Existing editing methods fail to scale
💡 Finetuning + merging is a surprisingly strong baseline
🧩 RAG wins - but with trade-offs
lukasthede.bsky.social
7/
Surprisingly, simple continual finetuning (LoRA) outperforms all editing baselines - at equal inference cost.
And when paired with model merging, performance improves even further over time.
💪 More scalable, more robust, and better retention across time steps.
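In code, this baseline could look something like the minimal sketch below: continually finetune a fresh LoRA adapter on each time step's edits, merge it into the dense weights, and uniformly average the resulting checkpoints. The base model, LoRA settings, and the `finetune`/`time_step_batches` placeholders are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of the continual finetuning + merging baseline.
# Base model, LoRA settings, `finetune`, and `time_step_batches` are
# illustrative assumptions, not the paper's exact configuration.
import copy

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def finetune(model, qa_batch):
    """Placeholder: standard causal-LM finetuning on one time step's QA edits."""
    ...

base = AutoModelForCausalLM.from_pretrained("gpt2")        # stand-in base LLM
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"])

time_step_batches = []   # one batch of QA edits per benchmark time step
merged_sum, n = None, 0

for qa_batch in time_step_batches:
    model = get_peft_model(copy.deepcopy(base), lora_cfg)  # fresh adapter
    finetune(model, qa_batch)
    model = model.merge_and_unload()   # fold LoRA deltas into dense weights

    state = model.state_dict()
    if merged_sum is None:
        merged_sum = {k: v.clone().float() for k, v in state.items()}
    else:
        for k in merged_sum:
            merged_sum[k] += state[k].float()
    n += 1

if n:  # uniform average of all per-step checkpoints
    base.load_state_dict({k: v / n for k, v in merged_sum.items()})
```

Uniform averaging is just one possible merging scheme; the point is that the whole pipeline uses standard finetuning machinery and leaves inference cost identical to the base model.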
lukasthede.bsky.social
6/
RAG performs best overall - nearly tripling accuracy on edit and generalization tasks.
But:
⏳ It comes with significantly higher inference cost
🔄 And still struggles with multi-hop reasoning over updated facts
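A retrieval-augmented setup along these lines can be sketched in a few lines. The embedding model, fact store, prompt format, and the `llm` callable below are illustrative assumptions, not the benchmark's actual pipeline.

```python
# Hypothetical retrieval-augmented answering over a store of updated facts.
# Embedding model, fact store, prompt format, and `llm` are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

facts = [  # hypothetical updated facts, e.g. extracted from Wikidata diffs
    "As of July 2024, Alice Example is the CEO of ExampleCorp.",
    "Exampletown was renamed New Exampletown in March 2024.",
]
fact_vecs = encoder.encode(facts, normalize_embeddings=True)

def answer_with_rag(question: str, llm, k: int = 2) -> str:
    """Prepend the k most similar updated facts to the prompt, then query the LLM."""
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = fact_vecs @ q_vec               # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]       # indices of the k best facts
    context = "\n".join(facts[i] for i in top)
    prompt = f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)                       # any text-generation callable
```

Every query now pays for an extra encoder pass and a similarity search, which is where the added inference cost comes from; and because facts are retrieved independently, multi-hop questions spanning several updates remain hard.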
lukasthede.bsky.social
5/
The result? 📉
Most editing methods struggle at scale.
ROME and MEMIT collapse within a few hundred updates.
Even WISE, built for lifelong edits, degrades quickly - converging to pre-edit performance.
➡️ These techniques aren’t yet ready for real-world demands.
lukasthede.bsky.social
4/
We put popular editing methods to the test:
🔧 ROME, MEMIT, WISE
🔁 LoRA finetuning & merging
🔍 Retrieval-augmented generation (RAG)

How do they stack up on update accuracy, reasoning, generalization, and locality?
lukasthede.bsky.social
3/
Unlike synthetic edit datasets, WikiBigEdit tracks real-world knowledge changes over time.

It probes multi-hop reasoning, semantic generalization, and whether new edits interfere with existing knowledge.
And it’s built to continuously grow - for future-proof evaluation.
lukasthede.bsky.social
2/
📣 Introducing WikiBigEdit: a new benchmark for lifelong knowledge editing.

It includes:
📌 500K+ real-world QA pairs based on Wikidata
📆 8 time steps over 6 months (Feb–Jul 2024) and continuously updatable
🧪 Rich evaluations: reasoning, generalization, locality, …
lukasthede.bsky.social
1/
Most LLMs are static snapshots of past knowledge.
But facts change constantly - and retraining is far too costly.
Knowledge editing offers a cheaper fix.
But how far can it actually take us?
We put it to the test - at realistic deployment scale.
lukasthede.bsky.social
🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing.

But what would it actually take to support this in practice at the scale and speed the real world demands?

We explore this question and push the limits of lifelong knowledge editing in the wild.
👇
Reposted by Lukas Thede
eml-munich.bsky.social
Happy to share that we have 4 papers to be presented at the upcoming #ICLR2025 in the beautiful city of #Singapore. Check out our website for more details: eml-munich.de/publications. We will introduce the talented authors and their papers very soon - stay tuned 😉
Reposted by Lukas Thede
fededagos.bsky.social
🚨 New paper alert! 🚨
We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species.
A 🧵👇 (1/9)
Reposted by Lukas Thede
ahochlehnert.bsky.social
CuratedThoughts: Data Curation for RL Datasets 🚀

Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws — 25% of OpenThoughts had to be eliminated through data curation.

Here's why 👇🧵
Reposted by Lukas Thede
bayesiankitten.bsky.social
🔥 #CVPR2025 Submit your cool papers to the Workshop on Emergent Visual Abilities and Limits of Foundation Models 📷🧠🚀✨

sites.google.com/view/eval-fo...

Submission Deadline: March 12th!
EVAL-FoMo 2
A Vision workshop on Evaluations and Analysis
Reposted by Lukas Thede
wielandbrendel.bsky.social
🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!

We’re building cutting-edge, open-source AI tutoring models that deliver high-quality, adaptive learning for all pupils, with support from the Hector Foundation.

👉 forms.gle/sxvXbJhZSccr...
Hiring announcement: ELLIS Institute Tübingen is looking for ML Researchers & Engineers for Open-Source AI Tutoring (m/f/d).
Reposted by Lukas Thede
joschkastrueber.bsky.social
🚨Great Models Think Alike and this Undermines AI Oversight🚨
New paper quantifies LM similarity
(1) LLM-as-a-judge favors more similar models 🤥
(2) Complementary knowledge benefits Weak-to-Strong Generalization☯️
(3) More capable models have more correlated failures 📈🙀
🧵👇