Martin Tutek
@mtutek.bsky.social
260 followers 350 following 63 posts
Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG
Faithful explainability, controllability & safety of LLMs. 🔎 On the academic job market 🔎 https://mttk.github.io/
Pinned
mtutek.bsky.social
🚨🚨 New preprint 🚨🚨

Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?

We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.

arxiv.org/abs/2502.14829
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps
When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. However, despite mu...
arxiv.org
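A minimal sketch of the idea described above, for intuition only: "unlearn" a verbalized CoT step from the parameters and check whether the final answer changes. The model name, the gradient-ascent unlearning recipe, and all hyperparameters are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch: erase one CoT step from the parameters, then re-ask the question.
# The gradient-ascent "unlearning" here is a stand-in, not the paper's procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def answer(prompt: str, max_new_tokens: int = 20) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def unlearn_step(step_text: str, lr: float = 1e-4, n_steps: int = 8) -> None:
    """Crude unlearning: a few gradient-ascent steps on the step's LM loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    ids = tok(step_text, return_tensors="pt").input_ids
    for _ in range(n_steps):
        loss = model(input_ids=ids, labels=ids).loss
        (-loss).backward()   # ascend: make the step less likely under the model
        opt.step()
        opt.zero_grad()

question = "Q: A book costs 17 dollars. How much do 3 books cost? Think step by step."
cot_step = "17 * 3 = 51"  # a verbalized reasoning step to erase

before = answer(question)
unlearn_step(cot_step)
after = answer(question)
# If the answer changes once the step is erased, the step plausibly fed the prediction;
# if it doesn't, the verbalized step may not reflect the internal computation.
print(before, "->", after)
```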
mtutek.bsky.social
Here's the twist: LLMs’ harm assessments actually align well with human judgments 🎯
The problem? Flawed prioritization!
mtutek.bsky.social
The results? Frontier LLMs struggle badly with this trade-off:

Many consistently choose harmful options to achieve operational goals
Others become overly cautious—avoiding harm but becoming ineffective

The sweet spot of safe AND pragmatic? Largely missing!
mtutek.bsky.social
ManagerBench evaluates LLMs on realistic managerial scenarios validated by humans. Each scenario forces a choice:

❌ A pragmatic but harmful action that achieves the goal
✅ A safe action with worse operational performance
➕ control scenarios with only inanimate objects at risk 😎
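For readers who want the shape of the benchmark rather than the prose: a hedged sketch of how such a forced-choice setup could be represented and scored. Field names and the scoring bookkeeping are illustrative assumptions, not ManagerBench's actual schema or metrics.

```python
# Illustrative scoring for a safety-vs-pragmatism forced-choice benchmark.
from dataclasses import dataclass

@dataclass
class Scenario:
    prompt: str          # managerial situation with an explicit operational goal
    harmful_choice: str  # pragmatic option that achieves the goal but harms humans
    safe_choice: str     # harmless option with worse operational performance
    is_control: bool     # True if only inanimate objects are at risk

def score(picks: list[str], scenarios: list[Scenario]) -> dict[str, int]:
    """picks[i] is the option the LLM selected for scenarios[i]."""
    harmful = safe = overcautious = 0
    for pick, sc in zip(picks, scenarios):
        if sc.is_control:
            # Refusing the pragmatic option when only objects are at risk
            # signals over-caution rather than genuine safety.
            overcautious += int(pick == sc.safe_choice)
        elif pick == sc.harmful_choice:
            harmful += 1
        else:
            safe += 1
    return {"harmful": harmful, "safe": safe, "overcautious_on_controls": overcautious}
```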
mtutek.bsky.social
Many works investigate the relationship between LLMs, goals, and safety.

We create a realistic management scenario where LLMs have explicit motivations to choose harmful options, while always having a harmless option.
mtutek.bsky.social
🤔 What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid harming humans?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
mtutek.bsky.social
I won't be at COLM, so come see Yonatan talk about our work on estimating CoT faithfulness using machine unlearning!

Check out the thread for the (many) other interesting works from his group 🎉
boknilev.bsky.social
In #Interplay25 workshop, Friday ~11:30, I'll present on measuring *parametric* CoT faithfulness on behalf of
@mtutek.bsky.social, who couldn't travel:
bsky.app/profile/mtut...

Later that day we'll have a poster on predicting success of model editing by Yanay Soker, who also couldn't travel
Reposted by Martin Tutek
mariaa.bsky.social
Here’s a #COLM2025 feed!

Pin it 📌 to follow along with the conference this week!
Reposted by Martin Tutek
arxiv-cs-cl.bsky.social
Josip Jukić, Martin Tutek, Jan Šnajder
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158
Reposted by Martin Tutek
arxiv-cs-cl.bsky.social
Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
https://arxiv.org/abs/2510.00857
Reposted by Martin Tutek
boknilev.bsky.social
Opportunities to join my group in fall 2026:
* PhD applications direct or via ELLIS @ellis.eu (ellis.eu/news/ellis-p...)
* Post-doc applications direct or via Azrieli (azrielifoundation.org/fellows/inte...) or Zuckerman (zuckermanstem.org/ourprograms/...)
Reposted by Martin Tutek
amuuueller.bsky.social
What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
mtutek.bsky.social
Hints of an Openreview x Overleaf stealth collab, sharing data of future works? 🤔
mtutek.bsky.social
Like it, less effort.
Feel like the matching is pretty good, although it does hyperfocus on individual papers sometimes.
wdyt?
Reposted by Martin Tutek
apepa.bsky.social
🎓 Fully funded PhD in Trustworthy NLP at the UCPH & @aicentre.dk with @iaugenstein.bsky.social and me, @copenlu.bsky.social
📆 Application deadline: 30 October 2025
👀 Reasons to apply: www.copenlu.com/post/why-ucph/
🔗 Apply here: candidate.hr-manager.net/ApplicationI...
#NLProc #XAI #TrustworthyAI
mtutek.bsky.social
Boston Neural Network Dynamics
Reposted by Martin Tutek
a-lauscher.bsky.social
🚨 Are you looking for a PhD in #NLProc dealing with #LLMs?
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures" project. 🌱🌍 You will focus on developing next-generation AI methods🤖 to analyze climate-related concepts in content—including texts, images, and videos.
mtutek.bsky.social
Very cool work!

It seems you identify (one of?) the reasons why reasoning chains are generally not plausible to humans - how do you think "narrative alignment" would affect plausibility?
Reposted by Martin Tutek
abosselut.bsky.social
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social @ethz.ch @cscsch.bsky.social) built Apertus.
icepfl.bsky.social
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...
Reposted by Martin Tutek
eaclmeeting.bsky.social
🚨 EACL 2026 website is live and Call for Papers is out! 🚨

Join us at #EACL2026 (Rabat, Morocco 🇲🇦, Mar 24-29 2026)

👉 Open to all areas of CL/NLP + related fields.

Details: 2026.eacl.org/calls/papers/

• ARR submission deadline: Oct 6, 2025
• EACL commitment deadline: Dec 14, 2025
Reposted by Martin Tutek
markriedl.bsky.social
All your embarrassing secrets are training data (unless you are paying attention)
haydenfield.bsky.social
NEW: Anthropic will start training its AI models on user data, including new chat transcripts & coding sessions, unless users choose to opt out by 9/28 (it's a pop-up window that will give you the choice). It’s also extending its data retention to 5 years.
www.theverge.com/anthropic/76...
Anthropic will start training its AI models on chat transcripts
You can choose to opt out.
www.theverge.com
mtutek.bsky.social
Yeah, I was conservative because the author overlap probably gets larger the wider you look. Staggering numbers.
mtutek.bsky.social
How many people would you estimate are currently actively publishing in ML research?

From AAAI, which had ~29,000 submissions: "There are 75,000+ unique submitting authors."
NeurIPS had 25,000 submissions.

Is the number close to 300k? 500k?
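A back-of-envelope version of the estimate being asked for, with the overlap factor as a pure guess (it is not data from the thread):

```python
# Rough estimate of unique ML authors across two venues; the overlap is assumed.
aaai_authors = 75_000   # unique submitting authors at AAAI (from the post)
aaai_subs    = 29_000
neurips_subs = 25_000

authors_per_submission = aaai_authors / aaai_subs            # ~2.6
neurips_authors_est = authors_per_submission * neurips_subs  # ~65k if the ratio transfers

overlap = 0.5  # assumed: half of NeurIPS authors also submitted to AAAI
union_est = aaai_authors + (1 - overlap) * neurips_authors_est
print(round(union_est))  # ~107k across just these two venues; more venues push it higher
```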