Stephan Rabanser
@stvrb.bsky.social
66 followers 150 following 21 posts
PhD candidate @utoronto.ca and @vectorinstitute.ai | Soon: Postdoc @princetoncitp.bsky.social‬ | Reliable, safe, trustworthy machine learning.
stvrb.bsky.social
More on this work:

📄 Our workshop paper: openreview.net/pdf?id=qYI4f...
🖼️ Our award-winning poster: rabanser.dev/assets/poste...
🛠️ Check out the workshop for more new research on efficient on-device machine learning: ttodlerfm.gitlab.io
stvrb.bsky.social
Thanks to all my amazing collaborators at Google for hosting me for this internship in Zurich and for making this work possible: Nathalie Rauschmayr, Achin (Ace) Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, and Federico Tombari!
stvrb.bsky.social
In our work, we introduce Gatekeeper: a novel loss function that calibrates smaller models in cascade setups to confidently handle easy tasks while deferring complex ones. Gatekeeper significantly improves deferral performance across a diverse set of architectures and tasks.
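For intuition, here is a minimal sketch of confidence-based deferral in a two-model cascade (hypothetical model names and threshold; Gatekeeper's actual contribution is the training loss that tunes the small model's confidence so this deferral rule cleanly separates easy from hard inputs):

```python
import torch
import torch.nn.functional as F

def cascade_predict(x, small_model, large_model, tau=0.8):
    """Confidence-based cascade (illustrative sketch, not the Gatekeeper
    objective itself): answer with the small on-device model when it is
    confident, defer to the large model otherwise."""
    with torch.no_grad():
        probs = F.softmax(small_model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        defer = conf < tau                      # low confidence -> defer
        if defer.any():
            pred[defer] = large_model(x[defer]).argmax(dim=-1)
    return pred, defer
```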
stvrb.bsky.social
🏅 Very excited to share that my recent Google internship project on model cascading has received the 𝗕𝗲𝘀𝘁 𝗣𝗼𝘀𝘁𝗲𝗿 𝗔𝘄𝗮𝗿𝗱 at the 𝘛𝘛𝘖𝘋𝘓𝘦𝘳-𝘍𝘔 𝘞𝘰𝘳𝘬𝘴𝘩𝘰𝘱 @ 𝘐𝘊𝘔𝘓! Thanks a lot to the organizers for setting up this amazing workshop!
stvrb.bsky.social
📄 Gatekeeper: Improving Model Cascades Through Confidence Tuning
Paper ➡️ arxiv.org/abs/2502.19335
Workshop ➡️ Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM)
Poster ➡️ West Meeting Room 215-216 on Sat 19 Jul 3:00 p.m. — 3:45 p.m.
stvrb.bsky.social
📄 Selective Prediction Via Training Dynamics
Paper ➡️ arxiv.org/abs/2205.13532
Workshop ➡️ 3rd Workshop on High-dimensional Learning Dynamics (HiLD)
Poster ➡️ West Meeting Room 118-120 on Sat 19 Jul 10:15 a.m. — 11:15 a.m. & 4:45 p.m. — 5:30 p.m.
stvrb.bsky.social
📄 Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings (✨ oral paper ✨)
Paper ➡️ arxiv.org/abs/2505.22356
Poster ➡️ E-504 on Thu 17 Jul 4:30 p.m. — 7 p.m.
Oral Presentation ➡️ West Ballroom C on Thu 17 Jul 4:15 p.m. — 4:30 p.m.
stvrb.bsky.social
📄 Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention
TL;DR ➡️ We show that a model owner can artificially introduce uncertainty into a model, and we provide a mechanism to detect this abuse.
Paper ➡️ arxiv.org/abs/2505.23968
Poster ➡️ E-1002 on Wed 16 Jul 11 a.m. — 1:30 p.m.
stvrb.bsky.social
📣 I will be at #ICML2025 in Vancouver next week to present two main conference papers (including one oral paper ✨) and two workshop papers! Say hi if you are around and want to chat about ML uncertainty & reliability! 😊

🧵 Papers in order of presentation below:
stvrb.bsky.social
🚀 Dive deeper:

Paper ▶️ arxiv.org/abs/2505.23968
Code ▶️ github.com/cleverhans-l...

Joint work with Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, and @nicolaspapernot.bsky.social.

Talk to us at ICML in Vancouver! 🇨🇦

🧵10/10 #Abstention #Uncertainty #Calibration #ZKP #ICML2025
stvrb.bsky.social
📚 Summary of key contributions:

1️⃣ A novel abuse threat for uncertainty quantification / abstention mechanisms.
2️⃣ Theoretical proof of attack feasibility.
3️⃣ Practical attack instantiation using the MIRAGE loss.
4️⃣ ZKP‑based CONFIDENTIAL GUARDIAN defense.

🧵9/10
stvrb.bsky.social
🔐 How Confidential Guardian works:

The auditor supplies a reference dataset that covers suspicious regions. 📂
The model runs inside a ZKP circuit. 🤫
Confidential Guardian releases the ECE & a reliability diagram; artificial uncertainty tampering stands out. 🔍📈 (Sketch of the ECE check below.)
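A rough sketch of the quantity being verified (standard binned ECE; the paper's contribution is evaluating it inside a zero-knowledge circuit so the model weights stay hidden):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: compare average confidence to empirical accuracy
    within each confidence bin, weighted by bin size (illustrative
    sketch of the statistic, computed here in the clear)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.sum() == 0:
            continue
        acc = correct[mask].mean()    # empirical accuracy in the bin
        conf = confidences[mask].mean()
        ece += mask.mean() * abs(acc - conf)
    return ece
```

In regions where uncertainty was induced artificially, per-bin confidence sits far below empirical accuracy, so the tampering shows up as a large gap in the reliability diagram.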

🧵8/10
stvrb.bsky.social
🛡️ Can we catch the Mirage?

Yes—introducing CONFIDENTIAL GUARDIAN.
It assesses a model's calibration (how well predicted probabilities match the true underlying probabilities) without revealing the model weights, using zero-knowledge proofs of verified inference. 🔒🧾

🧵7/10
stvrb.bsky.social
📉 Why is this scary?

• Regulators often only monitor accuracy, not uncertainty.
• Victims face delays and must jump through extra hoops. 🔄
• Public trust in model uncertainty erodes. 😡
Abstention, once a virtue, becomes a smokescreen for discriminatory behavior.

🧵6/10
stvrb.bsky.social
💥 Meet MIRAGE:

A regularizer pushes the model's output distribution toward a near-uniform target in any chosen region, keeping the true class only slightly above chance level: confidence crashes 📉, accuracy stays high 📈.
Result: systematic “uncertain” labels that hide bias.
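A minimal sketch of such a regularizer (illustrative only, not the exact MIRAGE objective from the paper): pull the softmax output toward a near-uniform target on flagged inputs, with a small bump on the true class so the argmax, and hence accuracy, is preserved.

```python
import torch
import torch.nn.functional as F

def uncertainty_inducing_loss(logits, labels, in_region, num_classes,
                              eps=0.1, lam=1.0):
    """Sketch of a MIRAGE-style objective (hypothetical parameter names):
    standard cross-entropy everywhere, plus a KL term that drives the
    predicted distribution toward near-uniform on flagged inputs."""
    ce = F.cross_entropy(logits, labels)

    # Near-uniform target with a small bump (eps) on the true class.
    target = torch.full_like(logits, 1.0 / num_classes)
    target.scatter_(1, labels.unsqueeze(1), 1.0 / num_classes + eps)
    target = target / target.sum(dim=1, keepdim=True)

    log_probs = F.log_softmax(logits, dim=1)
    kl = F.kl_div(log_probs, target, reduction="none").sum(dim=1)

    # Apply the uncertainty-inducing term only inside the chosen region.
    region_term = (kl * in_region.float()).mean()
    return ce + lam * region_term
```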

🧵5/10
stvrb.bsky.social
🧠 Theoretical feasibility:

We show theoretically that such uncertainty attacks work on any neural network, either by repurposing existing hidden neurons or by attaching fresh neurons that dampen confidence. This means no model is safe out-of-the-box.

🧵4/10
stvrb.bsky.social
🎯 Presenting a new threat—Artificial Uncertainty Induction:

A dishonest institution can create a model that outputs high uncertainty for inputs it dislikes, quietly shuffling people into review limbo while keeping accuracy and existing audits intact. 😈📉

🧵3/10
stvrb.bsky.social
🔍 Background—Cautious Predictions:

ML models are often designed to abstain from predicting when uncertain, to avoid costly mistakes (finance, healthcare, justice, autonomous driving). But what if that safety valve becomes a backdoor for discrimination? 🚪⚠️
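For context, the simplest form of such a cautious predictor is a softmax-threshold reject option (the threshold value here is a hypothetical choice):

```python
import torch.nn.functional as F

def predict_or_abstain(logits, threshold=0.9):
    """Reject-option sketch: predict only when the top softmax probability
    clears the threshold, otherwise abstain and route to manual review."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    abstain = conf < threshold
    return pred, abstain
```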

🧵2/10
stvrb.bsky.social
📢 New ICML 2025 paper!

Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention

🤔 Think model uncertainty can be trusted?
We show that it can be misused—and how to stop it!
Meet Mirage (our attack💥) & Confidential Guardian (our defense🛡️).

🧵1/10
stvrb.bsky.social
Starting off this account with a banger: In September 2025, I will be joining @princetoncitp.bsky.social at Princeton University as a Postdoc working with @randomwalker.bsky.social & @msalganik.bsky.social! I am very excited about this opportunity to continue my work on trustworthy/reliable ML! 🥳