Bálint Mucsányi
@bmucsanyi.bsky.social
320 followers 130 following 15 posts
ELLIS & IMPRS-IS PhD Student at the University of Tübingen. Excited about uncertainty quantification, weight spaces, and deep learning theory.
Reposted by Bálint Mucsányi
mkirchhof.bsky.social
Many LLM uncertainty estimators perform similarly, but does that mean they do the same thing? No! We find that they rely on different cues, and combining them gives even better performance. 🧵1/5

📄 openreview.net/forum?id=QKR...
NeurIPS: Sunday, East Exhibition Hall A, Safe Gen AI workshop
bmucsanyi.bsky.social
Excited to present our spotlight paper on uncertainty disentanglement at #NeurIPS! Drop by today between 11 am and 2 pm PST at West Ballroom A-D #5509 and let's chat!
bmucsanyi.bsky.social
Yes, sampling is a possibility. For Dirichlets, the predictive, epistemic, and aleatoric estimators of the information-theoretical decomposition (Eq. (1) in the paper: arxiv.org/abs/2402.19460) are available in closed form, so we can do better than sampling (lines 3198 and 3234 of validate.py).
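For reference, here is a minimal NumPy/SciPy sketch of those closed-form estimators for a single Dirichlet prediction (the function name and shapes are my own; the implementation in validate.py is the authoritative one):

```python
import numpy as np
from scipy.special import digamma

def dirichlet_uncertainties(alphas):
    """Closed-form information-theoretic decomposition for Dir(alphas).

    Returns (predictive, aleatoric, epistemic) entropies in nats.
    alphas: array of shape (num_classes,) with positive entries.
    """
    alpha_0 = alphas.sum()                      # total evidence
    mean = alphas / alpha_0                     # predictive mean

    # Predictive (total) uncertainty: entropy of the mean prediction
    predictive = -np.sum(mean * np.log(mean))

    # Aleatoric uncertainty: expected entropy under the Dirichlet,
    # E_{pi ~ Dir(alpha)}[H(pi)], available in closed form via digammas
    aleatoric = -np.sum(mean * (digamma(alphas + 1) - digamma(alpha_0 + 1)))

    # Epistemic uncertainty: mutual information = predictive - aleatoric
    epistemic = predictive - aleatoric
    return predictive, aleatoric, epistemic

print(dirichlet_uncertainties(np.array([5.0, 1.0, 1.0])))
```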
bmucsanyi.bsky.social
The alphas are the parameters of the predictive Dirichlet distribution. If you normalize the alpha vector by the sum of its elements (called the "evidence"), you get the predictive mean, which you can use for the ECE/MCE calculation. This is done automatically in line 3186 of validate.py. :)
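In code, that normalization plus a basic binned ECE could look like the following simplified sketch (a hypothetical helper, not the exact metric implementation in the repo):

```python
import numpy as np

def ece_from_alphas(alphas, labels, num_bins=15):
    """Expected calibration error from Dirichlet parameters (sketch).

    alphas: (num_samples, num_classes) Dirichlet parameters.
    labels: (num_samples,) integer ground-truth labels.
    """
    probs = alphas / alphas.sum(axis=1, keepdims=True)  # predictive mean
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, num_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```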
bmucsanyi.bsky.social
Correct! The EDLWrapper does not add extra parameters but changes the interpretation of logits, as the EDLLoss is a scheduled + regularized L2 loss instead of the usual cross-entropy.
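For context, here is a hedged PyTorch sketch of this kind of scheduled, KL-regularized L2 objective, in the spirit of Sensoy et al.'s evidential deep learning; the names and the exact schedule are assumptions, not the repo's EDLLoss:

```python
import math
import torch
import torch.nn.functional as F

def kl_dirichlet_to_uniform(alpha):
    """KL(Dir(alpha) || Dir(1, ..., 1)) in closed form."""
    num_classes = alpha.shape[-1]
    strength = alpha.sum(-1, keepdim=True)
    term1 = (
        torch.lgamma(strength).squeeze(-1)
        - math.lgamma(num_classes)
        - torch.lgamma(alpha).sum(-1)
    )
    term2 = ((alpha - 1.0) * (torch.digamma(alpha) - torch.digamma(strength))).sum(-1)
    return term1 + term2

def edl_l2_loss(logits, targets, epoch, anneal_epochs=10):
    """Scheduled + regularized L2 loss over Dirichlet parameters (sketch)."""
    evidence = F.softplus(logits)             # logits reinterpreted as evidence
    alpha = evidence + 1.0                    # Dirichlet parameters
    strength = alpha.sum(-1, keepdim=True)
    prob = alpha / strength                   # predictive mean

    y = F.one_hot(targets, logits.shape[-1]).float()
    err = ((y - prob) ** 2).sum(-1)                         # squared error
    var = (prob * (1.0 - prob) / (strength + 1.0)).sum(-1)  # predictive variance

    # Annealed KL regularizer that shrinks evidence on the non-target classes
    alpha_tilde = y + (1.0 - y) * alpha
    coeff = min(1.0, epoch / anneal_epochs)   # schedule ramps the penalty up
    return (err + var + coeff * kl_dirichlet_to_uniform(alpha_tilde)).mean()
```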
bmucsanyi.bsky.social
Of course, ask away! :)
bmucsanyi.bsky.social
For more details, check out our paper: arxiv.org/abs/2402.19460! Our GitHub repo (github.com/bmucsanyi/un...) contains performant implementations of the 19 benchmarked uncertainty methods, out-of-the-box OOD perturbation support, handling of label uncertainty, and support for over 50 metrics. 7/7
bmucsanyi.bsky.social
A promising avenue for disentanglement is to combine such specialized estimators. As a simple baseline, combining the Mahalanobis OOD detector's epistemic estimates and the aleatoric estimates of evidential methods leads to well-performing but only mildly correlated estimators. 6/7
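A rough sketch of the epistemic half of that baseline, a class-conditional Mahalanobis score on penultimate-layer features (simplified; the repo's detector and the exact combination rule may differ):

```python
import numpy as np

def fit_mahalanobis(features, labels, num_classes):
    """Fit class means and a shared covariance on in-distribution features."""
    means = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.pinv(cov)
    return means, precision

def epistemic_score(features, means, precision):
    """Epistemic estimate: minimum Mahalanobis distance to any class mean."""
    diffs = features[:, None, :] - means[None, :, :]            # (N, C, D)
    dists = np.einsum("ncd,de,nce->nc", diffs, precision, diffs)
    return dists.min(axis=1)                                    # higher = more epistemic
```

The aleatoric half would then come from an evidential head (e.g. the expected Dirichlet entropy sketched earlier), with each estimator used only for its own task.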
bmucsanyi.bsky.social
All these insights point to the conclusion that the best uncertainty method depends on the type of uncertainty and, even more importantly, the exact task we want to solve. Thinking only in terms of the 'aleatoric vs. epistemic' dichotomy is not fine-grained enough to obtain specialized methods. 5/7
bmucsanyi.bsky.social
Predictive uncertainty encompasses all the aforementioned sources of uncertainty. Almost all methods perform well on predictive uncertainty metrics, but the best-performing one depends on the exact metric (see the podium charts below for different metrics). 4/7
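As one concrete example of such a metric, correctness prediction can be scored as the AUROC of separating correct from incorrect predictions by the uncertainty estimate (a minimal scikit-learn sketch, not tied to the repo's exact metric code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def correctness_auroc(uncertainties, predictions, labels):
    """AUROC of predicting correctness from (negated) uncertainty."""
    is_correct = (predictions == labels).astype(int)
    # Lower uncertainty should indicate a correct prediction
    return roc_auc_score(is_correct, -np.asarray(uncertainties))
```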
bmucsanyi.bsky.social
Instead, we find that specialized estimators perform best at capturing these sources of uncertainty. For epistemic uncertainty, a specialized OOD detector works best. For aleatoric uncertainty, evidential methods perform well, but more research is needed to develop dedicated aleatoric estimators. 3/7
bmucsanyi.bsky.social
Decomposition formulas like the one in the image below are popular approaches for breaking total uncertainty into different parts. However, we show that these parts are severely correlated with each other (rank correlations of 0.8 to 0.999), i.e., they "measure the same thing" in practice. 2/7
[Image: information-theoretical uncertainty decomposition formula]
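For readers without the image, the standard information-theoretic decomposition it shows reads roughly as follows (notation may differ slightly from Eq. (1) in the paper):

```latex
\underbrace{\mathbb{H}\!\left[\mathbb{E}_{\theta}\, p(y \mid x, \theta)\right]}_{\text{predictive}}
= \underbrace{\mathbb{I}\!\left[y; \theta \mid x\right]}_{\text{epistemic}}
+ \underbrace{\mathbb{E}_{\theta}\, \mathbb{H}\!\left[p(y \mid x, \theta)\right]}_{\text{aleatoric}}
```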
bmucsanyi.bsky.social
Thrilled to share our NeurIPS spotlight on uncertainty disentanglement! ✨ We study how well existing methods disentangle different sources of uncertainty, like epistemic and aleatoric. While all tested methods fail at this task, there are promising avenues ahead. 🧵 👇 1/7

📖: arxiv.org/abs/2402.19460
bmucsanyi.bsky.social
Could you add me to the starter pack? Thank you!