Eugene Berta
@eberta.bsky.social
61 followers 98 following 11 posts
PhD student at INRIA Paris. Working on calibration of machine learning classifiers.
Pinned
eberta.bsky.social
Early stopping on validation loss? This leads to suboptimal calibration and refinement errors—but you can do better!
With @dholzmueller.bsky.social, Michael I. Jordan, and @bachfrancis.bsky.social, we propose a method that integrates with any model and boosts classification performance across tasks.
Reposted by Eugene Berta
potosacho.bsky.social
COLT Workshop on Predictions and Uncertainty was a banger!

I was lucky to present our paper "Minimum Volume Conformal Sets for Multivariate Regression", alongside my colleague @eberta.bsky.social and his awesome work on calibration.

Big thanks to the organizers!

#ConformalPrediction #MarcoPolo
Reposted by Eugene Berta
lchoshen.bsky.social
What if we have been doing early stopping wrong all along?
When you break the validation loss into two terms, calibration and refinement,
you can use a simple (and efficient) trick to stop training at a smarter point
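A rough sketch of the idea (assumed helper names, not the authors' code): estimate the refinement error as the validation loss that remains after temperature scaling, treat the rest as calibration error, and pick the checkpoint that minimizes refinement rather than the raw validation loss.

```python
# Minimal sketch, not the paper's implementation: split validation NLL into
# calibration + refinement via temperature scaling, then choose the stopping
# epoch by refinement error instead of raw validation loss.
import numpy as np
from scipy.optimize import minimize_scalar

def nll(logits, labels, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)                  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def refinement_error(logits, labels):
    # Loss left after the best single-temperature recalibration.
    best_t = minimize_scalar(lambda t: nll(logits, labels, t),
                             bounds=(0.05, 20.0), method="bounded").x
    return nll(logits, labels, best_t)

def calibration_error(logits, labels):
    # The part of the validation loss that temperature scaling can remove.
    return nll(logits, labels) - refinement_error(logits, labels)

def pick_epoch(per_epoch_logits, val_labels):
    # per_epoch_logits: list of (n_val, n_classes) arrays, one per checkpoint.
    return min(range(len(per_epoch_logits)),
               key=lambda e: refinement_error(per_epoch_logits[e], val_labels))
```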
eberta.bsky.social
This suggests a clear link with the ROC curve in the binary case, but writing it down formally, the relationship between the two is a bit ugly…
eberta.bsky.social
Isotonic regression minimizes the risk of any « Bregman loss function » (including cross-entropy, see section 2.1 below) up to monotonic relabeling, which looks a lot like our « refinement as a minimiser » formulation. It also finds the ROC convex hull (a quick sketch follows the link below).
proceedings.mlr.press/v238/berta24...
Classifier Calibration with ROC-Regularized Isotonic Regression
Calibration of machine learning classifiers is necessary to obtain reliable and interpretable predictions, bridging the gap between model outputs and actual probabilities. One prominent technique, ...
proceedings.mlr.press
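A toy illustration of the Bregman-loss point (a sketch with synthetic data, not the paper's experiments): scikit-learn's IsotonicRegression is fitted by squared error, yet the same monotone relabeling also reduces the cross-entropy of miscalibrated binary scores.

```python
# Toy sketch: isotonic recalibration of miscalibrated binary scores.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
scores = rng.uniform(size=2000)                 # raw classifier scores in [0, 1]
labels = rng.binomial(1, scores ** 2)           # miscalibrated: true prob is scores**2

iso = IsotonicRegression(y_min=1e-6, y_max=1 - 1e-6, out_of_bounds="clip")
calibrated = iso.fit_transform(scores, labels)  # monotone relabeling of the scores

print("log loss before:", log_loss(labels, np.clip(scores, 1e-6, 1 - 1e-6)))
print("log loss after: ", log_loss(labels, calibrated))
```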
eberta.bsky.social
However, for calibration of the final model, adding an intercept or doing matrix scaling might work even better in certain scenarios (imbalanced, non-centered). We’ve experimented with existing implementations with limited success so far; maybe we should look at that in more detail…
eberta.bsky.social
Not yet! Vector/matrix scaling has more parameters, so it is more prone to overfitting the validation set, and simple temperature scaling seems to calibrate well empirically, which is why we stuck with it to estimate the refinement error for early stopping.
eberta.bsky.social
I’ve observed refinement being minimized before calibration for small (probably under-fitted) neural nets. In many cases, the refinement curve also starts « overfitting » at some point.
eberta.bsky.social
We’ve not tried what you’re suggesting but if the training cost is small this might indeed be a good option!
eberta.bsky.social
Indeed, regularisation seems very important. It can have a large impact on how the calibration error behaves. Combined with learning rate schedulers, this can have surprising effects, like the calibration error starting to go down again at some point.
eberta.bsky.social
Thanks! We have experimented with many models, observing various behaviours. The « calibration going up while refinement goes down » seems typical in deep learning from what I’ve seen. With smaller models other things can appear, as suggested by our logistic regression analysis (section 6).