Alexander Terenin
@avt.im
5.3K followers 570 following 170 posts
Decision-making under uncertainty, machine learning theory, artificial intelligence · anti-ideological · Assistant Research Professor, Cornell https://avt.im/ · https://scholar.google.com/citations?user=EGKYdiwAAAAJ&sortby=pubdate
avt.im
Check out the work at: arxiv.org/abs/2502.14790

And, again, shoutout to my amazing coauthor Jeff Negrea! Working together has been a great pleasure!

Stay tuned for follow-up: we've been working on using this viewpoint to understand other correlated perturbation-based algorithms.
Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces
We develop a form of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future...
arxiv.org
avt.im
So this sums up the work! If you followed along, thanks for the interest!

I think you'd agree that "Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces" is a much better title than before. The old one was much harder to pronounce.
avt.im
The Bayesian viewpoint proves useful for developing this analysis.

It allows us to guess what a good prior will be, and suggests ways to use probability as a tool to prove the algorithm works.
avt.im
We prove that the Bayesian approach works in this setting too.

To achieve this, we develop a new probabilistic analysis of correlated Gaussian follow-the-perturbed-leader algorithms, of which ours is a special case.

This has been an open challenge in the area.
avt.im
The second one is where X = [0,1]^d and Y is the space of bounded Lipschitz functions.

Here, you can't use a prior with independence across actions. You need to share information between actions.

We do this using a Gaussian process prior with correlations between actions.
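
As a rough illustration of how a correlated perturbation over the action space might be drawn - this is just a sketch, and the squared-exponential kernel, lengthscale, and one-dimensional grid of candidate actions are illustrative choices, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate actions: a grid over X = [0, 1] (d = 1 to keep the sketch small)
x = np.linspace(0.0, 1.0, 100)

# Covariance kernel inducing correlations between nearby actions
lengthscale = 0.1
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale**2)

# One draw of a correlated Gaussian perturbation over the action space;
# the small jitter is only for numerical stability of the Cholesky factor
chol = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
perturbation = chol @ rng.normal(size=len(x))

# Nearby actions receive similar perturbation values, so information is
# shared between actions instead of treating each action independently.
```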
avt.im
The first one is the classical discrete setting where standard algorithms such as exponential weights are studied.

You can use a Gaussian prior which is independent across actions.
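
Concretely - and this is my shorthand reading, not necessarily the paper's exact algorithm - posterior sampling with an independent Gaussian prior ends up looking like Gaussian follow-the-perturbed-leader: perturb each action's cumulative reward with independent Gaussian noise and play the argmax. A minimal sketch, with a placeholder noise scale:

```python
import numpy as np

def sample_action(cumulative_rewards, noise_scale, rng):
    """cumulative_rewards: length-|X| array with entries sum_{s < t} y_s(x).
    Perturb each action independently, then play the best perturbed action."""
    noise = noise_scale * rng.normal(size=cumulative_rewards.shape)
    return int((cumulative_rewards + noise).argmax())
```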
avt.im
Okay, so we now know what "Bayesian Algorithms for Adversarial Online Learning" are.

What about "from Finite to Infinite Action Spaces"?

This covers the two settings we show the aforementioned results in.
avt.im
This approach appears to not make any sense: the Bayesian model is completely fake.

We're pretending to know a distribution for how the adversary will act in the future.

But, in reality, they can do anything.

And yet... we show that this works!
avt.im
We show that this game secretly has a natural Bayesian strategy - one we prove is strong.

What's the strategy?

It's really simple:
- Place a prior distribution over what the adversary will do in the future
- Condition on what the adversary has done so far
- Sample from the posterior
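
In code, one way to read this recipe (a sketch only - the i.i.d. Gaussian prior and the placeholder adversary below are stand-ins, not the paper's actual construction):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_actions = 100, 5
past_rewards = np.zeros((0, n_actions))  # observed y_1, ..., y_{t-1}

for t in range(T):
    # Prior over the adversary's future reward functions y_t, ..., y_T:
    # here simply i.i.d. Gaussian per round and per action (illustrative).
    future = rng.normal(size=(T - t, n_actions))

    # Condition on what the adversary has done: past rounds are observed,
    # so only the remaining rounds are random under the posterior.
    sampled_game = np.vstack([past_rewards, future])

    # Sample from the posterior, then act as if the sample were the truth.
    x_t = int(sampled_game.sum(axis=0).argmax())

    # Adversary reveals y_t (full feedback); a random adversary stands in here.
    y_t = rng.uniform(-1.0, 1.0, size=n_actions)
    past_rewards = np.vstack([past_rewards, y_t])
```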
avt.im
One observation about adversarial online learning is that it appears to have nothing to do with Bayesian learning.

There is a two-player zero-sum game, not a joint probability distribution.

So you can't just solve it by applying Bayes' Rule. Or can you?
avt.im
Okay, so now we understand what "Adversarial Online Learning" is.

We propose "Bayesian Algorithms" for this.

What does that mean? Let's unpack.
avt.im
Online learning is therefore a good model for learning to explore by taking random actions.

This is in contrast to other approaches to resolving explore-exploit tradeoffs, such as upper confidence bounds, which produce purely deterministic strategies.
avt.im
So that's our setting. Why's it interesting?

Because many other hard decision problems can be reduced to online learning, including certain forms of reinforcement learning (via decision-estimation coefficients), equilibrium computation (via no-regret dynamics), and others.
avt.im
Specifically, their goal is to minimize regret

R(p,q) = E_{x_t~p_t, y_t~q_t}[ \sup_{x\in X} \sum_{t=1}^T y_t(x) - \sum_{t=1}^T y_t(x_t) ].

Meaning, the learner compares the sum of their rewards y_t(x_t) with the sum of y_t(x) for the best possible single non-time-dependent x.
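
For a finite action space, the quantity inside the expectation can be computed like this (a sketch; the array layout and names are mine):

```python
import numpy as np

def realized_regret(rewards, actions):
    """rewards: (T, |X|) array with rewards[t, x] = y_t(x).
    actions: length-T integer array of the learner's sampled actions x_t."""
    best_fixed = rewards.sum(axis=0).max()                     # sup_x sum_t y_t(x)
    learner = rewards[np.arange(len(actions)), actions].sum()  # sum_t y_t(x_t)
    return best_fixed - learner
```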
avt.im
The learner's goal is to achieve the highest rewards possible. But at each time, the adversary can choose a different reward function.

So why is this game not impossible?

Because the learner only compares how well they do with the *sum* of the adversary's rewards - that is, against the best single fixed action in hindsight, not against the best action at every round.
avt.im
"Adversarial Online Learning" refers to a two-player zero-sum repeated game between a learner and adversary.

At each time point:
- The learner chooses a distribution of predictions p_t over an action space X.
- The adversary chooses a reward function y_t : X -> R.
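
To make the setup concrete, here's a minimal sketch of the protocol loop - the uniform learner and the random bounded adversary are placeholder choices, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_actions = 100, 5  # horizon and |X|, illustrative sizes

total_reward = 0.0
for t in range(T):
    # Learner chooses a distribution p_t over the action space X
    p_t = np.full(n_actions, 1.0 / n_actions)  # placeholder: uniform play
    x_t = rng.choice(n_actions, p=p_t)

    # Adversary chooses a reward function y_t : X -> R
    # (represented here as one reward value per action)
    y_t = rng.uniform(-1.0, 1.0, size=n_actions)

    # Full feedback: the learner observes the entire reward function y_t
    total_reward += y_t[x_t]
```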
avt.im
Paper update: our recent work on Thompson sampling has a shiny new - and I hope much better - name!

This new name does a much better job of emphasizing what we actually do.

Joint work with Jeff Negrea. Thread below!
avt.im
At ICML, we're presenting a paper on uncertainty-aware surface reconstruction!

Compared to previous approaches, we are able to completely remove the need for recursive linear solves for reconstruction and interpolation, using geometric GP machinery.

Check it out!
avt.im
This week’s virtual seminar on Bayesian Decision-making and Uncertainty is happening now!

Noémie Jaquier (KTH Royal Institute of Technology)
On Riemannian Latent Variable Models and Pullback Metrics

Livestream link: www.youtube.com/watch?v=61Be...
avt.im
We obtain a very simple condition which relates the prior covariance kernel to the adversary's function class, in an easy-to-verify way that works in the bounded Lipschitz case.

I am very interested in extensions to more-general smoothness classes, and have ideas. Stay tuned!
avt.im
This stands in contrast with prior arguments, which are linear-algebraic in flavor, involve bounding certain matrix norms by certain traces, and essentially require independence in order to give sharp rates.