Erfan Mirzaei
@erfunmirzaei.bsky.social
Researcher @PontilGroup.bsky.social | Ph.D. Student @ellis.eu, @Polytechnique, and @UniGenova.
Interested in (deep) learning theory and related topics.
If you’re curious about the intersection of statistical learning theory, sampling-based optimization, generalization in deep learning, and PAC-Bayesian analysis, check out our paper. We’d love to hear your thoughts, feedback, or questions. If you spot interesting connections to your work, let’s chat!
November 14, 2025 at 2:11 PM
😱 A second, equally striking finding: applying a single scalar calibration factor computed from the data makes the resulting upper bounds not only tighter for true labels but also better aligned with the test error curve.
November 14, 2025 at 2:11 PM
🙀 One surprising insight: Generalization in the under-regularized low-temperature regime (β > n) is already signaled by small training errors in the over-regularized high-temperature regime.
November 14, 2025 at 2:11 PM
Empirical results on MNIST and CIFAR-10 show:
1) Non-trivial upper bounds on test error for both true and random labels
2) Meaningful distinction between structure-rich and structure-poor datasets

The figures: Binary classification with FCNNs trained via SGLD on 8k MNIST images (a minimal training sketch follows below).
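To make the setup concrete, here is a minimal SGLD training sketch in PyTorch, under stated assumptions: the architecture, batch size, step size eta, inverse temperature beta, and the random tensors standing in for the 8k MNIST images are all illustrative, not the paper's settings.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-ins: 8k flattened 28x28 "images" with binary labels.
n, d = 8000, 784
X = torch.randn(n, d)
y = torch.randint(0, 2, (n,)).float()

# A small fully connected network for binary classification (assumed architecture).
model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.BCEWithLogitsLoss()

eta, beta, batch, steps = 1e-2, 1e4, 128, 2000  # beta > n: low-temperature regime

for _ in range(steps):
    idx = torch.randint(0, n, (batch,))
    loss = loss_fn(model(X[idx]).squeeze(-1), y[idx])
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            # SGLD step: SGD drift on the mini-batch loss plus Gaussian noise
            # scaled by sqrt(2 * eta / beta), so the iterates approximately
            # target a Gibbs-type distribution ∝ exp(-beta * empirical loss).
            p -= eta * p.grad
            p += (2 * eta / beta) ** 0.5 * torch.randn_like(p)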
November 14, 2025 at 2:11 PM
We show that it can be effectively approximated via Langevin Monte Carlo (LMC) algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD), and crucially,

📎 Our bounds remain stable under this approximation (in both total variation and W₂ distance).
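For reference, the generic LMC/SGLD update meant here, in its textbook form (not the paper's exact statement): with empirical loss \hat{L}_n, step size \eta, inverse temperature \beta, and standard Gaussian noise \xi_t (SGLD replaces the full gradient with a mini-batch estimate),

\theta_{t+1} \;=\; \theta_t \;-\; \eta\,\nabla \hat{L}_n(\theta_t) \;+\; \sqrt{2\eta/\beta}\,\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I),

whose continuous-time limit has stationary distribution proportional to \exp(-\beta\,\hat{L}_n(\theta)); adding a prior term contributes an extra regularization gradient.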
November 14, 2025 at 2:11 PM
Then comes our first contribution:
✅ We derive high-probability, data-dependent bounds on the test error for hypotheses sampled from the Gibbs posterior (for the first time in the low-temperature regime β > n).
Sampling from the Gibbs posterior is, however, typically difficult.
November 14, 2025 at 2:11 PM
This leads naturally to the Gibbs posterior, which assigns higher probabilities to hypotheses with smaller training errors (exponentially decaying with loss).
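Concretely, in standard notation (assumed here, not quoted from the paper): with prior \pi, inverse temperature \beta > 0, and empirical loss \hat{L}_n on n training points, the Gibbs posterior is

\rho_\beta(\mathrm{d}h) \;\propto\; \exp\!\big(-\beta\,\hat{L}_n(h)\big)\,\pi(\mathrm{d}h),

so hypotheses with larger training error are exponentially down-weighted, and larger \beta concentrates the posterior more sharply on low-error hypotheses.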
November 14, 2025 at 2:11 PM
To probe this question, we turn to randomized predictors rather than deterministic ones.
Here, predictors are sampled from a prescribed probability distribution, allowing us to apply PAC-Bayesian theory to study their generalization properties.
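For context, a classical PAC-Bayesian bound in its standard McAllester/Maurer-style form for losses in [0,1] (the textbook statement, not the paper's new low-temperature result): for any prior \pi fixed before seeing the data, with probability at least 1-\delta over the n samples, simultaneously for all posteriors \rho,

\mathbb{E}_{h\sim\rho}[L(h)] \;\le\; \mathbb{E}_{h\sim\rho}[\hat{L}_n(h)] \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}.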
November 14, 2025 at 2:11 PM
In the figure below, from the well-known paper by Zhang et al., the same model achieves nearly zero training error on both random and true labels. The key to generalization must therefore lie in the structure of the data itself.
arxiv.org/abs/1611.03530
November 14, 2025 at 2:11 PM
This paves the way for more data-dependent generalization guarantees in dependent-data settings.
May 2, 2025 at 6:35 PM
Technique highlights:
🔹 Uses blocking methods (see the sketch after the applications list)
🔹 Captures fast-decaying correlations
🔹 Results in tight O(1/n) bounds when decorrelation is fast

Applications:
📊 Covariance operator estimation
🔄 Learning transfer operators for stochastic processes
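As referenced above, here is a minimal illustration of the blocking idea, with assumed toy data and block length (a generic textbook-style construction, not the paper's exact scheme): cut the trajectory into blocks so that block-level averages separated by a full block are nearly independent when correlations decay fast, then apply i.i.d.-style concentration tools to them.

import numpy as np

def block_means(x, block_len):
    """Split a sequence into consecutive blocks and return per-block means.
    Under fast mixing, means of well-separated blocks behave almost like
    independent random variables. (Illustrative only; block_len would be
    tuned to the mixing time.)"""
    n_blocks = len(x) // block_len
    blocks = np.asarray(x[: n_blocks * block_len]).reshape(n_blocks, block_len)
    return blocks.mean(axis=1)

rng = np.random.default_rng(0)
# A toy AR(1) trajectory: dependent, but correlations decay geometrically.
n, phi = 10_000, 0.8
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Concentration arguments are typically run on every other block, so that
# the retained blocks are separated by a full block length.
means = block_means(x, block_len=50)
odd_blocks, even_blocks = means[0::2], means[1::2]
print(len(odd_blocks), odd_blocks.std())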
May 2, 2025 at 6:35 PM
Our contribution:
We propose empirical Bernstein-type concentration bounds for Hilbert space-valued random variables arising from mixing processes.
🧠 Works for both stationary and non-stationary sequences.
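For comparison, the classical empirical Bernstein inequality in the scalar i.i.d. case (the Maurer–Pontil form, stated here only as background; the new bounds concern Hilbert-space-valued, possibly non-stationary mixing data): for X_1, ..., X_n in [0,1] i.i.d., with sample mean \bar{X}_n and sample variance V_n, with probability at least 1-\delta,

\mathbb{E}[X_1] - \bar{X}_n \;\le\; \sqrt{\frac{2\,V_n \ln(2/\delta)}{n}} \;+\; \frac{7\ln(2/\delta)}{3(n-1)}.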
May 2, 2025 at 6:35 PM
Challenge:
Standard i.i.d. assumptions fail in many learning tasks, especially those involving trajectory data (e.g., molecular dynamics, climate models).
👉 Temporal dependence and slow mixing make it hard to get sharp generalization bounds.
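To make "slow mixing" concrete, one standard way to quantify temporal dependence (a generic definition; the paper may use a different, related coefficient) is the \beta-mixing (absolute regularity) coefficient of a stationary sequence (X_t):

\beta(k) \;=\; \sup_{t}\;\mathbb{E}\!\left[\,\sup_{B \in \sigma(X_{t+k}, X_{t+k+1}, \dots)} \big|\,\mathbb{P}(B \mid \sigma(X_1,\dots,X_t)) - \mathbb{P}(B)\,\big|\right],

and fast mixing means \beta(k) \to 0 quickly (e.g., geometrically) as the gap k grows.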
May 2, 2025 at 6:35 PM
Could you add me to the list?
December 4, 2024 at 10:29 PM
Hi Gaspard. I'm curious what you are currently working on with regard to sequence models and world models. I have similar interests, and in our lab we have worked on the intersection of these topics (bsky.app/profile/marc...).
🎉 I am happy to share that I co-authored my first paper, “Operator World Models for Reinforcement Learning,” published at #NeurIPS2024! 🚀 Glad to present it in Vancouver 🇨🇦. See you there! #AI #ReinforcementLearning
@pontilgroup.bsky.social

arxiv.org/pdf/2406.19861
November 27, 2024 at 2:43 PM