Sarthak Mittal
@sarthmit.bsky.social
sarthmit.bsky.social
🤯 Unexpected Finding: Continuous-time diffusion models underperform at posterior estimation, sometimes doing worse than a simple Gaussian approximation!

This highlights the need for better model design for parameter estimation. 🚀

Open-sourced code: github.com/sarthmit/par...
sarthmit.bsky.social
For full posterior estimation, we explore forward- and reverse-KL minimization (+ combining them) with various modeling choices; the simplest case is sketched after the list below.

🔹 Gaussian approx.
🔹 Normalizing Flows
🔹 Advanced models: Diffusion, Flow-Matching, Iterated Denoising Energy Matching
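To make this concrete, here is a minimal sketch of the simplest option above: an amortized Gaussian posterior q(θ | D) trained with the forward KL, i.e. maximizing log q(θ | D) at the true parameters of simulated tasks. It assumes a toy 1-D Gaussian-mean task and PyTorch; the names, sizes, and hyperparameters are illustrative, not the released code.

```python
# Minimal sketch (illustrative, not the released code): forward-KL training of
# an amortized Gaussian posterior q(theta | D) on a toy Gaussian-mean task.
import torch
import torch.nn as nn

class GaussianPosterior(nn.Module):
    """Maps a dataset (a set of scalars) to the mean/log-std of q(theta | D)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 2)  # outputs (mu, log_sigma)

    def forward(self, data):                      # data: (batch, n_points, 1)
        pooled = self.encoder(data).mean(dim=1)   # permutation-invariant pooling
        mu, log_sigma = self.head(pooled).chunk(2, dim=-1)
        return mu, log_sigma

model = GaussianPosterior()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    # Simulate tasks: theta ~ N(0, 1), then a dataset D ~ N(theta, 1).
    theta = torch.randn(128, 1)
    data = theta.unsqueeze(1) + torch.randn(128, 10, 1)

    # Forward KL reduces to maximizing log q(theta | D) at the true theta,
    # i.e. a Gaussian negative log-likelihood (constants dropped).
    mu, log_sigma = model(data)
    loss = (log_sigma + 0.5 * ((theta - mu) / log_sigma.exp()) ** 2).mean()

    opt.zero_grad(); loss.backward(); opt.step()
```

Swapping the Gaussian head for a normalizing flow, diffusion, or flow-matching density gives the other modeling choices above; the reverse KL instead needs the (unnormalized) joint log p(D, θ) evaluated at samples from q.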

Surprising insight next!
sarthmit.bsky.social
🚀 Key Finding: In high-dimensional spaces, amortized point estimation significantly outperforms full posterior approaches!

For point estimation, we use:
🔹 Maximum Likelihood (MLE)
🔹 Maximum-a-Posteriori (MAP), sketched below
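A minimal sketch of the amortized MAP variant, under the same toy Gaussian-mean assumptions as the posterior sketch earlier in the thread (names are illustrative): the network maps a dataset straight to a single θ̂ and is trained on the expected log-joint of simulated tasks; dropping the prior term gives amortized MLE.

```python
# Minimal sketch (illustrative): amortized MAP point estimation on the toy
# Gaussian-mean task. The network outputs one theta_hat per dataset.
import torch
import torch.nn as nn

point_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

def theta_hat(data):                        # data: (batch, n_points, 1)
    return point_net(data).mean(dim=1)      # pool per-point outputs -> estimate

opt = torch.optim.Adam(point_net.parameters(), lr=1e-3)
for step in range(1000):
    theta = torch.randn(128, 1)                          # prior p(theta) = N(0, 1)
    data = theta.unsqueeze(1) + torch.randn(128, 10, 1)  # likelihood N(theta, 1)

    est = theta_hat(data)
    log_lik = -0.5 * ((data - est.unsqueeze(1)) ** 2).sum(dim=(1, 2))  # log p(D | est)
    log_prior = -0.5 * (est ** 2).sum(dim=-1)                          # log p(est)
    loss = -(log_lik + log_prior).mean()    # MAP objective; drop log_prior for MLE

    opt.zero_grad(); loss.backward(); opt.step()
```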

But what about posterior estimation?
sarthmit.bsky.social
We study amortized inference, where a learner estimates the underlying parameters in its forward pass, explicitly conditioned on data.

Through extensive in- and out-of-distribution evaluations, we compare point estimation vs. full posterior estimation.
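As a hypothetical illustration of that evaluation protocol on the toy task from the sketches above: train with parameters drawn from the training prior, then score the amortized estimator on datasets whose parameters come from a shifted prior.

```python
# Hypothetical in-/out-of-distribution evaluation sketch for the toy
# Gaussian-mean task above; function and variable names are illustrative.
import torch

@torch.no_grad()
def eval_mse(estimator, prior_mean, n_tasks=1000, n_points=10):
    theta = prior_mean + torch.randn(n_tasks, 1)                   # task parameters
    data = theta.unsqueeze(1) + torch.randn(n_tasks, n_points, 1)  # per-task dataset
    return ((estimator(data) - theta) ** 2).mean().item()

# in_dist = eval_mse(theta_hat, prior_mean=0.0)  # matches the training prior
# ood     = eval_mse(theta_hat, prior_mean=3.0)  # shifted prior at test time
```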
sarthmit.bsky.social
🔍 Parametric Inference: Point vs Full Posterior Estimation

Two approaches:
📌 Point Estimation (MLE/MAP) – Optimizes for a single parameter value
📊 Full Posterior Estimation – Approximates the full distribution (MCMC, VI)

Which is best for amortized inference? We find out! 👇
sarthmit.bsky.social
It definitely took us a while to get this out, but we are excited to provide a thorough and rigorous evaluation benchmark! In-context learning approaches that are agnostic to modality remain under-explored, and we are very excited about this avenue!

Code: github.com/sarthmit/par...
sarthmit.bsky.social
We provide a rigorous comparison of different architecture choices, parameterizations of densities, and training objectives for learning this amortized in-context posterior estimator. Further studies on high-dimensional problems, cases of misspecification, etc. in the paper!
sarthmit.bsky.social
Same insight, beyond language: estimate p(parameters | dataset) for different datasets. What do you gain? 🌟

A posterior over parameters for new datasets provided in-context, obtained through a single forward pass instead of running MCMC, etc. (see the sketch below).

Fun connections to learned optimizers, meta-learning, etc.
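For instance, a hypothetical usage sketch reusing the amortized Gaussian posterior sketched earlier in the thread: once q(θ | D) is trained, getting a posterior over parameters for an unseen dataset is one forward pass, with no per-dataset MCMC or optimization.

```python
# Hypothetical usage sketch: in-context posterior inference for a new dataset,
# assuming an already-trained amortized Gaussian posterior (e.g. `model` above).
import torch

@torch.no_grad()
def posterior_samples(amortized_q, new_dataset, n_samples=1000):
    mu, log_sigma = amortized_q(new_dataset.unsqueeze(0))   # one forward pass
    return mu + log_sigma.exp() * torch.randn(n_samples, mu.shape[-1])

# new_data = torch.randn(25, 1) + 2.0           # an unseen dataset, in-context
# samples = posterior_samples(model, new_data)  # "inference instead of MCMC"
```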
sarthmit.bsky.social
Diffusion: p(image | text) for different text inputs

ICL in LLMs: p(ans | question, examples) for different examples

Multi-task (RL or otherwise): p(next action | environment) for different environments

Key insight: Train across diverse contexts using a shared language.