Desi R Ivanova
@desirivanova.bsky.social
4.3K followers 220 following 64 posts
Research fellow @OxfordStats @OxCSML, spent time at FAIR and MSR Former quant 📈 (@GoldmanSachs), former former gymnast 🤸‍♀️ My opinions are my own 🇧🇬-🇬🇧 sh/ssh
Pinned
desirivanova.bsky.social
Decent 0-shot performance (conditional on it having been like 100 years since retirement) 😅
Reposted by Desi R Ivanova
oxfordstatistics.bsky.social
🏢 Vacancy: Novo Nordisk Postdoctoral Research Fellow (4 posts)
📍 Department of Statistics, University of Oxford
📃 Contract: Full time, fixed-term for 3 years
💷 Salary Range: £36,024 – £44,263
⏲️ Deadline: 12pm UK, 30 Apr 2025

Full details & how to apply 👉 shorturl.at/3l47e
desirivanova.bsky.social
Lecture 5: Backpropagation and Autodifferentiation

Thank god the days of computing gradients by hand are over! Nevertheless, it’s good to know what backprop is and why we do it

open.substack.com/pub/probappr...
Lecture 5: Backprop and Autodiff
Order matters
open.substack.com
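The chain-rule bookkeeping that backprop automates can be seen on a single scalar function. A minimal illustrative sketch (my example, not material from the lecture): forward pass stores intermediates, backward pass applies the chain rule in reverse.

```python
import math

# Reverse-mode differentiation of f(x) = sin(x**2), done by hand:
# the forward pass computes and stores intermediates, the backward
# pass applies the chain rule in reverse order of the forward ops.
def f_and_grad(x):
    u = x * x              # forward: u = x^2
    y = math.sin(u)        # forward: y = sin(u)
    dy_du = math.cos(u)    # backward: dy/du
    dy_dx = dy_du * 2 * x  # backward: dy/dx = dy/du * du/dx
    return y, dy_dx
```

A central finite difference is a handy sanity check that the hand-derived gradient is right.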
desirivanova.bsky.social
Along with the lightweight library, we provide short code snippets in the paper.
desirivanova.bsky.social
…and for constructing error bars on more complicated metrics, such as F1 score, that require the flexibility of Bayes.
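One standard Bayesian route to F1 error bars (an illustration in the spirit of this post, not necessarily the paper's exact model) is a Dirichlet posterior over the four confusion-matrix cell probabilities, pushed through the F1 formula by Monte Carlo. The helper name and the flat Dirichlet(1,1,1,1) prior are my assumptions; only the standard library is used.

```python
import random

def f1_credible_interval(tp, fp, fn, tn, n_samples=20000, level=0.95, seed=0):
    """Posterior credible interval for F1 under a Dirichlet(1,1,1,1) prior
    on the confusion-matrix cell probabilities (hypothetical helper)."""
    rng = random.Random(seed)
    f1s = []
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of Gamma draws normalised to sum to 1.
        g = [rng.gammavariate(1 + c, 1.0) for c in (tp, fp, fn, tn)]
        s = sum(g)
        p_tp, p_fp, p_fn = g[0] / s, g[1] / s, g[2] / s
        f1s.append(2 * p_tp / (2 * p_tp + p_fp + p_fn))
    f1s.sort()
    lo_idx = int((1 - level) / 2 * n_samples)
    return f1s[lo_idx], f1s[n_samples - 1 - lo_idx]
```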
desirivanova.bsky.social
...and treated without an independence assumption (e.g. using the same eval questions on both LLMs)...
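When both models answer the same questions, the per-question score differences carry the correlation between them. A minimal sketch of the paired normal-approximation interval on those differences (my illustration of the general idea, not necessarily the paper's estimator; the helper name is hypothetical):

```python
import math

def paired_diff_interval(scores_a, scores_b, z=1.96):
    """Normal-approximation interval for the mean per-question score
    difference between two LLMs evaluated on the same questions;
    pairing accounts for the shared questions (hypothetical helper)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)
    return mean - z * se, mean + z * se
```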
desirivanova.bsky.social
...for making comparisons between two LLMs treated independently...
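For the independent case, the textbook two-sample proportion interval is a few lines. A sketch under that standard normal approximation (my illustration, not necessarily the paper's method; the helper name is hypothetical):

```python
import math

def independent_diff_interval(k1, n1, k2, n2, z=1.96):
    """Normal-approximation interval for the accuracy difference between
    two LLMs evaluated on independent question sets (hypothetical helper)."""
    p1, p2 = k1 / n1, k2 / n2
    # Variances add because the two evals are assumed independent.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se
```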
desirivanova.bsky.social
We also suggest simple methods for the clustered-question setting (where we don't assume all questions are IID -- instead we have T groups of N/T IID questions)...
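One simple option for that clustered setting (a common technique, not necessarily the paper's method) is a percentile bootstrap that resamples whole clusters, so within-cluster dependence is preserved. A stdlib-only sketch; the helper name is my assumption:

```python
import random

def cluster_bootstrap_interval(clusters, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean accuracy that resamples whole
    question clusters (clusters: list of lists of 0/1 scores), preserving
    within-cluster dependence (hypothetical helper)."""
    rng = random.Random(seed)
    T = len(clusters)
    means = []
    for _ in range(n_boot):
        resample = [clusters[rng.randrange(T)] for _ in range(T)]
        flat = [x for c in resample for x in c]
        means.append(sum(flat) / len(flat))
    means.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    return means[lo_idx], means[n_boot - 1 - lo_idx]
```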
desirivanova.bsky.social
Or, in this IID question setting, if you want to stay frequentist you can use Wilson-score intervals: en.wikipedia.org/wiki/Binomial_…
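The Wilson score interval from the linked article takes only a few lines. A sketch of the standard formula (the helper name is mine):

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion (k successes in n)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half
```

Unlike the CLT/Wald interval, it stays inside [0, 1] and keeps positive width even when the observed accuracy is exactly 0 or 1.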
desirivanova.bsky.social
We suggest using Bayesian credible intervals for your error bars instead, with a simple Beta-Binomial model. (The aim is for the methods to achieve nominal 1-alpha coverage, i.e. match the dotted line in the top row: a 95% confidence interval should be right 95% of the time.)
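With a Beta(a, b) prior and k correct answers out of n, the posterior over accuracy is Beta(a + k, b + n − k), and a credible interval comes from its quantiles. A stdlib-only sketch that uses Monte Carlo draws instead of a Beta quantile function to stay dependency-free (the helper name is mine; an exact version would use e.g. `scipy.stats.beta.ppf`):

```python
import random

def beta_binomial_interval(k, n, a=1.0, b=1.0, level=0.95,
                           n_samples=20000, seed=0):
    """Monte Carlo credible interval from the Beta(a + k, b + n - k)
    posterior over accuracy under a Beta(a, b) prior (hypothetical
    helper; sampling avoids any dependency on a Beta quantile function)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a + k, b + n - k) for _ in range(n_samples))
    lo_idx = int((1 - level) / 2 * n_samples)
    return draws[lo_idx], draws[n_samples - 1 - lo_idx]
```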
desirivanova.bsky.social
This, together with the CLT's disregard for the typically binary nature of eval data (correct/incorrect responses to an eval question), leads to poor error bars that collapse to zero width or extend beyond [0, 1].
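Both failure modes are easy to reproduce with the textbook CLT (Wald) interval, p̂ ± z·sqrt(p̂(1−p̂)/n). A sketch (the helper name is mine):

```python
import math

def wald_interval(k, n, z=1.96):
    """Textbook CLT (Wald) interval for accuracy: p +/- z * sqrt(p(1-p)/n)."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half
```

At k = n the interval collapses to zero width (the variance estimate is exactly 0), and near the boundary it spills past 1.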
desirivanova.bsky.social
As LLMs get better, the benchmarks used to evaluate their capabilities are getting smaller (and harder), which starts to violate the CLT's large-N assumption. Meanwhile, many eval settings involve questions that aren't IID (e.g. questions within a benchmark are often not independent).
desirivanova.bsky.social
Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian and frequentist methods you should be using instead.

We also provide a super lightweight library: github.com/sambowyer/baye… 🧵👇
desirivanova.bsky.social
NHS boss was sacked (well, “resigned”), so there’s some hope for major reforms and improvements in the health system (I hope 🤞)
Reposted by Desi R Ivanova
natolambert.bsky.social
Come work with me!
We are looking to bring on more top talent to our language modeling workstream at @ai2.bsky.social building the open ecosystem. We are hiring:
* Research scientists
* Senior research engineers
* Post docs (Young investigators)
* Pre docs

job-boards.greenhouse.io/thealleninst...
The Allen Institute for AI
job-boards.greenhouse.io
desirivanova.bsky.social
Nice. Are the materials publicly available?
desirivanova.bsky.social
We currently do 2 lectures on GPs 😅 one could certainly do a whole course (bayesopt, automl) - could be fun!
desirivanova.bsky.social
Indeed, the course is already quite tight, so if DPs are to be covered, something has to be dropped. For next year I'm thinking of potentially dropping constrained optimisation/SVMs (covered in the first half) and treating BNP more thoroughly
desirivanova.bsky.social
It’s a mix - first part was ERM, SVMs and kernels; second part (which is the one I’m teaching) - Bayesian ML (GPs), deep learning and VI
desirivanova.bsky.social
Teaching is super undervalued by universities (at least in the UK), so there's very little incentive to do it well. I think this is wrong: thoughtful pedagogy matters deeply. I hope this "teaching blogs" series will help me get up to speed and improve more quickly

open.substack.com/pub/probappr...
Lecture 1: Gaussian Processes and GP Regression
Nice and easy when everything is Gaussian
open.substack.com