Dan Roy
@roydanroy.bsky.social
8.2K followers 550 following 150 posts
Research Director, Founding Faculty, Canada CIFAR AI Chair @VectorInst. Full Prof @UofT - Statistics and Computer Sci. (x-appt) danroy.org I study assumption-free prediction and decision making under uncertainty, with inference emerging from optimality.
roydanroy.bsky.social
Tian and Karolina and team are at ICLR. Come say hi.
tjin.bsky.social
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
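The distinction between average and final parameter count can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration only: it assumes a simple linear pruning ramp between two chosen steps, and the function name `average_param_count` and its parameters are invented here. The post does not say what schedule the paper actually uses.

```python
def average_param_count(dense_params: float, final_params: float,
                        total_steps: int, prune_start: int, prune_end: int) -> float:
    """Mean live-parameter count over training for a HYPOTHETICAL linear pruning ramp.

    Dense until prune_start, shrinking linearly to final_params by prune_end,
    then sparse (final_params) for the remaining steps.
    """
    total = 0.0
    for step in range(total_steps):
        if step < prune_start:
            count = dense_params
        elif step >= prune_end:
            count = final_params
        else:
            frac = (step - prune_start) / (prune_end - prune_start)
            count = dense_params - frac * (dense_params - final_params)
        total += count
    return total / total_steps


# A model that ends at half the dense size still "averages" about 75% of it.
print(average_param_count(1000, 500, total_steps=100, prune_start=25, prune_end=75))
```

Under this assumed schedule, two runs with the same final size can have quite different average sizes, depending on when pruning happens. That gap is the quantity the post says predicts quality.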
roydanroy.bsky.social
Curious. Didn’t know Meta had a PPL team.
roydanroy.bsky.social
I like to think about non-reasoning model responses as vibes.
roydanroy.bsky.social
So who’s read the 2027 article? What do you think?
roydanroy.bsky.social
Someone has suggested I check out bsky again. So I'm back looking around here. Notification list is kinda boring. So any good conversations going on? Perhaps about LLM/AI reasoning?
roydanroy.bsky.social
Anyone else have the worry that a lot of LLM research is... just bad psychology?
roydanroy.bsky.social
And, to achieve the results in this paper, what was the most challenging part? Why had previous attempts fallen short? What was your key new insight?
roydanroy.bsky.social
Very interesting. So, what was the biggest hole to fill, in terms of hypotheses?
Reposted by Dan Roy
mbhunzaker.bsky.social
Okay, so just a few* thoughts (*this got longer as I wrote 😅….long thread)-
mbhunzaker.bsky.social
Having a lot of thoughts & feelings today as someone who’s worked both on FB’s misinfo interventions (back when they were making investments) and Birdwatch (pre-Musk Community Notes) 💔. Debating if a thread is worth the inevitable headache 😓
roydanroy.bsky.social
Acknowledgments.
roydanroy.bsky.social
I got to ski Revelstoke this winter break.

Couple observations: the price of receiving 600 cm of snow by Jan 8 is that it is constantly snowing. Saw almost no sun the whole time and the peak was often in whiteout conditions (though North Bowl was always clear…).

See image for more.
roydanroy.bsky.social
Multiple friends have likely lost their homes in Los Angeles. Can’t imagine how disorienting this would be. They had only minutes to flee and grab belongings.
roydanroy.bsky.social
What are the key papers to read?
roydanroy.bsky.social
OK. Practical question time. How are you adjusting your research given progress in reasoning-style models? Also, how are you adjusting the way you work?
roydanroy.bsky.social
A $100,000,000 experiment is no longer "consequence-free". Ilya is saying "scaling is over", but this may simply mean that the scaling "laws" (not laws) are no longer accurate. Also, those laws are tied to particular hyperparameter tunings.
roydanroy.bsky.social
Sure some were empirical. Some were not.
roydanroy.bsky.social
I'd say no, in a sense. Xavier/He initialization was theoretical work, and it was absolutely critical.
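For readers who have not seen them, both schemes come from a variance argument: pick the weight variance so that activation (and gradient) variance stays roughly constant from layer to layer. Glorot/Xavier uses Var(W) = 2 / (fan_in + fan_out). He uses Var(W) = 2 / fan_in, where the factor of 2 compensates for ReLU zeroing about half the pre-activations. The sketch below shows one common variant of each (Glorot uniform, He normal) using only the standard library. The helper names are hypothetical and are not any framework's API.

```python
import math
import random


def glorot_uniform_limit(fan_in: int, fan_out: int) -> float:
    # Glorot/Xavier target: Var(W) = 2 / (fan_in + fan_out).
    # A uniform U(-a, a) has variance a^2 / 3, so a = sqrt(6 / (fan_in + fan_out)).
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_normal_std(fan_in: int) -> float:
    # He target: Var(W) = 2 / fan_in (ReLU keeps roughly half the signal).
    return math.sqrt(2.0 / fan_in)


def init_matrix(fan_in: int, fan_out: int, scheme: str = "he", seed: int = 0):
    """Return a fan_out x fan_in weight matrix as nested lists (hypothetical helper)."""
    rng = random.Random(seed)
    if scheme == "glorot":
        a = glorot_uniform_limit(fan_in, fan_out)
        return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]
    if scheme == "he":
        s = he_normal_std(fan_in)
        return [[rng.gauss(0.0, s) for _ in range(fan_in)] for _ in range(fan_out)]
    raise ValueError(f"unknown scheme: {scheme}")


# Empirical weight variance should land close to the target 2 / fan_in.
W = init_matrix(256, 128, scheme="he")
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(var, 2 / 256)
```

The point relevant to the thread is that these formulas fell out of analysis of signal propagation, not from a hyperparameter sweep.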
roydanroy.bsky.social
Pretraining is not done. It's just that theorists haven't told the hackers how to do it better.
natolambert.bsky.social
ILYA: "PRETRAINING IS DONE. WE ARE NOW IN THE POST TRAINING ERA."
roydanroy.bsky.social
Annoying. If it could be automatic, sure.
roydanroy.bsky.social
I'd say wait then.
roydanroy.bsky.social
That's part of the spec. I don't think this is too problematic. The example they give is problems in NP, where there is a polynomial time checker (i.e., a polytime EV), but generating an instance that passes the checker is hard in the worst case.
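Boolean satisfiability (SAT) is the textbook instance of the asymmetry described here. The example is chosen for illustration and is not named in the post. In the minimal sketch below, checking a proposed assignment is a single linear pass over the formula, while the naive generator may have to try all 2^n assignments. The clause encoding (DIMACS-style signed integers) and the function names `check` and `search` are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Optional

# A clause is a list of nonzero ints: k means "x_k is True", -k means "x_k is False".
CNF = List[List[int]]


def check(cnf: CNF, assignment: Dict[int, bool]) -> bool:
    # Polynomial-time checker: every clause must contain at least one true literal.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)


def search(cnf: CNF, n_vars: int) -> Optional[Dict[int, bool]]:
    # Brute-force generator: worst case calls the checker 2**n_vars times.
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if check(cnf, assignment):
            return assignment
    return None  # unsatisfiable


formula = [[1, -2], [2, 3], [-1, -3]]  # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
print(search(formula, 3))
```

Cleverer SAT solvers do far better than brute force on many inputs, but no known algorithm avoids exponential time in the worst case. That is the sense in which passing the checker is easy to verify yet hard to achieve.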
roydanroy.bsky.social
Now that I've had a taste of X without post length limitations, I've got to say that it is quite annoying to have to fit tweets into 300 characters here on bsky. On X, when they get too long, they go below the fold, so you're still incentivized to keep them short. Can't we have that here?
roydanroy.bsky.social
@gkdziugaite.bsky.social. Works at GDM and Mila. Influential, technical work.