Reposted by Jason Lee
@jasondeanlee.bsky.social!
We prove a neural scaling law for SGD learning of extensive-width two-layer neural networks.
arxiv.org/abs/2504.19983
🧵below (1/10)
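Not taken from the paper itself, just a rough illustration of what a "neural scaling law" statement means empirically: a loss that falls as a power of some resource (here, sample size n), which shows up as a straight line on log-log axes. The exponent, prefactor, and noise level below are made-up values.

```python
# Illustrative only: a scaling law says loss(n) ≈ C * n^(-alpha).
# The constants alpha_true and C are hypothetical, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
alpha_true, C = 0.5, 2.0                  # hypothetical exponent and prefactor
n = np.logspace(2, 6, 20)                 # sample sizes from 1e2 to 1e6
loss = C * n ** (-alpha_true) * np.exp(0.05 * rng.standard_normal(n.size))

# On log-log axes a power law is a straight line, so a linear fit of
# log(loss) against log(n) recovers the exponent.
slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
print(f"estimated exponent alpha ≈ {-slope:.2f} (true {alpha_true})")
```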
by Wolfgang Crämer — Reposted by Jason Lee, Robert West, Helen Tager‐Flusberg, Catherine Kyobutungi, James P. Collins, Ankur R. Desai, Christoph Scherber, Julian L. Simon, Sarah Mitchell, Kai Sassenberg, Dirk Messner, Thomas Hanitzsch, Melissa L. Finucane, Rachel M. Gisselquist, Sandra González‐Bailón, Sandrine Sorlin, Serge Jaumain, Vanessa Manceron, Emanuela Galasso, David Lefebvre
Keep an eye on this space for updates, event information, and ways to get involved. We can't wait to see everyone at #standupforscience2025 on March 7th, both in DC and at locations nationwide!
#scienceforall #sciencenotsilence
Reposted by Jason Lee
-Physicist Fritz Houtermans
There's a lot of truth to this: log-log plots are often abused and can be very misleading.
1/5
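A minimal sketch of that point, with arbitrary curves and ranges: over a limited window, data that is not a power law can still fit a straight line on log-log axes almost as well as a true power law does.

```python
# How log-log plots can mislead: over a limited range, non-power-law data
# can look nearly straight on log-log axes. Curves and ranges are arbitrary.
import numpy as np

x = np.logspace(0, 2, 50)                 # x spanning two decades
curves = {
    "true power law x^-1.5": x ** -1.5,
    "exponential exp(-x/40)": np.exp(-x / 40.0),
    "logarithmic 1/log(x+2)": 1.0 / np.log(x + 2.0),
}

for name, y in curves.items():
    # Straight-line fit in log-log coordinates and its R^2.
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    resid = np.log(y) - (slope * np.log(x) + intercept)
    r2 = 1 - resid.var() / np.log(y).var()
    print(f"{name:28s} fitted slope {slope:+.2f}, R^2 = {r2:.3f}")
```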
by Jason Lee
Settling the sample complexity of RL: arxiv.org/abs/2307.13586
Optimal Multi-Distribution Learning (solved a COLT 2023 open problem): arxiv.org/abs/2312.05134
Anytime Acceleration of Gradient Descent (solved a COLT 2024 open problem): arxiv.org/abs/2411.17668
Reposted by Jason Lee
A node using B receives a benefit relative to X, but there is also a benefit to using the same technology as the majority of your neighbors.
Assume everyone uses X at time t=0. Will they switch to B?
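One standard way to formalize this kind of question is a threshold (best-response) adoption model on a network; the sketch below follows that template. The payoffs a and b, the ring network, and the seed set of early B adopters are illustrative assumptions, not details from the reposted thread.

```python
# Threshold-adoption sketch: a node earns b per neighbor who also uses B and
# a per neighbor who uses X, so it best-responds with B once at least
# q = a / (a + b) of its neighbors use B. All parameters here are made up.
a, b = 2.0, 3.0                    # hypothetical payoffs for matching on X / on B
q = a / (a + b)                    # adoption threshold

n = 30
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # ring network

# If literally everyone starts on X, no one ever switches under best response,
# so we seed a small contiguous set of early B adopters.
uses_b = {i: i in {0, 1, 2} for i in range(n)}

changed, rounds = True, 0
while changed:
    changed, rounds = False, rounds + 1
    for i in range(n):
        frac_b = sum(uses_b[j] for j in neighbors[i]) / len(neighbors[i])
        if not uses_b[i] and frac_b >= q:   # best response: switch to B
            uses_b[i] = True
            changed = True

print(f"after {rounds} rounds, {sum(uses_b.values())} of {n} nodes use B")
```

Under these assumed payoffs the threshold is q = 0.4, so on the ring a single adopting neighbor is enough to flip a node and the seed cascades around the whole network; with no seed at all, everyone stays on X.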