Lightnews — Scholar-powered news

Reposted by Yuda Song

Miro Dudik

@mdudik.bsky.social

🚨Microsoft Research NYC is hiring🚨

We're hiring postdocs and senior researchers in AI/ML broadly, and in specific areas like test-time scaling and science of DL. Postdoc applications due Oct 22, 2025. Senior researcher applications considered on a rolling basis.

Links to apply: aka.ms/msrnyc-jobs

Microsoft Research Lab - New York City - Microsoft Research

Apply for a research position at Microsoft Research New York & collaborate with academia to advance economics research, prediction markets & ML.

aka.ms

September 18, 2025 at 2:37 PM

Reposted by Yuda Song

Jacob Springer

@jacobspringer.bsky.social

Training with more data = better LLMs, right? 🚨

False! Scaling language models by adding more pre-training data can decrease your performance after post-training!
Introducing "catastrophic overtraining." 🥁🧵👇

arxiv.org/abs/2503.19206

1/10

March 26, 2025 at 6:35 PM

Reposted by Yuda Song

Gokul Swamy

@gokul.dev

1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to 🤿:

March 4, 2025 at 8:59 PM

Reposted by Yuda Song

Antoine Moulin

@antoine-mln.bsky.social

super happy about this preprint! we can *finally* perform efficient exploration and find near-optimal stationary policies in infinite-horizon linear MDPs, and even use it for imitation learning :) working with @neu-rips.bsky.social and @lviano.bsky.social on this was so much fun!!

February 20, 2025 at 5:45 PM

Reposted by Yuda Song

Dylan Foster 🐢

@djfoster.bsky.social

What are the minimal supervised learning primitives required to perform RL efficiently?

New paper led by my amazing intern Dhruv Rohatgi:

Necessary and Sufficient Oracles: Toward a Computational Taxonomy for Reinforcement Learning

arxiv.org/abs/2502.08632

1/

February 20, 2025 at 11:39 PM

Reposted by Yuda Song

Leshem (Legend) Choshen @EMNLP

@lchoshen.bsky.social

Models can self-improve🥷 by knowing they were wrong🧘‍♀️ but when can they do it?

Across LLM families, tasks and mechanisms
This ability scales with pretraining, prefers CoT, non QA tasks and more in 🧵

alphaxiv.org/abs/2412.02674
@yus167.bsky.social @shamkakade.bsky.social
📈🤖
#NLP #ML

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | alphaXiv

View 3 comments: Delete the space?

alphaxiv.org

December 13, 2024 at 11:55 PM

Yuda Song

@yus167.bsky.social

I will present two papers at #NeurIPS2024!

Happy to meet old and new friends and talk about all aspects of RL: data, environment structure, and reward! 😀

In Wed 11am-2pm poster session I will present HyPO-- best of both worlds of offline and online RLHF: neurips.cc/virtual/2024...

NeurIPS Poster The Importance of Online Data: Understanding Preference Fine-tuning via CoverageNeurIPS 2024

neurips.cc

December 9, 2024 at 7:49 PM

Yuda Song

@yus167.bsky.social

LLM self-improvement has critical implications in synthetic data, post-training and test-time inference. To understand LLMs' true capability of self-improvement, we perform large-scale experiments with multiple families of LLMs, tasks and mechanisms. Here is what we found: (1/9)

December 6, 2024 at 6:02 PM

Reposted by Yuda Song

arxiv cs.CL

@arxiv-cs-cl.bsky.social

Yuda Song, Hanlin Zhang, Carson Eisenach, Sham Kakade, Dean Foster, Udaya Ghai
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
https://arxiv.org/abs/2412.02674

December 4, 2024 at 9:09 AM

Reposted by Yuda Song

Gokul Swamy

@gokul.dev

I think the main difference in terms of interpolation / extrapolation between DPO and RLHF is that the former only guarantees closeness to the reference policy on the training data, while RLHF usually tacks on an on-policy KL penalty. We explored this point in arxiv.org/abs/2406.01462.

November 22, 2024 at 3:38 PM

Reposted by Yuda Song

Sham Kakade

@shamkakade.bsky.social

(1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping point where the gains of data parallelism balance with diminishing efficiency. Doubling batch size halves the optimization steps—until we hit CBS, beyond which returns diminish.

November 22, 2024 at 8:19 PM

Reposted by Yuda Song

Steph Milani

@stephmilani.bsky.social

I created a starter pack for people who are or have been affiliated with the Machine Learning Department at CMU. Let me know if I missed someone!

go.bsky.app/QLTVEph

#AcademicSky

November 18, 2024 at 3:46 PM

Reposted by Yuda Song

arxiv stat.ML

@arxiv-stat-ml.bsky.social

Ojash Neopane, Aaditya Ramdas, Aarti Singh
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
https://arxiv.org/abs/2411.14341

November 22, 2024 at 5:01 AM

Reposted by Yuda Song

Zhengyi "Zen" Luo

@zhengyiluo.bsky.social

Intro 🦋

I am a final-year PhD student from CMU Robotics. I work on humanoid control, perception, and behavior in both simulation and real life, using mostly RL:

🏃🏻PHC: zhengyiluo.com/PHC
💫PULSE: zhengyiluo.com/PULSE
🔩Omnigrasp: zhengyiluo.com/Omnigrasp
🤖OmniH2O: omni.human2humanoid.com

November 19, 2024 at 8:34 PM

Reposted by Yuda Song

Steph Milani

@stephmilani.bsky.social

Hi Bsky people 👋 I'm a PhD candidate in Machine Learning at Carnegie Mellon University.
My research focuses on interactive AI, involving:
🤖 reinforcement learning,
🧠 foundation models, and
👩‍💻 human-centered AI.

Also a founding co-organizer of the MineRL competitions 🖤 Follow me for ML updates!

November 18, 2024 at 3:05 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news