Saumya Malik
@saumyamalik.bsky.social
Predoc at Ai2 | prev. Princeton CS '24
saumyamalik.bsky.social
Interestingly, we find that RLHF performance degrades if the lineages of the reward model and policy model don’t match 🤔 So, instead of simply taking the top model on RewardBench 2 off-the-shelf, you should take the recipe behind that model and integrate it into your own RLHF workflow
saumyamalik.bsky.social
We find that RewardBench 2 is highly correlated with downstream performance when RMs are used at inference time in Best-of-N selection, and it also provides a helpful signal of downstream performance in RLHF 🔥
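For context, here is a minimal sketch of Best-of-N selection with a reward model: sample N completions, score each with the RM, and keep the highest-scoring one. The `generate` and `score` callables are hypothetical placeholders for a policy's sampler and the RM's scoring function, not functions from the RewardBench 2 codebase.

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str, int], List[str]],   # placeholder: samples n completions from the policy
    score: Callable[[str, str], float],          # placeholder: reward model score for (prompt, completion)
    n: int = 4,
) -> str:
    """Sample n completions for the prompt and return the one the reward model scores highest."""
    candidates = generate(prompt, n)
    scores = [score(prompt, c) for c in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```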
saumyamalik.bsky.social
We trained and released 70 reward models to study their performance on RB2 and in downstream applications like inference-time Best-of-N sampling and RLHF training. Even top RMs still have plenty of room to improve on RB2, particularly in Precise Instruction Following and Math
saumyamalik.bsky.social
RewardBench 2 spans six domains, sources new human prompts, and carefully constructs and combines completions to build out a best-of-4 dataset. Using fresh prompts is an important step in making reward model evaluation independent of downstream evaluations
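As a rough illustration of how a best-of-4 setup can score a reward model: count a record as correct only when the RM ranks the chosen completion above all three rejected ones. The field names ("prompt", "chosen", "rejected") are assumed for this sketch and are not necessarily the released dataset schema.

```python
from typing import Callable, Dict, List

def best_of_4_accuracy(
    records: List[Dict],
    score: Callable[[str, str], float],  # placeholder: reward model score for (prompt, completion)
) -> float:
    """Fraction of records where the chosen completion outscores every rejected one."""
    correct = 0
    for r in records:
        chosen_score = score(r["prompt"], r["chosen"])
        # Correct only if the chosen completion beats all three rejected completions.
        if all(chosen_score > score(r["prompt"], rej) for rej in r["rejected"]):
            correct += 1
    return correct / len(records)
```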
saumyamalik.bsky.social
I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
saumyamalik.bsky.social
I'm having a great time as a predoctoral young investigator (PYI) at Ai2! Definitely consider applying to this great program :)
natolambert.bsky.social
We're hiring another predoctoral researcher for my team at Ai2/OLMo next year. The goal of this position is to mentor and grow future academic stars of NLP/AI over 1-2 years before grad school.

This typically ends up being people who have finished a BS or MS and want to continue to a PhD soon.
https://buff.ly/49nuggo