Vishaal Udandarao
@vishaalurao.bsky.social
570 followers 250 following 13 posts
@ELLISforEurope PhD Student @bethgelab @caml_lab @Cambridge_Uni @uni_tue; Currently SR @GoogleAI; Previously MPhil @Cambridge_Uni, RA @RutgersU, UG @iiitdelhi vishaal27.github.io
Pinned
vishaalurao.bsky.social
🚀New Paper: Active Data Curation Effectively Distills Multimodal Models
arxiv.org/abs/2411.18674

Smol models are all the rage these days & knowledge distillation (KD) is key for model compression!

We show how data curation can act as an effective distillation method, yielding SoTA FLOP-efficient {C/Sig}LIPs!!
🧵👇
Reposted by Vishaal Udandarao
ahochlehnert.bsky.social
CuratedThoughts: Data Curation for RL Datasets 🚀

Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws: 25% of OpenThoughts had to be eliminated through data curation.

Here's why 👇🧵
Reposted by Vishaal Udandarao
blackhc.bsky.social
Ever wondered why presenting more facts can sometimes *worsen* disagreements, even among rational people? 🤔

It turns out, Bayesian reasoning has some surprising answers - no cognitive biases needed! Let's explore this fascinating paradox quickly ☺️
Reposted by Vishaal Udandarao
paulvicol.bsky.social
🎉 Had fun at #NeurIPS2024 Workshop on #AdaptiveFoundationModels!

🚀 Speakers: @rsalakhu.bsky.social, @sedielem.bsky.social, Kate Saenko, Matthias Bethge / @vishaalurao.bsky.social, Minjoon Seo, Bing Liu, Tianqi Chen

🌐Posters: adaptive-foundation-models.org/papers

🎬 neurips.cc/virtual/2024...

🧵Recap!
Reposted by Vishaal Udandarao
paulvicol.bsky.social
Our workshop in numbers:
🖇️ 128 Papers
💬 8 Orals
🖋️ 564 Authors
✅ 40 Reviewers
🔊 7 Invited Speakers
👕 100 T-Shirts

🔥 Organizers: Paul Vicol, Mengye Ren, Renjie Liao, Naila Murray, Wei-Chiu Ma, Beidi Chen

#NeurIPS2024 #AdaptiveFoundationModels
Reposted by Vishaal Udandarao
adhirajghosh.bsky.social
🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨

Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution.

🔎 arxiv.org/abs/2412.06745
Reposted by Vishaal Udandarao
confusezius.bsky.social
😵‍💫 Continually pretraining large multimodal models to keep them up-to-date all the time is tough, covering everything from adapters, merging, and meta-scheduling to data design and more!

So I'm really happy to present our large-scale study at #NeurIPS2024!

Come drop by to talk about all that and more!
vishaalurao.bsky.social
This was work done during my internship with amazing folks @google @deep-mind.bsky.social: @nikparth1.bsky.social (joint-first), Ferjad, Talfan, @samuelalbanie.bsky.social, Federico, Yongqin, Alessio & @olivierhenaff.bsky.social

Super excited about this direction of strong pretraining for smol models!
vishaalurao.bsky.social
Bonus: Along the way, we found the current state of CLIP zero-shot benchmarking in disarray: some test datasets have a seed std of ~12%!

We construct a stable & reliable evaluation suite (StableEval), inspired by inverse-variance weighting, to prune out unreliable evals!
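A rough sketch of the idea (not the paper's actual code; the dataset names, numbers, and pruning threshold below are made up): score each eval by its variance across random seeds, weight reliable evals more heavily, and drop the rest.

```python
import numpy as np

# Hypothetical zero-shot accuracies per eval dataset across random seeds
# (names and values are illustrative only).
seed_accuracies = {
    "imagenet":   [0.62, 0.61, 0.62, 0.63],
    "cifar100":   [0.55, 0.56, 0.55, 0.54],
    "noisy_eval": [0.35, 0.47, 0.22, 0.51],  # seed std > 0.1 -> unreliable
}

# Inverse-variance weighting: low-variance (reliable) evals get larger weight.
weights = {name: 1.0 / (np.var(acc, ddof=1) + 1e-8)
           for name, acc in seed_accuracies.items()}

# One simple pruning rule in this spirit: drop evals whose seed std is too high.
STD_THRESHOLD = 0.05
stable_evals = [name for name, acc in seed_accuracies.items()
                if np.std(acc, ddof=1) <= STD_THRESHOLD]
print(stable_evals)  # -> ['imagenet', 'cifar100']
```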
vishaalurao.bsky.social
Finally, we scale all our insights to pretrain SoTA FLOP-efficient models across three different FLOP-scales: ACED-F{0,1,2}

Outperforming strong baselines including Apple's MobileCLIP, TinyCLIP and @datologyai.com CLIP models!
vishaalurao.bsky.social
There's more! ACID and KD are complementary — they can be profitably combined, at scale! Our simple pretraining recipe ACED-ACIDistill showcases continued benefits as we scale to 26B samples seen!
vishaalurao.bsky.social
We also show that ACID strongly outperforms KD across different reference/teacher training datasets, KD objectives, and student sizes.
vishaalurao.bsky.social
Our ACID method shows very strong scaling properties as the size of the reference model increases, until we hit a saturation point — the optimal reference-student capacity ratio.

Further, ACID significantly outperforms KD as we scale up the reference/teacher sizes.
vishaalurao.bsky.social
As our ACID method performs implicit distillation, we can further combine our data curation strategy with an explicit distillation objective, and conduct a series of experiments to determine the optimal combination strategy.
vishaalurao.bsky.social
Our online curation method (ACID) uses large pretrained reference models (adopting from prior work: JEST) & we show a theoretical equivalence b/w KD and ACID (appx C in paper).
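For intuition, here's a minimal sketch of the JEST-style "learnability" scoring that this kind of online curation builds on. The function names, tensor shapes, and greedy top-k selection are my assumptions, not the paper's implementation (which selects sub-batches jointly).

```python
import torch
import torch.nn.functional as F

def learnability_scores(student_logits, reference_logits, labels):
    """Per-example 'learnability': high when the student still finds an example
    hard but the large pretrained reference model finds it easy."""
    student_loss = F.cross_entropy(student_logits, labels, reduction="none")
    reference_loss = F.cross_entropy(reference_logits, labels, reduction="none")
    return student_loss - reference_loss

def select_curated_batch(scores, keep_fraction=0.5):
    """Greedy variant: keep the top-scoring fraction of a larger super-batch."""
    k = max(1, int(keep_fraction * scores.numel()))
    return torch.topk(scores, k).indices

# For in-batch image-text contrastive logits of shape [batch, batch], labels are
# just the diagonal: labels = torch.arange(batch_size, device=logits.device)
```

Preferring examples that the reference already handles well (but the student does not yet) nudges the student toward the reference's behavior, which is the intuitive sense in which curation acts as implicit distillation.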
vishaalurao.bsky.social
TLDR: We introduce an online data curation method that, when coupled with simple softmax knowledge distillation, produces a very effective pretraining recipe yielding SoTA inference-efficient two-tower contrastive VLMs!
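For concreteness, here's one way a "contrastive + softmax KD" objective could look for a two-tower student; the loss weighting, shared temperature, and symmetric formulation are illustrative assumptions, not the exact recipe from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_plus_softmax_kd(student_img, student_txt,
                                teacher_img, teacher_txt,
                                kd_weight=1.0, tau=1.0):
    """Illustrative loss: CLIP-style contrastive loss on the student plus a KL
    term matching the teacher's softmax over in-batch image-text logits.
    Embeddings are assumed L2-normalized, shape [batch, dim]."""
    s_logits = student_img @ student_txt.t() / tau
    t_logits = teacher_img @ teacher_txt.t() / tau

    labels = torch.arange(s_logits.size(0), device=s_logits.device)
    contrastive = 0.5 * (F.cross_entropy(s_logits, labels) +
                         F.cross_entropy(s_logits.t(), labels))

    # Softmax distillation: student matches the teacher's image->text distribution.
    kd = F.kl_div(F.log_softmax(s_logits, dim=-1),
                  F.softmax(t_logits, dim=-1),
                  reduction="batchmean")
    return contrastive + kd_weight * kd
```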
Reposted by Vishaal Udandarao
arimorcos.bsky.social
ICYMI, check out our latest results @datologyai.com on curating data for LLMs.

Intervening only on training data, our pipeline can train models faster (7.7x less compute), better (+8.5% performance), and smaller (models half the size outperform by >5%)!

www.datologyai.com/post/technic...
Technical Deep-Dive: Curating Our Way to a State-of-the-Art Text Dataset
Our data curation pipeline to obtain substantial improvements in LLM quality, training speed, and inference efficiency.
vishaalurao.bsky.social
Great paper! Why do you think it doesn’t make sense for pretraining to be made aware of the model being used in a few-shot setting downstream? Do you see any potential downsides of this kind of approach?
Reposted by Vishaal Udandarao
confusezius.bsky.social
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining as my first bluesky entry!

🧵 A short and hopefully informative thread:
vishaalurao.bsky.social
Congrats, super exciting!!
Reposted by Vishaal Udandarao
akariasai.bsky.social
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
Reposted by Vishaal Udandarao
dziadzio.bsky.social
Here's a fledgling starter pack for the AI community in Tübingen. Let me know if you'd like to be added!

go.bsky.app/NFbVzrA