Adhiraj Ghosh@ACL2025
@adhirajghosh.bsky.social
1.5K followers 420 following 78 posts
ELLIS PhD, University of Tübingen | Data-centric Vision and Language @bethgelab.bsky.social Website: adhirajghosh.github.io Twitter: https://x.com/adhiraj_ghosh98
Pinned
adhirajghosh.bsky.social
Excited to be in Vienna for #ACL2025 🇦🇹! You'll find @dziadzio.bsky.social and me by our ONEBench poster, so do drop by!

🗓️Wed, July 30, 11-12:30 CET
📍Hall 4/5

I’m also excited to talk about lifelong and personalised benchmarking, data curation and vision-language in general! Let’s connect!
adhirajghosh.bsky.social
Stumbled upon this blogpost recently and found some very useful tips to improve the Bluesky experience. This seemed almost tailored to me - I don't live in the USA and the politics there don't affect me personally. Settings -> Moderation -> Muted Words & Tags cleaned up my feed - strongly recommend!
nsaphra.bsky.social
I wrote something up for AI people who want to get into bluesky and either couldn't assemble an exciting feed or gave up doomscrolling when their Following feed switched to talking politics 24/7.
The AI Researcher's Guide to a Non-Boring Bluesky Feed | Naomi Saphra
How to migrate to bsky without a boring feed.
nsaphra.net
Reposted by Adhiraj Ghosh@ACL2025
jbhuang0604.bsky.social
Why More Researchers Should be Content Creators

Just trying something new! I recorded one of my recent talks, sharing what I learned from starting as a small content creator.

youtu.be/0W_7tJtGcMI

We all benefit when there are more content creators!
Reposted by Adhiraj Ghosh@ACL2025
shyamgopal.bsky.social
I'm in Nashville this week attending #CVPR2025. Excited to discuss post-training VLMs and diffusion models!
Reposted by Adhiraj Ghosh@ACL2025
niladridutt.bsky.social
🧵1/10 Excited to share our #SIGGRAPH paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟
We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching
#MLLM
adhirajghosh.bsky.social
🏆ONEBench accepted to ACL main! ✨
Stay tuned for the official leaderboard and real-time personalised benchmarking release!

If you’re attending ACL or are generally interested in the future of foundation model benchmarking, happy to talk!

#ACL2025NLP #ACL2025
@aclmeeting.bsky.social
adhirajghosh.bsky.social
🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨

Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution.

🔎 arxiv.org/abs/2412.06745
Reposted by Adhiraj Ghosh@ACL2025
lukasthede.bsky.social
🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing.

But what would it actually take to support this in practice at the scale and speed the real world demands?

We explore this question and really push the limits of lifelong knowledge editing in the wild.
👇
Reposted by Adhiraj Ghosh@ACL2025
thwiedemer.bsky.social
Check out our newest paper!

As always, it was super fun working on this with @prasannamayil.bsky.social
prasannamayil.bsky.social
New preprint out! 🎉

How does LLM training loss translate to downstream performance?

We show that pretraining data and tokenizer shape loss-to-loss scaling, while architecture and other factors play a surprisingly minor role!
brendel-group.github.io/llm-line/ 🧵1/8
Reposted by Adhiraj Ghosh@ACL2025
joschkastrueber.bsky.social
🚨Great Models Think Alike and this Undermines AI Oversight🚨
New paper quantifies LM similarity
(1) LLM-as-a-judge favor more similar models🤥
(2) Complementary knowledge benefits Weak-to-Strong Generalization☯️
(3) More capable models have more correlated failures 📈🙀
🧵👇
adhirajghosh.bsky.social
Godsend
apoorvkh.com
I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environments using the latest tools. These methods have saved me a lot of grief. Also a short guide to CUDA in appendix :)

blog.apoorvkh.com/posts/projec...
Managing Project Dependencies
blog.apoorvkh.com
Reposted by Adhiraj Ghosh@ACL2025
andimara.bsky.social
Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s 🔥
Inspired by our team's effort to open-source DeepSeek's R1, we are releasing the training and evaluation code on top of the weights 🫡
Now you can train any SmolVLM—or create your own custom VLMs!
Reposted by Adhiraj Ghosh@ACL2025
pcascanteb.bsky.social
NLI Improves Compositionality in Vision-Language Models is accepted to #ICLR2025!

CECE enables interpretability and achieves significant improvements on hard compositional benchmarks (e.g., Winoground, EqBen) without fine-tuning, as well as on alignment benchmarks (e.g., DrawBench, EditBench). + info: cece-vlm.github.io
adhirajghosh.bsky.social
I feel like my “following” and “popular with friends” feeds are well tuned, since I have complete control over them. It's just that people still post less on bsky and are more active on Twitter. Once that changes (and I think it will), we’ll have the same experience as on Twitter right now.
Reposted by Adhiraj Ghosh@ACL2025
dziadzio.bsky.social
📄 New Paper: "How to Merge Your Multimodal Models Over Time?"

arxiv.org/abs/2412.06712

Model merging assumes all finetuned models are available at once. But what if they need to be created over time?

We study Temporal Model Merging through the TIME framework to find out!

🧵
How to Merge Your Multimodal Models Over Time?
Model merging combines multiple expert models - finetuned from a base foundation model on diverse tasks and domains - into a single, more capable model. However, most existing model merging approaches...
arxiv.org
Reposted by Adhiraj Ghosh@ACL2025
bayesiankitten.bsky.social
How do we benchmark the vast capabilities of foundation models? Introducing ONEBench – a unifying benchmark to test them all, led by
@adhirajghosh.bsky.social and
@dziadzio.bsky.social!⬇️

Sample-level benchmarks could be the new generation: reusable, recombinable, and able to evaluate a wide range of capabilities!
adhirajghosh.bsky.social
🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨

Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution.

🔎 arxiv.org/abs/2412.06745
adhirajghosh.bsky.social
This extremely ambitious project would not have been possible without @dziadzio.bsky.social @bayesiankitten.bsky.social @vishaalurao.bsky.social @samuelalbanie.bsky.social and Matthias Bethge!
Special thanks to everyone at @bethgelab.bsky.social, Bo Li, Yujie Lu and Palzer Lama for all your help!
adhirajghosh.bsky.social
In summary, we release ONEBench as a valuable tool for comprehensively evaluating foundation models and generating customised benchmarks, in the hope of sparking a restructuring of how benchmarking is done. We plan on publishing the code, benchmark and metadata for capability probing very soon.
adhirajghosh.bsky.social
Finally, as a proof of concept, we probe open-ended capabilities by defining a query pool to test and generating personalised model rankings. Expanding ONEBench can only improve the reliability and scale of these queries, and we’re excited to extend this framework.
More insights like these in the paper!
adhirajghosh.bsky.social
Let's look under the hood! ONEBench comprises ONEBench-LLM and ONEBench-LMM: the largest pool of evaluation samples for foundation models (~50K for LLMs and ~600K for LMMs), spanning various domains and tasks. ONEBench will be continually expanded to accommodate more models and datasets.
adhirajghosh.bsky.social
We compare our Plackett-Luce implementation to ELO and ELO-distribution-based ranking methods: it not only correlates better with the aggregated mean model scores for each test set, but also stays extremely stable under missing datapoints and missing measurements, even at up to 95% sparsity!
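To give a feel for the ranking aggregation the thread describes: on pairwise sample-level outcomes, a Plackett-Luce model reduces to Bradley-Terry strengths, which can be fit with a simple MM iteration (Hunter, 2004). The sketch below is purely illustrative and is not ONEBench's actual implementation; the win matrix and iteration count are made-up toy values.

```python
import numpy as np

def fit_strengths(wins, n_iters=200):
    """Fit Bradley-Terry/Plackett-Luce strengths from pairwise wins.

    wins[i, j] = number of evaluation samples where model i beat model j.
    Uses the MM update w_i <- W_i / sum_j n_ij / (w_i + w_j),
    where W_i is model i's total wins and n_ij the pair's comparison count.
    Illustrative sketch only, not the ONEBench codebase.
    """
    n = wins.shape[0]
    w = np.ones(n)                # initial strengths
    total = wins + wins.T         # comparisons per model pair
    for _ in range(n_iters):
        new = np.zeros(n)
        for i in range(n):
            num = wins[i].sum()   # total wins of model i
            denom = sum(total[i, j] / (w[i] + w[j])
                        for j in range(n) if j != i)
            new[i] = num / denom if denom > 0 else w[i]
        w = new / new.sum()       # normalise so strengths sum to 1
    return w

# Toy data: model 0 beats model 1 on 8/10 shared samples, etc.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]], dtype=float)
strengths = fit_strengths(wins)
ranking = np.argsort(-strengths)  # best model first
```

Because the strengths are fit only from the pairwise outcomes that exist, the same iteration runs unchanged when many model-sample measurements are missing, which is the property the sparsity experiment stresses.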