Shyamgopal Karthik
@shyamgopal.bsky.social
430 followers 260 following 7 posts
PhD at Tübingen. Working on post-training diffusion and multimodal models. Previously a research intern at Snapchat and Naver Labs. https://sgk98.github.io/
shyamgopal.bsky.social
Wonderful story behind some very nice SSL work!
abursuc.bsky.social
1/ New & old work on self-supervised representation learning (SSL) with ViTs:
MOCA ☕ - Predicting Masked Online Codebook Assignments w/ @spyrosgidaris.bsky.social O. Simeoni, A. Vobecky, @matthieucord.bsky.social, N. Komodakis, @ptrkprz.bsky.social #TMLR #ICLR2025
Grab a ☕ & brace for a story & a🧵
shyamgopal.bsky.social
I'm in Nashville this week attending #CVPR2025. Excited to discuss post-training VLMs and diffusion models!
Reposted by Shyamgopal Karthik
ml4science.bsky.social
We're super happy: Our Cluster of Excellence will continue to receive funding from the German Research Foundation @dfg.de ! Here’s to 7 more years of exciting research at the intersection of #machinelearning and science! Find out more: uni-tuebingen.de/en/research/... #ExcellenceStrategy
The members of the Cluster of Excellence "Machine Learning: New Perspectives for Science" raise their glasses and celebrate securing another funding period.
shyamgopal.bsky.social
Oh yes, nobody tells you in a game that there's a tactic in this position. And you need to calculate a sacrifice fully in a game, not play one move at a time. So it's not too hard to overfit, but doing online tactics well is a necessary but not sufficient condition for playing chess well.
shyamgopal.bsky.social
Maybe if people tried to overfit to online tactics ratings, sure. But having good calculation skills and awareness of tactical patterns is essential to being a good chess player, while "leetcode" is not essential to being a good programmer?
Reposted by Shyamgopal Karthik
davidpicard.bsky.social
🚨 New preprint!
How far can we go with ImageNet for Text-to-Image generation? w. @arrijitghosh.bsky.social @lucasdegeorge.bsky.social @nicolasdufour.bsky.social @vickykalogeiton.bsky.social
TL;DR: Train a text-to-image model using 1000× less data in 200 GPU hrs!

📜https://arxiv.org/abs/2502.21318
🧵👇
shyamgopal.bsky.social
These are some ridiculously good results from training tiny T2I models purely on ImageNet! It's almost too good to be true. Do check it out!
nicolasdufour.bsky.social
Check out our latest work on Text-to-Image generation! We've successfully trained a T2I model using only ImageNet data by leveraging captioning and data augmentation.
davidpicard.bsky.social
🚨 New preprint!
How far can we go with ImageNet for Text-to-Image generation? w. @arrijitghosh.bsky.social @lucasdegeorge.bsky.social @nicolasdufour.bsky.social @vickykalogeiton.bsky.social
TL;DR: Train a text-to-image model using 1000× less data in 200 GPU hrs!

📜https://arxiv.org/abs/2502.21318
🧵👇
Reposted by Shyamgopal Karthik
eugenevinitsky.bsky.social
I've been talking about writing this paper to anyone who would listen since 2020. I bombed a bunch of job talks trying to convince companies to work on this. It's so nice to finally just be able to say, yes, self-play RL in a diverse world gives you immense capabilities
arxiv.org/abs/2502.03349
Robust Autonomy Emerges from Self-Play
Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic drivi...
Reposted by Shyamgopal Karthik
joschkastrueber.bsky.social
🚨Great Models Think Alike and this Undermines AI Oversight🚨
New paper quantifies LM similarity
(1) LLM-as-a-judge favors more similar models🤥
(2) Complementary knowledge benefits Weak-to-Strong Generalization☯️
(3) More capable models have more correlated failures 📈🙀
🧵👇
Reposted by Shyamgopal Karthik
nicolasdufour.bsky.social
ReNO shows that some initial noises are better suited to some prompts! This is great for improving image generation, but I think it also reveals a deeper property of diffusion models.
lucaeyring.bsky.social
Can we enhance the performance of T2I models without any fine-tuning?

We show that with our ReNO, Reward-based Noise Optimization, one-step models consistently surpass the performance of all current open-source Text-to-Image models within a computational budget of 20-50 seconds!
#NeurIPS2024
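The pattern behind reward-based noise optimization can be sketched in a few lines: keep the one-step generator frozen and run gradient ascent on the initial noise against a differentiable reward. Everything below is a toy stand-in (a `tanh` "generator" and a quadratic "reward"), not the actual ReNO models or reward functions; it only illustrates the optimization loop under those assumptions.

```python
import torch

torch.manual_seed(0)

def generator(noise):                     # toy stand-in for a one-step T2I model
    return torch.tanh(noise)

def reward(image):                        # toy stand-in for a differentiable reward
    return -((image - 0.5) ** 2).mean()   # prefers pixel values near 0.5

noise = torch.randn(1, 3, 8, 8, requires_grad=True)
opt = torch.optim.Adam([noise], lr=0.05)  # optimize the *noise*, not the model

r0 = reward(generator(noise)).item()      # reward of the unoptimized noise
for _ in range(100):
    opt.zero_grad()
    loss = -reward(generator(noise))      # gradient *ascent* on the reward
    loss.backward()
    opt.step()
r1 = reward(generator(noise)).item()      # reward after noise optimization
```

The key design point is that only the noise tensor carries gradients; the generator's weights are untouched, which is why no fine-tuning is needed.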
Reposted by Shyamgopal Karthik
fatemehx2.bsky.social
This is maybe my favorite thing I've seen out of #NeurIPS2024.

Head over to HuggingFace and play with this thing. It's quite extraordinary.
lucaeyring.bsky.social
Thanks to @fffiloni.bsky.social and @natanielruiz.bsky.social, we have a running live Demo of ReNO, play around with it here:

🤗: huggingface.co/spaces/fffil...

We are excited to present ReNO at #NeurIPS2024 this week!
Join us tomorrow from 11am-2pm at East Exhibit Hall A-C #1504!
Reposted by Shyamgopal Karthik
lucaeyring.bsky.social
Can we enhance the performance of T2I models without any fine-tuning?

We show that with our ReNO, Reward-based Noise Optimization, one-step models consistently surpass the performance of all current open-source Text-to-Image models within a computational budget of 20-50 seconds!
#NeurIPS2024
Reposted by Shyamgopal Karthik
trappmartin.bsky.social
I will present ✌️ BDU workshop papers @ NeurIPS: one by Rui Li (looking for internships) and one by Anton Baumann.

🔗 to extended versions:

1. 🙋 "How can we make predictions in BDL efficiently?" 👉 arxiv.org/abs/2411.18425

2. 🙋 "How can we do prob. active learning in VLMs" 👉 arxiv.org/abs/2412.06014
shyamgopal.bsky.social
After a break of over two years, I'm attending a conference again! Excited to attend NeurIPS, and even more so to present ReNO, which gets inference-time scaling and preference optimization to work for text-to-image generation.
Do reach out if you'd like to chat!
Reposted by Shyamgopal Karthik
patelmaitreya.bsky.social
🚨New Paper Alert🚨

🚀 Introducing FlowChef, "Steering Rectified Flow Models in the Vector Field for Controlled Image Generation"! 🌌✨

- Perform image editing, solve inverse problems, and more.
- Achieved inversion-free, gradient-free, & training-free inference time steering! 🤯

👇👇
Reposted by Shyamgopal Karthik
giffmana.ai
Some recent discussions made me write up a short read on how I think about doing computer vision research when there's clear potential for abuse.

Alternative title: why I decided to stop working on tracking.

Curious about others' thoughts on this.

lb.eyer.be/s/cv-ethics....
shyamgopal.bsky.social
Check out this nice work by @confusezius.bsky.social on designing VLMs for few-shot adaptation!
confusezius.bsky.social
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining as my first Bluesky entry!

🧵 A short and hopefully informative thread:
Reposted by Shyamgopal Karthik
giffmana.ai
A real-time (or very fast) open-source txt2video model dropped: LTXV.

HF: huggingface.co/Lightricks/L...
Gradio: huggingface.co/spaces/Light...
Github: github.com/Lightricks/L...

Look at that prompt example though. Need to be a proper writer to get that quality.
Reposted by Shyamgopal Karthik
erogol.com
Learning from one continuous video stream

- use a video stream to learn a predictive model
- everything is in pixel space
- update the model less frequently and don’t use momentum optimizer
- pre-training with IID data improves performance
- continual learning for robots

arxiv.org/html/2312.00...
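The "update less frequently, don't use momentum" recipe amounts to plain SGD with delayed weight updates over a frame stream. A minimal sketch, assuming a synthetic random "video" and a single conv layer standing in for the paper's predictive model (neither is from the paper):

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the setup: predict the next frame from the current one,
# training online on a single continuous stream.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
# Plain SGD with momentum=0.0: momentum would mix stale gradients from
# earlier, correlated frames into the current update.
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.0)

UPDATE_EVERY = 8                           # apply weight updates less frequently
stream = torch.randn(64, 1, 1, 16, 16)     # fake "video": 64 single-channel frames

losses = []
opt.zero_grad()
for t in range(len(stream) - 1):
    pred = model(stream[t])
    loss = torch.nn.functional.mse_loss(pred, stream[t + 1])
    loss.backward()                        # gradients accumulate between updates
    losses.append(loss.item())
    if (t + 1) % UPDATE_EVERY == 0:        # delayed, infrequent weight update
        opt.step()
        opt.zero_grad()
```

Accumulating gradients over `UPDATE_EVERY` consecutive frames before stepping is one simple way to realize "update the model less frequently" on a stream of highly correlated inputs.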
Reposted by Shyamgopal Karthik
dziadzio.bsky.social
Here's a fledgling starter pack for the AI community in Tübingen. Let me know if you'd like to be added!

go.bsky.app/NFbVzrA
Tübingen AI