Ivana Balazevic
@ibalazevic.bsky.social
920 followers 130 following 4 posts
Senior Research Scientist at Google DeepMind, working on Gemini. PhD from University of Edinburgh. ibalazevic.github.io
ibalazevic.bsky.social
Disentanglement is an intriguing phenomenon that arises in generative latent variable models for reasons that are not fully understood.

If you’re interested in learning why, I highly recommend giving Carl’s blog a read!
carl-allen.bsky.social
Machine learning has made incredible breakthroughs, but our theoretical understanding lags behind.

We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.

Blog post: carl-allen.github.io/theory/2024/...
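[Added context, not part of the original post: disentanglement is most often studied in VAE-style latent variable models, where the β-VAE objective simply reweights the KL term of the standard ELBO. A minimal statement of that objective, for readers new to the setting:]

```latex
% beta-VAE objective: beta = 1 recovers the standard ELBO; beta > 1 is
% empirically associated with more disentangled latents z.
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]
  - \beta\,\mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)
```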
Reposted by Ivana Balazevic
aidanematzadeh.bsky.social
I am hiring for RS/RE positions! If you are interested in language-flavored multimodal learning, evaluation, or post-training apply here 🦎 boards.greenhouse.io/deepmind/job...

I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat)
Research Scientist, Language
London, UK
boards.greenhouse.io
Reposted by Ivana Balazevic
giffmana.ai
Our big_vision codebase is really good! And it's *the* reference for ViT, SigLIP, PaliGemma, JetFormer, ... including fine-tuning them.

However, it's criminally undocumented. I tried using it outside Google to fine-tune PaliGemma and SigLIP on GPUs, and wrote a tutorial: lb.eyer.be/a/bv_tuto.html
Reposted by Ivana Balazevic
carl-allen.bsky.social
I think this comes down to the model behind p(x,y). If features of x cause y, e.g. aspects of a website (x) -> clicks (y), or age/health -> disease, then p(y|x) is a (regression) function of x. But if x|y is a distribution over different x's for a given y (e.g. images of cats), then p(y|x) is given by Bayes' rule (squint at softmax).
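[The "squint at softmax" remark refers to a standard identity, spelled out here as added context: writing Bayes' rule in terms of log-scores makes the softmax form explicit.]

```latex
p(y\mid x)
  = \frac{p(x\mid y)\,p(y)}{\sum_{y'} p(x\mid y')\,p(y')}
  = \frac{\exp\big(\log p(x\mid y)+\log p(y)\big)}
         {\sum_{y'}\exp\big(\log p(x\mid y')+\log p(y')\big)}
  = \operatorname{softmax}_y\big(s_y(x)\big),
\qquad s_y(x) = \log p(x\mid y) + \log p(y)
```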
Reposted by Ivana Balazevic
dimadamen.bsky.social
Read our paper:
Context-Aware Multimodal Pretraining

Now on ArXiv

Can you turn vision-language models into strong any-shot models?

Go beyond zero-shot performance in SigLixP (x for context)

Read @confusezius.bsky.social's thread below…

And follow Karsten … a rising star!
confusezius.bsky.social
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining as my first Bluesky entry!

🧵 A short and hopefully informative thread:
ibalazevic.bsky.social
We maintain strong zero-shot transfer of CLIP / SigLIP across model size and data scale, while achieving up to 4x few-shot sample efficiency and up to +16% performance gains!

Fun project with @confusezius.bsky.social, @zeynepakata.bsky.social, @dimadamen.bsky.social and
@olivierhenaff.bsky.social.
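[Added illustration, not the method of arxiv.org/abs/2411.15099: a minimal sketch of what "any-shot" evaluation of a CLIP/SigLIP-style model means, where the same frozen encoders are used zero-shot (text prompts) or few-shot (class prototypes from a handful of labelled images). `image_embed` and `text_embed` are hypothetical stand-ins for the model's encoders.]

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-8):
    # Normalise embeddings to unit length so dot products are cosine similarities.
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)

def zero_shot_weights(text_embed, class_prompts):
    # One normalised text embedding per class, e.g. "a photo of a {class}".
    return l2_normalize(np.stack([text_embed(p) for p in class_prompts]))

def few_shot_weights(image_embed, support_images, support_labels, num_classes):
    # Class prototypes: mean embedding of the few labelled support images per class.
    embs = l2_normalize(np.stack([image_embed(x) for x in support_images]))
    labels = np.array(support_labels)
    protos = np.stack([embs[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)

def classify(image_embed, query_image, class_weights):
    # Predict the class whose (text or prototype) weight is most similar to the query.
    q = l2_normalize(image_embed(query_image))
    return int(np.argmax(class_weights @ q))
```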
Reposted by Ivana Balazevic
sharky6000.bsky.social
Just a heads up to everyone: @deep-mind.bsky.social is unfortunately a fake account and has been reported. Please do not follow it nor repost anything from it.
ibalazevic.bsky.social
Could you add me please? :)