Yuki Asano
@yukimasano.bsky.social
1.2K followers 56 following 18 posts
Professor at University of Technology Nuremberg, Head of Fundamental AI Lab
yukimasano.bsky.social
Today, we release Franca, a new vision Foundation Model that matches and often outperforms DINOv2.
The data, the training code and the model weights are open-source.

This is the result of a close and fun collaboration between
@valeoai.bsky.social (in France) and @funailab.bsky.social (in Franconia) 🚀
abursuc.bsky.social
1/ Can open-data models beat DINOv2? Today we release Franca, a fully open-source vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research.
yukimasano.bsky.social
Our Lab is now also on bsky! 🥳
funailab.bsky.social
Hello world!
We're the Fundamental AI Lab, led by @yukimasano.bsky.social at the UTN in Nuremberg.

We research computer vision, multimodal learning and adapting Foundation Models! Follow us :)
Reposted by Yuki Asano
andreaspsteiner.bsky.social
🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.

1/7
Reposted by Yuki Asano
dimadamen.bsky.social
Pls RT
Permanent Assistant Professor (Lecturer) position in Computer Vision @bristoluni.bsky.social [DL 6 Jan 2025]
This is a research+teaching permanent post within MaVi group uob-mavi.github.io in Computer Science. Suitable for strong postdocs or exceptional PhD graduates.
t.co/k7sRRyfx9o
1/2
https://tinyurl.com/BristolCVLectureship
yukimasano.bsky.social
Today we had a joint workshop between our FunAI Lab, UTN and AIST Japan. 13 talks, 1 cake, and lots of Bavarian food really got the research discussions going!
Towards more collaborations in AI between 🇩🇪 & 🇯🇵.
@hirokatukataoka.bsky.social
yukimasano.bsky.social
Also @phdcomics.bsky.social is on 🦋 👏. slowly nesting here.
phdcomics.com
Marriage vs PhD
yukimasano.bsky.social
Nice 👏! We love small (M)LLMs :) Will the training code also be released?
thomwolf.bsky.social
Releasing SmolVLM, a small 2-billion-parameter Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and token throughput

Blog post: huggingface.co/blog/smolvlm
Reposted by Yuki Asano
nanne.bsky.social
Sam next to his poster; I'm still very impressed he did all this for his MSc thesis! #BMVC2024
yukimasano.bsky.social
Exactly. Hence the new post-(pre)training term, perhaps? Post-training seems to be a good generic term for the RLHF/preference tuning etc. in NLP (allenai.org/papers/tulu-...), so by saying post-pretraining we could emphasize the fact that it's unsupervised.
yukimasano.bsky.social
"Post-pretraining", "unsupervised domain adaptation" fits, but I think is used for different tasks
yukimasano.bsky.social
This means we can simply send an adapted RGB image to the server to get a personalised output.
We also show that the gains don't just come from adding a new learnable model, but instead from the interplay between the pretrained one and the PGN.
yukimasano.bsky.social
This CNN (e.g. running on a phone) outputs a softmax over a set of learned tokens, which are then combined and used for the adaptation. This allows efficient learning, and also lets us move the signal back into pixel space via a pseudo-inverse (see the sketch below).
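A minimal sketch of how this could look, assuming a ViT-style patch embedding; the module names, the tiny CNN architecture, and the per-patch mixing scheme below are illustrative assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class PromptGenerationNetwork(nn.Module):
    """Illustrative sketch: a lightweight CNN predicts a softmax over a
    bank of learned prompt tokens, which are then mixed per patch."""
    def __init__(self, num_tokens=64, embed_dim=768, num_patches=196):
        super().__init__()
        # Learned token bank, living in the frozen backbone's embedding space.
        self.tokens = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)
        # Tiny CNN mapping an image to per-patch mixing weights.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_patches * num_tokens),
        )
        self.num_patches = num_patches
        self.num_tokens = num_tokens

    def forward(self, img):
        b = img.shape[0]
        logits = self.cnn(img).view(b, self.num_patches, self.num_tokens)
        weights = logits.softmax(dim=-1)        # softmax over the token bank
        return weights @ self.tokens            # (B, num_patches, embed_dim)

def prompts_to_pixels(prompts, patch_embed_weight, patch=16):
    """Map embedding-space prompts back to pixel space via the pseudo-inverse
    of the frozen patch-embedding projection (flattened to 2D)."""
    # patch_embed_weight: (embed_dim, 3 * patch * patch)
    w_pinv = torch.linalg.pinv(patch_embed_weight)   # (3*patch*patch, embed_dim)
    pixel_patches = prompts @ w_pinv.T               # (B, num_patches, 3*patch*patch)
    b, n, _ = pixel_patches.shape
    side = int(n ** 0.5)
    pixel_patches = pixel_patches.view(b, side, side, 3, patch, patch)
    # Reassemble the patches into a full-resolution additive perturbation.
    return pixel_patches.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, side * patch, side * patch)

# Usage sketch: the frozen backbone only ever sees an adapted RGB image.
# pgn = PromptGenerationNetwork()
# delta = prompts_to_pixels(pgn(img), frozen_patch_embed_weight)
# logits = frozen_backbone(img + delta)
```

The point of the sketch: everything learnable sits in the small CNN and the token bank, while the backbone and its patch embedding stay frozen; the pseudo-inverse step is what lets the learned prompts be applied purely in pixel space.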
yukimasano.bsky.social
Also known as reprogramming: work from @phillipisola.bsky.social showed that adjusting even individual pixels can adapt a model. We take this one step further and make the input-only adaptation signal dependent on the image itself: we introduce a lightweight CNN, the Prompt Generation Network (PGN).
yukimasano.bsky.social
LoRA is great, but one disadvantage is that serving thousands of these adapters efficiently is very difficult: GPUs are inefficient when, e.g., one adapter applies to only a single sample in a large batch. The solution is to adapt the model strictly in input space (see the sketch below).
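A short sketch of why input-space adaptation sidesteps the serving problem; the function names and the Identity stand-in backbone are illustrative assumptions, not a real serving stack:

```python
import torch

# One frozen backbone shared by all users; each user only contributes a
# pixel-space perturbation (e.g. produced by that user's PGN).
frozen_backbone = torch.nn.Identity()   # stand-in for a shared, frozen ViT

def serve_mixed_batch(images, per_user_deltas):
    """Every sample carries its own input-space adaptation, yet the whole
    mixed-user batch runs through ONE forward pass of ONE set of weights --
    no per-sample adapter swapping, unlike serving many LoRAs."""
    adapted = torch.stack([img + delta for img, delta in zip(images, per_user_deltas)])
    return frozen_backbone(adapted)

# images = [img_user_a, img_user_b, ...]; deltas = [delta_a, delta_b, ...]
# outputs = serve_mixed_batch(images, deltas)
```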
yukimasano.bsky.social
LoRA et al. enable personalised model generation and serving, which is crucial as finetuned models still outperform general ones on many tasks. However, serving a base model with many LoRAs is very inefficient! Now there's a better way: enter Prompt Generation Networks, presented today at #BMVC
yukimasano.bsky.social
Hello world!
Is there any tool to sync Twitter and Bluesky posting?
Reposted by Yuki Asano
csprofkgd.bsky.social
My growing list of #computervision researchers on Bsky.

Missed you? Let me know.

go.bsky.app/M7HGC3Y
Reposted by Yuki Asano
howard.fm
The thingie that brings over your twitter followers worked jolly well for me. Very cool! I am following another 500 people now thanks to that…
chromewebstore.google.com/detail/sky-f...
Sky Follower Bridge - Chrome Web Store
Instantly find and follow the same users from your Twitter follows on Bluesky.