Gabriele Berton
@berton-gabri.bsky.social
650 followers 470 following 140 posts
Postdoc at Amazon on MLLM - ex CMU, PoliTo, IIT https://gmberton.github.io/
berton-gabri.bsky.social
data quality.

The main difference is that DataDecide splits the data according to its data source (usually training datasets are a collection of multiple datasets), while CLIMB creates clusters from each document's embeddings (meaning documents from ...
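A minimal sketch of the two splitting strategies, assuming a hypothetical embed_fn and documents stored as {"source": ..., "text": ...} dicts; KMeans here just stands in for whatever clustering CLIMB actually uses:

```python
# Minimal sketch: source-based splitting (DataDecide-style) vs. embedding-based
# clustering (CLIMB-style). embed_fn and the document format are assumptions.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def split_by_source(documents):
    # DataDecide-style: group documents by the dataset they come from.
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["source"]].append(doc["text"])
    return groups

def split_by_embedding(documents, embed_fn, n_clusters=8):
    # CLIMB-style: cluster documents by their embeddings, so one cluster can mix
    # semantically similar documents coming from different sources.
    embeddings = np.stack([embed_fn(doc["text"]) for doc in documents])
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)
    groups = defaultdict(list)
    for doc, label in zip(documents, labels):
        groups[int(label)].append(doc["text"])
    return groups
```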
berton-gabri.bsky.social
large LLM on many subsets would be prohibitively expensive).

Here are some similarities and differences between the two papers:

Both papers split all the available training data into subsets, train a small LLM on each subset, and see how it performs: its performance is used as a proxy for ...
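A minimal sketch of that shared recipe; train_small_llm and evaluate are hypothetical placeholders for the actual training and benchmark code:

```python
# Minimal sketch of the shared recipe: train a small proxy LLM on each subset and
# use its score as a proxy for data quality. train_small_llm and evaluate are
# hypothetical placeholders, not real functions from either paper.
def rank_subsets(subsets, train_small_llm, evaluate):
    scores = {}
    for name, subset in subsets.items():
        proxy_model = train_small_llm(subset)   # cheap, small-scale training run
        scores[name] = evaluate(proxy_model)    # e.g. average benchmark accuracy
    # Higher proxy score -> the subset is assumed to be better data for a large LLM too.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```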
berton-gabri.bsky.social
How to select pre-training data for LLMs?

Two papers came out last week from AllenAI and Nvidia that do it in a similar way, building on the intuition that good data is good regardless of the size of the LLM.

This intuition can be used to select good data cheaply (training a ...
Reposted by Gabriele Berton
ericzzj.bsky.social
To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition

Davide Sferrazza, @berton-gabri.bsky.social, @gabtriv.bsky.social, Carlo Masone

tl;dr: VPR datasets saturate; re-ranking not good; image matching -> uncertainty -> inlier counts -> confidence

arxiv.org/abs/2504.06116
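A minimal sketch of the inlier-count-as-confidence idea from the tl;dr; match_images is a hypothetical placeholder for a real matcher, and the threshold is an assumption, not a value from the paper:

```python
# Minimal sketch of the tl;dr pipeline: match the query against the top retrieved
# image and use the inlier count as a confidence score for the VPR prediction.
# match_images is a hypothetical placeholder for a real matcher (e.g. local
# features + RANSAC); min_inliers is an assumed threshold.
def vpr_with_confidence(query_img, retrieved_imgs, match_images, min_inliers=50):
    best = retrieved_imgs[0]                     # top-1 candidate from retrieval
    num_inliers = match_images(query_img, best)  # geometric verification
    confident = num_inliers >= min_inliers       # more inliers -> more confidence
    return best, num_inliers, confident
```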
berton-gabri.bsky.social
When I read a paper, the only way I have to remember something about it six months from now is to use Anki
berton-gabri.bsky.social
Probably nobody knows how to pronounce his name and so they avoid talking about him
berton-gabri.bsky.social
And it gets better... for MCoT (Multimodal Chain-of-Thought) they should say "in recent weeks" 😂
berton-gabri.bsky.social
I find it mind-blowing that LLM papers should start saying "in recent months" instead of "in recent years". OpenAI o1 and DeepSeek R1 are literally a few months old
berton-gabri.bsky.social
The FastAPLoss gave us worse results than average, but again, those were preliminary results with batch size 32.

The SmoothAP and Recall@k losses are not in PML, so we didn't even consider them (we already had over 30 losses to try). It might be helpful to add your Recall@k to PML :)
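For reference, a minimal sketch of how a pytorch-metric-learning (PML) loss such as FastAPLoss plugs into a training step; the embedding size, number of classes, and random batch below are made-up placeholders, not the actual experimental setup:

```python
# Minimal sketch of using a PML loss; any of the library's losses can be swapped in.
import torch
from pytorch_metric_learning import losses

loss_func = losses.FastAPLoss(num_bins=10)

embeddings = torch.randn(32, 128, requires_grad=True)  # batch size 32, 128-dim embeddings
labels = torch.randint(0, 8, (32,))                    # class label per embedding
loss = loss_func(embeddings, labels)
loss.backward()
```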
berton-gabri.bsky.social
Yeah, intuitively it makes sense to perturb the student's images; not sure why it doesn't work in the 2021 distillation paper.
Someone should make a benchmark for distillation across tasks...
berton-gabri.bsky.social
I believe the Beyer et al. 2021 distillation paper says the images should be the same for teacher and student
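A minimal sketch of that "same images for teacher and student" (consistent teaching) setup; all names here are placeholders rather than code from the paper, and the temperature is an assumption:

```python
# Minimal sketch of consistent teaching: teacher and student see the SAME augmented
# view, and the student matches the teacher's soft predictions via KL divergence.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, images, augment, optimizer, T=2.0):
    x = augment(images)              # one augmentation, shared by teacher and student
    with torch.no_grad():
        t_logits = teacher(x)        # teacher predictions on the same view
    s_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```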
berton-gabri.bsky.social
🚀 Big news! Just got my O-1 visa, booked my flight to San Francisco, and I’m really happy to join Amazon in Palo Alto! Ready for this exciting new chapter 🚀

I'll be doing a PostDoc on Vision-Language Models!
berton-gabri.bsky.social
The line is so blurry...

Two images of the same car are the same instance? (yes)

If it's the same car but re-painted?

If it's the same car but re-made?

If it's two different cars, same model with same color?

If same model, different color?

Same brand, different model?
berton-gabri.bsky.social
Interesting work, happy to see people working in the field!

Also a bit disappointed not to see them compare with methods that we found to be SOTA on the task, like RoMa and SIFT+LightGlue
berton-gabri.bsky.social
I won't have time to run new experiments (I'm starting a new job on Monday), but if anyone wants to add results with other losses or anything else, I'm happy to update the paper :)
berton-gabri.bsky.social
Interesting point, are you referring to e.g. the FastAPLoss?

To be fair, our preliminary results, which were used to select the shortlist of 12 losses (out of 34, all those in the pytorch-metric-learning library), were run with a batch size of 32, so there's a chance we missed out on good losses
berton-gabri.bsky.social
I think I see your point: for you, image retrieval is about retrieving an image of exactly the same object (e.g. exactly that one car, not a car of the same model)?

Then isn't that instance retrieval?

But anyway, naming conventions are very blurry in our field
berton-gabri.bsky.social
Also, the paper is only on arXiv, we have no plans to submit it, and the code is super simple

If anyone wants to add results we're pretty flexible with it, and we can add new authors

My main goal is to have a good reference paper for anyone doing retrieval, so I'm happy to update the paper as needed
berton-gabri.bsky.social
And I'd call GLD, Oxford, etc. "landmark retrieval" 😆
To be fair, they're all image retrieval datasets, but GLD-Oxford and CUB-Cars are just different subcategories of it

The nice thing about the datasets we used is that the train-test splits are well defined, whereas e.g. Oxford and Paris have no train sets
berton-gabri.bsky.social
I'll have to pay a visit 🪴
berton-gabri.bsky.social
The one and only fern! Where is it?

While writing this I've realized that fern is an anagram of NeRF, definitely not a coincidence