🔎Given a query image + an edit (“during night”), retrieve the same specific instance after the change — not just any similar object.
🛢New dataset on HF: i-CIR huggingface.co/datasets/bil...
🔥Download, run, and share results!
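If you want to grab it programmatically, here's a minimal sketch using huggingface_hub; the repo id is a placeholder, since only the truncated HF link appears above.

```python
# Minimal download sketch (repo id is a placeholder -- use the id from the HF link above).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<i-CIR dataset id>",  # placeholder, not the real id
    repo_type="dataset",
)
print("i-CIR files downloaded to:", local_dir)
```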
REGLUE (SiT-B/2) achieves 12.9 and 28.7 FID at 400K iterations in conditional and unconditional generation, respectively, outperforming REPA, ReDi, and REG. REGLUE (SiT-XL/2) matches 1M-step SOTA performance in just 700K iterations (~30% fewer steps).
Do compressed patch features retain VFM semantics?
Each point plots the semantics of frozen, compressed DINOv2 features (x: ImageNet top-1 / Cityscapes mIoU) against SiT-B generation quality (y: ImageNet FID) when trained on VAE latents + compressed features.
Linear PCA can limit patch-level semantics (e.g., ReDi). We introduce a lightweight non-linear semantic compressor that aggregates multi-layer VFM features into a compact, semantics-preserving space, boosting quality (21.4 → 13.3 FID).
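For intuition, a minimal PyTorch sketch of such a compressor; the layer count, dimensions, and activation here are illustrative assumptions, not the exact REGLUE module.

```python
import torch
import torch.nn as nn

class SemanticCompressor(nn.Module):
    """Aggregate patch tokens from several VFM layers into a compact per-patch code."""
    def __init__(self, vfm_dim=768, num_layers=4, out_dim=32, hidden_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vfm_dim * num_layers, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, layer_feats):
        # layer_feats: list of [B, N_patches, vfm_dim] tensors from different VFM layers
        x = torch.cat(layer_feats, dim=-1)   # [B, N, vfm_dim * num_layers]
        return self.mlp(x)                   # [B, N, out_dim] compact patch semantics

# Example: patch features from 4 DINOv2-B layers, 256 tokens each
feats = [torch.randn(2, 256, 768) for _ in range(4)]
z_sem = SemanticCompressor()(feats)          # -> [2, 256, 32]
```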
REGLUE puts all of these into one unified model that jointly models:
1️⃣ VAE latents (pixels)
2️⃣ local semantics (compressed patch features)
3️⃣ global [CLS] (concept)
➕ alignment loss as a complementary auxiliary boost.
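A minimal sketch of what such a joint objective can look like (flow-matching over the concatenated streams plus an auxiliary alignment term); the loss weights, model interface, and exact formulation are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def joint_loss(model, x_latent, x_patch_sem, x_cls, vfm_feats, lam_align=0.5):
    # Stack the three streams into one target sequence (token dims assumed equal here).
    x1 = torch.cat([x_latent, x_patch_sem, x_cls], dim=1)   # [B, N_total, D]
    x0 = torch.randn_like(x1)                                # noise endpoint
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)       # per-sample time
    xt = (1 - t) * x0 + t * x1                                # linear interpolation path
    target_v = x1 - x0                                        # flow-matching velocity target

    pred_v, hidden = model(xt, t)                             # model assumed to expose hidden states
    loss_fm = F.mse_loss(pred_v, target_v)

    # REPA-style alignment of hidden states with frozen VFM patch features (auxiliary).
    loss_align = 1 - F.cosine_similarity(hidden, vfm_feats, dim=-1).mean()
    return loss_fm + lam_align * loss_align
```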
Jointly modeling compressed patch-level semantics ➕ VAE latents provides spatial guidance and yields larger gains than alignment-only (REPA) or global-only (REG).
The alignment loss and the global [CLS] token remain complementary, orthogonal signals.
We introduce REGLUE: a unified framework that entangles VAE latents ➕ Global ➕ Local semantics for faster, higher-fidelity image generation.
Links (paper + code) at the end👇
Come by Poster Session 6, Fri 16:30, #4514 🧵
We present instance-level composed image retrieval, the new i-CIR dataset, and our training-free method BASIC.
Drop in and say hi!
⚡BASIC: training-free pipeline (centering, projection with PCA, textual contextualization, Harris-style fusion) with strong results across i-CIR and class-level CIR benchmarks.
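For a feel of the ingredients, a heavily simplified scorer with the same four steps; the concrete operations (and the Harris-style combination of the two similarities) are assumptions for illustration, not the exact BASIC formulation.

```python
import numpy as np

def score_database(q_img, q_txt, db_img, k=0.05, n_components=32):
    # q_img: [D] query image embedding, q_txt: [D] text embedding of the modification
    # (already contextualized, e.g. by prompting), db_img: [M, D] database image embeddings.
    # 1) Centering: remove the database mean so a common direction doesn't dominate.
    mu = db_img.mean(axis=0)
    db_c, qi_c, qt_c = db_img - mu, q_img - mu, q_txt - mu

    # 2) PCA projection: keep the leading principal directions of the database.
    _, _, Vt = np.linalg.svd(db_c, full_matrices=False)
    P = Vt[:n_components]                                  # [n_components, D]
    db_p, qi_p, qt_p = db_c @ P.T, qi_c @ P.T, qt_c @ P.T

    # 3) Cosine similarity of each database image to the visual and textual query parts.
    def cos(a, B):
        return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)
    s_img, s_txt = cos(qi_p, db_p), cos(qt_p, db_p)

    # 4) Harris-style fusion: reward images that score high on BOTH cues
    #    (product minus a penalty on the sum, mirroring det - k * trace^2).
    return s_img * s_txt - k * (s_img + s_txt) ** 2

# ranking = np.argsort(-score_database(q_img, q_txt, db_img))
```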
📊~750K images, 202 instances, ~1,900 composed queries. Despite small per-query DBs (~3.7K images), i-CIR matches the difficulty of searching with >40M random distractors.
🗂️ Per instance we share a database and define:
- composed positives (same object + modification)
- hard negatives:
  - visual (same/similar object, wrong text)
  - textual (right text, wrong instance)
  - composed (near-miss on both)
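A hypothetical sketch of what a per-instance record could look like; the field names are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ICIRInstance:
    instance_id: str                                              # e.g. "temple_of_poseidon"
    query_image: str                                              # query photo of the instance
    modification_text: str                                        # e.g. "during sunset"
    database: list[str] = field(default_factory=list)             # per-instance search pool
    composed_positives: list[str] = field(default_factory=list)   # same object + modification
    visual_negatives: list[str] = field(default_factory=list)     # same/similar object, wrong text
    textual_negatives: list[str] = field(default_factory=list)    # right text, wrong instance
    composed_negatives: list[str] = field(default_factory=list)   # near-miss on both
```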
🔎 Gap in the community: existing CIR benchmarks are class-level and ambiguous, lack explicit hard negatives, and often reward text-only behaviour. We needed a dataset that truly requires both image and text, at the instance level. i-CIR fills that gap.
🎨 Task: given (image of an object instance) + (text modification), retrieve photos of that exact instance under the change.
E.g.: Temple of Poseidon 🏛️ ➕ during sunset 🌅
📦 Project page: vrg.fel.cvut.cz/icir/
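To make the task concrete, a naive baseline for composing such a query (sum of normalized CLIP image and text embeddings); the model choice, file name, and fusion are illustrative assumptions, not the BASIC method.

```python
import torch
import torch.nn.functional as F
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

img = preprocess(Image.open("temple_of_poseidon.jpg")).unsqueeze(0)  # hypothetical file name
txt = tokenizer(["during sunset"])

with torch.no_grad():
    q_img = F.normalize(model.encode_image(img), dim=-1)
    q_txt = F.normalize(model.encode_text(txt), dim=-1)
    q = F.normalize(q_img + q_txt, dim=-1)                           # composed query embedding

# Rank database image embeddings by cosine similarity to q (higher = better match).
```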