REGLUE shows that the way we leverage VFM semantics matters for diffusion. Combining compact local semantics with global context yields faster convergence and state-of-the-art image generation.
📄arXiv: arxiv.org/abs/2512.16636
💻Project: reglueyourlatents.github.io
REGLUE (SiT-B/2) achieves 12.9 and 28.7 FID at 400K iterations in conditional and unconditional generation, respectively, outperforming REPA, ReDi, and REG. REGLUE (SiT-XL/2) matches 1M-step SOTA performance in just 700K iterations (~30% fewer steps).
External alignment complements joint modeling, but its benefits depend on the signal. Local alignment yields consistent gains, whereas global-only alignment can degrade performance. Spatial joint modeling remains the primary driver.
Our analysis shows that joint modeling with patch-level semantics drives most of the gains. The global [CLS] helps, but fine-grained spatial features deliver a substantially larger FID improvement, highlighting the importance of local structure for diffusion.
Do compressed patch features retain VFM semantics?
Points show frozen compressed DINOv2 semantics (x: ImageNet top-1 / Cityscapes mIoU) vs SiT-B generation quality (y: ImageNet FID) when trained on VAE latents + compressed features.
Linear PCA can limit patch-level semantics (e.g., ReDi). We introduce a lightweight non-linear semantic compressor that aggregates multi-layer VFM features into a compact, semantics-preserving space, boosting quality (21.4 → 13.3 FID).
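A minimal sketch of what such a non-linear compressor could look like: a plain two-layer MLP over concatenated multi-layer patch features. All dimensions, the layer choice, and the architecture here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_compressor(in_dims, hidden, out_dim):
    """Parameters of a two-layer MLP mapping concatenated multi-layer
    VFM patch features to a compact semantic code (toy sketch)."""
    d_in = sum(in_dims)
    return {
        "W1": rng.standard_normal((d_in, hidden)) * (2.0 / d_in) ** 0.5,
        "b1": np.zeros(hidden),
        "W2": rng.standard_normal((hidden, out_dim)) * (2.0 / hidden) ** 0.5,
        "b2": np.zeros(out_dim),
    }

def compress(p, layer_feats):
    # layer_feats: list of (num_patches, dim) arrays from several VFM layers
    x = np.concatenate(layer_feats, axis=-1)    # aggregate layers
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)  # ReLU non-linearity
    return h @ p["W2"] + p["b2"]                # compact semantic code

# Toy example: 256 patches, three DINOv2-like 768-d layers -> 32-d codes
params = init_compressor(in_dims=[768, 768, 768], hidden=512, out_dim=32)
feats = [rng.standard_normal((256, 768)) for _ in range(3)]
codes = compress(params, feats)
print(codes.shape)  # (256, 32)
```

The non-linearity is the point: unlike linear PCA, the MLP can mix and re-weight information across layers before compressing.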
REGLUE puts these into one unified model and jointly models:
1️⃣ VAE latents (pixels)
2️⃣ local semantics (compressed patch features)
3️⃣ global [CLS] (concept)
➕ alignment loss as a complementary auxiliary boost.
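As a hedged sketch of how the three streams plus the auxiliary alignment term could combine into one training objective (the function name, loss weights, and shapes are assumptions for illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def cos_sim(a, b):
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    return float(np.mean(num / den))

def reglue_style_loss(pred_latent, tgt_latent, pred_patch, tgt_patch,
                      pred_cls, tgt_cls, model_feat, vfm_feat, w_align=0.5):
    loss = mse(pred_latent, tgt_latent)   # 1) VAE latents (pixels)
    loss += mse(pred_patch, tgt_patch)    # 2) local semantics (patch codes)
    loss += mse(pred_cls, tgt_cls)        # 3) global [CLS] (concept)
    loss += w_align * (1.0 - cos_sim(model_feat, vfm_feat))  # auxiliary alignment
    return loss

# Sanity check: identical predictions and targets give (near-)zero loss
z = rng.standard_normal((4, 32))
p = rng.standard_normal((4, 256, 16))
c = rng.standard_normal((4, 384))
f = rng.standard_normal((4, 256, 384))
loss_zero = reglue_style_loss(z, z, p, p, c, c, f, f)
print(loss_zero < 1e-6)  # True
```

The structural idea from the thread is what the sketch encodes: the three streams are denoised jointly, while alignment is only an additive auxiliary term, not the main driver.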
Jointly modeling compressed patch-level semantics ➕ VAE latents provides spatial guidance and yields larger gains than alignment-only (REPA) or global-only (REG).
The alignment loss and the global [CLS] token remain complementary, orthogonal signals.
To leverage VFMs effectively, diffusion should jointly model VAE latents with multi-layer VFM spatial (patch-level) semantics, via a compact, non-linearly compressed representation.
Existing joint modeling and external alignment approaches (e.g., REPA, REG) inject only a “narrow slice” of VFM features into diffusion. We argue richer semantics are needed to unlock their full potential.
📄 arxiv.org/abs/2510.25387
🧪 github.com/billpsomas/i...
George Retsinas, @nikos-efth.bsky.social, Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, @gtolias.bsky.social.
⚡BASIC: training-free pipeline (centering, projection with PCA, textual contextualization, Harris-style fusion) with strong results across i-CIR and class-level CIR benchmarks.
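A toy, training-free sketch of the four stages named above, on random CLIP-like embeddings. The contextualization formula, the PCA rank, and the Harris constant are all assumptions (chosen by analogy to the Harris detector response), not BASIC's actual recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for CLIP-style embeddings (dimensions and data assumed)
db = rng.standard_normal((1000, 64))   # per-query database
q_img = rng.standard_normal(64)        # visual query (the instance)
q_txt = rng.standard_normal(64)        # textual modification

def l2n(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# 1) Centering: subtract the database mean to remove embedding bias
mu = db.mean(0)
db_c, qi_c, qt_c = db - mu, q_img - mu, q_txt - mu

# 2) Projection with PCA: keep the top-k directions of the database
_, _, vt = np.linalg.svd(db_c, full_matrices=False)
P = vt[:32].T
db_p, qi_p, qt_p = db_c @ P, qi_c @ P, qt_c @ P

# 3) Textual contextualization (hypothetical form): bias the visual
#    query toward the text before scoring
q_ctx = l2n(l2n(qi_p) + l2n(qt_p))

# 4) Harris-style fusion: the product rewards agreement of the two
#    similarities; the squared-sum penalty discourages relying on a
#    single modality alone
s_img = l2n(db_p) @ q_ctx
s_txt = l2n(db_p) @ l2n(qt_p)
fused = s_img * s_txt - 0.05 * (s_img + s_txt) ** 2
ranking = np.argsort(-fused)
print(ranking[:5])
```

Everything here runs without any training, which is the property the post emphasizes; only frozen embeddings and linear algebra are involved.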
📊~750K images, 202 instances, ~1,900 composed queries. Despite small per-query DBs (~3.7K images), i-CIR matches the difficulty of searching with >40M random distractors.