Achyuta Rajaram, Sarah Schwettmann, Jacob Andreas, Arthur Conmy: Line of Sight: On Linear Representations in VLLMs https://arxiv.org/abs/2506.04706 https://arxiv.org/pdf/2506.04706 https://arxiv.org/html/2506.04706
June 6, 2025 at 6:02 AM
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos: Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects https://arxiv.org/abs/2504.08125 https://arxiv.org/pdf/2504.08125 https://arxiv.org/html/2504.08125
April 14, 2025 at 5:57 AM
Muhammad Ali, Salman Khan
Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments
https://arxiv.org/abs/2509.00176
September 3, 2025 at 5:13 PM
Fine-Tuning vLLMs for Document Understanding
Learn how you can fine-tune visual language models for specific tasks
#ai #llm #news
towardsdatascience.com
May 6, 2025 at 3:30 PM
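For readers who want the gist without the article: the usual recipe is plain supervised fine-tuning on (document image, question, answer) triples. The sketch below uses the Hugging Face Trainer with BLIP-2 as an arbitrary stand-in; the model choice, dummy dataset, and hyperparameters are illustrative assumptions, not details from the article.

from PIL import Image
from transformers import (AutoModelForVision2Seq, AutoProcessor,
                          Trainer, TrainingArguments)

model_id = "Salesforce/blip2-opt-2.7b"  # arbitrary stand-in; any chat-capable VLM works
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Stand-in dataset; replace with real (document image, question, answer) examples.
train_ds = [{"image": Image.new("RGB", (224, 224), "white"),
             "question": "What is the invoice total?",
             "answer": "42.00"}]

def collate(batch):
    # Prompt formatting is model-specific (some models need explicit image tokens);
    # check the model card before reusing this.
    texts = [f"Question: {ex['question']} Answer: {ex['answer']}" for ex in batch]
    images = [ex["image"] for ex in batch]
    inputs = processor(images=images, text=texts, padding=True, return_tensors="pt")
    # Sketch only: in practice, mask prompt and padding positions with -100.
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs

args = TrainingArguments(output_dir="vlm-docqa", per_device_train_batch_size=1,
                         num_train_epochs=1, learning_rate=1e-5,
                         remove_unused_columns=False)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, data_collator=collate)
trainer.train()

In practice you would typically freeze the vision encoder and attach LoRA adapters on the language side rather than update all weights, but the training loop itself does not change.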
Clear illustration of the limitations of VLLMs. We still have a long way to go (most likely going beyond current training and model paradigms) before "solving vision".
thinking of calling this "The Illusion Illusion"
(more examples below)
December 1, 2024 at 6:10 PM
show that the knowledge boundary identified by our method for one VLLM can be used as a surrogate boundary for other VLLMs. Code will be released at https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary [6/6 of https://arxiv.org/abs/2502.18023v1]
February 26, 2025 at 5:58 AM
arXiv:2504.05810v1 Announce Type: new
Abstract: Direct Preference Optimization (DPO) helps reduce hallucinations in Video Multimodal Large Language Models (VLLMs), but its reliance on offline preference data limits adaptability and fails to capture [1/7 of https://arxiv.org/abs/2504.05810v1]
April 9, 2025 at 6:04 AM
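For context on the abstract above: the offline DPO objective it refers to is the standard loss from the original DPO paper, sketched below in PyTorch. This is the generic formulation, not the method this preprint proposes.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Push the policy to prefer the chosen response over the rejected one,
    # measured relative to a frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with per-sequence log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-13.0, -9.8]), torch.tensor([-13.5, -9.2]))
print(loss.item())

The loss only ever sees fixed (chosen, rejected) pairs collected in advance, which is exactly the offline-preference-data dependence the abstract points to as a limitation.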
videos from YouTube, through rigorous annotation and verification, resulting in a benchmark with 101 videos and 806 question-answer pairs. Using MimeQA, we evaluate state-of-the-art video large language models (vLLMs) and [5/7 of https://arxiv.org/abs/2502.16671v1]
February 25, 2025 at 6:27 AM
Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos
VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning
https://arxiv.org/abs/2404.07078
April 11, 2024 at 10:11 PM
significantly higher defect rates compared to single-turn evaluations, highlighting deeper vulnerabilities in VLLMs. Notably, GPT-4o demonstrated the most balanced performance as measured by our Safety-Usability Index (SUI) followed closely by [6/7 of https://arxiv.org/abs/2505.04673v1]
May 9, 2025 at 5:57 AM
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
https://arxiv.org/abs/2504.08125
April 14, 2025 at 8:00 AM
evaluation framework and research roadmap for developing VLLMs that meet the safety and robustness requirements for real-world autonomous systems. We released the benchmark toolbox and the fine-tuned model at: https://github.com/tong-zeng/DVBench.git. [8/8 of https://arxiv.org/abs/2504.14526v1]
April 22, 2025 at 6:09 AM
flexibility of state-of-the-art VLLMs (GPT-4o, Gemini-1.5 Pro, and Claude-3.5 Sonnet) using the Wisconsin Card Sorting Test (WCST), a classic measure of set-shifting ability. Our results reveal that VLLMs achieve or surpass human-level set-shifting [2/5 of https://arxiv.org/abs/2505.22112v1]
May 29, 2025 at 5:56 AM
Computer vision has done as well as it has because the computational design very closely matches known parts of the primate visual cortex.
LLMs, including VLLMs, are clearly not biological analogues.
February 28, 2025 at 4:52 PM
adversarial examples are highly transferable to widely-used proprietary VLLMs such as GPT-4o, Claude, and Gemini. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information, such as [3/6 of https://arxiv.org/abs/2505.01050v1]
May 5, 2025 at 6:00 AM
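The "craft perturbations" step in excerpts like this is usually some flavor of gradient-based optimization against an open-weight vision encoder, with the resulting image then transferred to proprietary models. Below is a minimal sketch of that general idea against a CLIP-style encoder; it is not the specific attack from the paper, and the perturbation budget is applied in preprocessed-pixel space purely for brevity.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Open-weight surrogate encoder; the perturbation is later transferred to closed models.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def targeted_perturbation(pixel_values, target_text, eps=8/255, steps=50, lr=1e-2):
    # pixel_values: preprocessed image tensor of shape (1, 3, 224, 224)
    tgt = proc(text=[target_text], return_tensors="pt", padding=True)
    with torch.no_grad():
        tgt_emb = clip.get_text_features(**tgt)
        tgt_emb = tgt_emb / tgt_emb.norm(dim=-1, keepdim=True)
    delta = torch.zeros_like(pixel_values, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img_emb = clip.get_image_features(pixel_values=pixel_values + delta)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        loss = -(img_emb * tgt_emb).sum()   # maximize similarity to the attacker-chosen text
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)        # crude L-infinity budget (in preprocessed space)
    return (pixel_values + delta).detach()

image = Image.new("RGB", (224, 224), "gray")   # stand-in for a real input image
pv = proc(images=image, return_tensors="pt")["pixel_values"]
adv = targeted_perturbation(pv, "a photo of a stop sign")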
Are you passionate about the latest in #AI? Here's your chance to shine!
✍️ Join the #InfoQ Annual Article Writing Competition!
🏆 Win a #FreeTicket to #QCon or #InfoQDevSummit!
🔗 bit.ly/4hGgNUn
Explore topics like #LLMs, #SLMs, #vLLMs, #GenAI, #VectorDatabases, #ExplainableAI, #RAG & more!
March 19, 2025 at 4:57 PM
What are currently good clients for openai-server vLLMs?
May 25, 2025 at 4:23 PM
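Since vLLM's built-in server exposes an OpenAI-compatible API, any OpenAI client works against it. A minimal sketch with the official openai Python package, where the host, port, and model name are assumptions about a local setup:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the server was launched with
    messages=[{"role": "user", "content": "Say hello from a vLLM client."}],
)
print(resp.choices[0].message.content)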
It's still a mystery to me how easily we can align vision to the LLM embedding space, which is what 99% of VLLMs do. And it kind of works with just 1-2 MLP layers but is somehow not interpretable (see fig from the ClipCap paper below)
So I'm wondering if anyone has seen papers that study this more?
October 24, 2024 at 6:45 PM
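For reference, the adapter the post describes is typically nothing more than the following LLaVA-style projector: vision-encoder patch features pushed through one or two linear layers into the LLM's token-embedding space. This is a generic sketch with assumed dimensions, not the exact ClipCap architecture.

import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    # Two-layer MLP mapping vision-encoder features (d_vision) into the
    # LLM token-embedding space (d_llm), LLaVA-style.
    def __init__(self, d_vision: int = 1024, d_llm: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_vision, d_llm),
            nn.GELU(),
            nn.Linear(d_llm, d_llm),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, d_vision) patch features from e.g. CLIP.
        # Returns (batch, num_patches, d_llm) "visual tokens" that are concatenated
        # with the text token embeddings before the LLM's first transformer block.
        return self.proj(vision_feats)

# 576 CLIP patch features -> 576 pseudo-token embeddings for a 4096-dim LLM.
feats = torch.randn(1, 576, 1024)
print(VisionToLLMProjector()(feats).shape)  # torch.Size([1, 576, 4096])

Nothing in the usual training objective forces these projected vectors to land near actual word embeddings, which may be part of why the learned mapping is so hard to interpret.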