Sumit
@reachsumit.com
190 followers 36 following 1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain! Blog: https://blog.reachsumit.com/ Newsletter: https://recsys.substack.com/
reachsumit.com
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them

Identifies four key reasoning behaviors for agentic search and proposes Behavior Priming that instills these behaviors via supervised fine-tuning before RL.

📝 arxiv.org/abs/2510.06534
reachsumit.com
LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding

Oracle AI constructs symbolic document graphs that capture layout structure and cross-page dependencies, enabling dynamic retrieval over visually-rich documents.

📝 arxiv.org/abs/2510.07233
reachsumit.com
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs

Introduces a dynamic evaluation protocol that generates meaning-preserving paraphrases at evaluation time to assess embedding model robustness.

📝 arxiv.org/abs/2510.06730
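Not from the paper — a minimal sketch of the evaluation idea, assuming a toy character-bigram embedder stands in for the model under test and the paraphrases come from an LLM upstream: embed the original and each meaning-preserving rewrite, then report the mean and spread of the similarities. A robust embedding model keeps the mean high and the spread low.

```python
import math
from statistics import pstdev

def embed(text):
    """Toy character-bigram embedding (L2-normalised count dict).
    Stand-in for the real sentence encoder being evaluated."""
    counts = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        counts[bigram] = counts.get(bigram, 0) + 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()}

def cosine(a, b):
    # Sparse dot product; inputs are already unit-normalised.
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def paraphrase_robustness(original, paraphrases):
    """Mean and spread of original-vs-paraphrase cosine similarities.
    Low mean or high spread means the embeddings are unstable under
    meaning-preserving rewording."""
    anchor = embed(original)
    sims = [cosine(anchor, embed(p)) for p in paraphrases]
    return sum(sims) / len(sims), pstdev(sims)

mean_sim, spread = paraphrase_robustness(
    "the cat sat on the mat",
    ["a cat was sitting on the mat", "on the mat sat the cat"],
)
```

With a real encoder and LLM-generated paraphrases at evaluation time, this loop is the core of a dynamic (rather than static) test bed.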
reachsumit.com
LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Google uses LLMs as annotators to achieve nuanced content understanding at scale for video recommendations, significantly improving user participation and satisfaction in production systems.

📝 arxiv.org/abs/2510.06657
reachsumit.com
We examine approaches ranging from simple prompt engineering to sophisticated internal state analysis and establish that each method has distinct trade-offs between accuracy, cost, latency, complexity, and model access requirements.

blog.reachsumit.com/posts/2025/0...
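One of the families covered, consistency-based uncertainty estimation, can be sketched without any model access to internals: sample several answers, measure agreement with the majority answer, and retrieve only when agreement is low. The sampled answers and the 0.7 threshold below are illustrative assumptions, not values from the post.

```python
from collections import Counter

def consistency_score(samples):
    """Fraction of sampled answers agreeing with the majority answer.
    Self-consistency as a confidence proxy: low agreement suggests the
    model is near its knowledge boundary."""
    counts = Counter(s.strip().lower() for s in samples)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(samples)

def should_retrieve(samples, threshold=0.7):
    """Adaptive-RAG style gate: trigger retrieval only when
    self-consistency falls below the (illustrative) threshold."""
    _, agreement = consistency_score(samples)
    return agreement < threshold

# 3 of 4 samples agree -> confident, skip retrieval.
print(should_retrieve(["Paris", "Paris", "paris", "Lyon"]))   # False
# Answers are split -> uncertain, retrieve.
print(should_retrieve(["Paris", "Lyon", "Nice", "Paris"]))    # True
```

In practice the samples come from temperature-sampled generations of the same prompt, and agreement can be measured with semantic similarity rather than exact string match.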
Probing LLMs' Knowledge Boundary: Adaptive RAG, Part 3
This post introduces techniques that probe the LLM’s internal confidence and knowledge boundaries. We explore prompt-based confidence detection, consistency-based uncertainty estimation, and internal ...
reachsumit.com
Instead of just analyzing the query, what if we could directly probe the model's own confidence? Part 3 of the Adaptive RAG series explores methods that directly assess LLM confidence and knowledge gaps.
reachsumit.com
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-Style Contexts

Investigates how LLM-based guardrails handle RAG-style contexts with retrieved documents, finding that adding benign documents changes guardrail judgments in ~11% of input cases.

📝 arxiv.org/abs/2510.05310
reachsumit.com
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts

Introduces a framework using multi-head attention to encode exemplars as soft prompts, improving performance over standard RAG while reducing inference cost.

📝 arxiv.org/abs/2510.05363
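Not the paper's implementation — a generic single-head cross-attention pooling sketch of the underlying idea: a small set of learned query vectors attends over exemplar embeddings and compresses them into a fixed number of "soft prompt" vectors, so inference cost no longer grows with the number of exemplar tokens. Dimensions and random vectors below are illustrative.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attention_pool(queries, exemplars):
    """Each learned query attends over the exemplar embeddings and
    returns one pooled vector (one soft-prompt token). Single-head
    for brevity; a multi-head variant would split dimensions."""
    d = len(exemplars[0])
    pooled = []
    for q in queries:
        scores = [sum(qi * ei for qi, ei in zip(q, e)) / math.sqrt(d)
                  for e in exemplars]
        weights = softmax(scores)
        pooled.append([sum(w * e[j] for w, e in zip(weights, exemplars))
                       for j in range(d)])
    return pooled

# Hypothetical setup: 5 exemplar embeddings (dim 8) compressed into
# 2 soft-prompt vectors that would be prepended to the LLM input.
exemplars = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
learned_queries = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2)]
soft_prompts = attention_pool(learned_queries, exemplars)
```

The pooled vectors are convex combinations of the exemplar embeddings, which is what lets a fixed, small prompt budget summarise arbitrarily many exemplars.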
reachsumit.com
Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation

Treats item interaction histories as a native dialect within language models, using mixture-of-experts (MoE) layers to avoid interference between the text and catalog modalities.

📝 arxiv.org/abs/2510.05125
reachsumit.com
Scalable In-context Ranking with Generative Models

Google DeepMind introduces a method that reduces attention complexity from quadratic to linear for in-context ranking while matching or outperforming existing listwise rankers.

📝 arxiv.org/abs/2510.05396
reachsumit.com
AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents

Proposes a framework where LLM agents delegate full-ranking to recommendation tools while leveraging world knowledge for implicit item-item relationship reasoning.

📝 arxiv.org/abs/2510.05598
reachsumit.com
Think Then Embed: Generative Context Improves Multimodal Embedding

Meta introduces a framework where models first generate reasoning traces to explain complex queries, then produce embeddings conditioned on both the original input and the reasoning.

📝 arxiv.org/abs/2510.05014
reachsumit.com
Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video

NVIDIA introduces a unified retrieval model that handles text, images, audio, and video in a single embedding space, enabling cross-modal and joint-modal search.

📝 arxiv.org/abs/2510.03458