Sumit
@reachsumit.com
190 followers 36 following 1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain! Blog: https://blog.reachsumit.com/ Newsletter: https://recsys.substack.com/
reachsumit.com
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them

Identifies four key reasoning behaviors for agentic search and proposes Behavior Priming that instills these behaviors via supervised fine-tuning before RL.

📝 arxiv.org/abs/2510.06534
reachsumit.com
LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding

Oracle AI constructs symbolic document graphs that capture layout structure and cross-page dependencies, enabling dynamic retrieval over visually-rich documents.

📝 arxiv.org/abs/2510.07233
reachsumit.com
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs

Introduces a dynamic evaluation protocol that generates meaning-preserving paraphrases at evaluation time to assess embedding model robustness.

📝 arxiv.org/abs/2510.06730
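Not from the paper — a minimal sketch of the evaluation idea, assuming a toy character-bigram embedder stands in for the model under test and the paraphrases come from an LLM upstream: embed the original and each meaning-preserving rewrite, then report the mean and spread of the similarities. A robust embedding model keeps the mean high and the spread low.

```python
import math
from statistics import pstdev

def embed(text):
    """Toy character-bigram embedding (L2-normalised count dict).
    Stand-in for the real sentence encoder being evaluated."""
    counts = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        counts[bigram] = counts.get(bigram, 0) + 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()}

def cosine(a, b):
    # Sparse dot product; inputs are already unit-normalised.
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def paraphrase_robustness(original, paraphrases):
    """Mean and spread of original-vs-paraphrase cosine similarities.
    Low mean or high spread means the embeddings are unstable under
    meaning-preserving rewording."""
    anchor = embed(original)
    sims = [cosine(anchor, embed(p)) for p in paraphrases]
    return sum(sims) / len(sims), pstdev(sims)

mean_sim, spread = paraphrase_robustness(
    "the cat sat on the mat",
    ["a cat was sitting on the mat", "on the mat sat the cat"],
)
```

With a real encoder and LLM-generated paraphrases at evaluation time, this loop is the core of a dynamic (rather than static) test bed.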
reachsumit.com
LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Google uses LLMs as annotators to achieve nuanced content understanding at scale for video recommendations, significantly improving user participation and satisfaction in production systems.

📝 arxiv.org/abs/2510.06657
reachsumit.com
We examine approaches ranging from simple prompt engineering to sophisticated internal state analysis and establish that each method has distinct trade-offs between accuracy, cost, latency, complexity, and model access requirements.

blog.reachsumit.com/posts/2025/0...
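One of the families covered, consistency-based uncertainty estimation, can be sketched without any model access to internals: sample several answers, measure agreement with the majority answer, and retrieve only when agreement is low. The sampled answers and the 0.7 threshold below are illustrative assumptions, not values from the post.

```python
from collections import Counter

def consistency_score(samples):
    """Fraction of sampled answers agreeing with the majority answer.
    Self-consistency as a confidence proxy: low agreement suggests the
    model is near its knowledge boundary."""
    counts = Counter(s.strip().lower() for s in samples)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(samples)

def should_retrieve(samples, threshold=0.7):
    """Adaptive-RAG style gate: trigger retrieval only when
    self-consistency falls below the (illustrative) threshold."""
    _, agreement = consistency_score(samples)
    return agreement < threshold

# 3 of 4 samples agree -> confident, skip retrieval.
print(should_retrieve(["Paris", "Paris", "paris", "Lyon"]))   # False
# Answers are split -> uncertain, retrieve.
print(should_retrieve(["Paris", "Lyon", "Nice", "Paris"]))    # True
```

In practice the samples come from temperature-sampled generations of the same prompt, and agreement can be measured with semantic similarity rather than exact string match.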
Probing LLMs' Knowledge Boundary: Adaptive RAG, Part 3
This post introduces techniques that probe the LLM’s internal confidence and knowledge boundaries. We explore prompt-based confidence detection, consistency-based uncertainty estimation, and internal ...
reachsumit.com
Instead of just analyzing the query, what if we could directly probe the model's own confidence? Part 3 of the Adaptive RAG series explores methods that directly assess LLM confidence and knowledge gaps.
reachsumit.com
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-Style Contexts

Investigates how LLM-based guardrails handle RAG-style contexts with retrieved documents, finding that adding benign documents changes guardrail judgments in ~11% of input cases.

📝 arxiv.org/abs/2510.05310
reachsumit.com
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts

Introduces a framework using multi-head attention to encode exemplars as soft prompts, improving performance over standard RAG while reducing inference cost.

📝 arxiv.org/abs/2510.05363
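Not the paper's implementation — a generic single-head cross-attention pooling sketch of the underlying idea: a small set of learned query vectors attends over exemplar embeddings and compresses them into a fixed number of "soft prompt" vectors, so inference cost no longer grows with the number of exemplar tokens. Dimensions and random vectors below are illustrative.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attention_pool(queries, exemplars):
    """Each learned query attends over the exemplar embeddings and
    returns one pooled vector (one soft-prompt token). Single-head
    for brevity; a multi-head variant would split dimensions."""
    d = len(exemplars[0])
    pooled = []
    for q in queries:
        scores = [sum(qi * ei for qi, ei in zip(q, e)) / math.sqrt(d)
                  for e in exemplars]
        weights = softmax(scores)
        pooled.append([sum(w * e[j] for w, e in zip(weights, exemplars))
                       for j in range(d)])
    return pooled

# Hypothetical setup: 5 exemplar embeddings (dim 8) compressed into
# 2 soft-prompt vectors that would be prepended to the LLM input.
exemplars = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
learned_queries = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2)]
soft_prompts = attention_pool(learned_queries, exemplars)
```

The pooled vectors are convex combinations of the exemplar embeddings, which is what lets a fixed, small prompt budget summarise arbitrarily many exemplars.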
reachsumit.com
Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation

Treats item interaction histories as a native dialect within language models, using mixture-of-experts (MoE) layers to avoid interference between the text and catalog modalities.

📝 arxiv.org/abs/2510.05125
reachsumit.com
Scalable In-context Ranking with Generative Models

Google DeepMind introduces a method that reduces attention complexity from quadratic to linear for in-context ranking while matching or outperforming existing listwise rankers.

📝 arxiv.org/abs/2510.05396
reachsumit.com
AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents

Proposes a framework where LLM agents delegate full-ranking to recommendation tools while leveraging world knowledge for implicit item-item relationship reasoning.

📝 arxiv.org/abs/2510.05598
reachsumit.com
Think Then Embed: Generative Context Improves Multimodal Embedding

Meta introduces a framework where models first generate reasoning traces to explain complex queries, then produce embeddings conditioned on both the original input and the reasoning.

📝 arxiv.org/abs/2510.05014
reachsumit.com
Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video

NVIDIA introduces a unified retrieval model that handles text, images, audio, and video in a single embedding space, enabling cross-modal and joint-modal search.

📝 arxiv.org/abs/2510.03458