Sumit
@reachsumit.com
190 followers
36 following
1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!
Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Pinned
Sumit
@reachsumit.com
· 21h
Probing LLMs' Knowledge Boundary: Adaptive RAG, Part 3
This post introduces techniques that probe the LLM’s internal confidence and knowledge boundaries. We explore prompt-based confidence detection, consistency-based uncertainty estimation, and internal ...
blog.reachsumit.com
Sumit
@reachsumit.com
· 1h
The Upside of Bias: Personalizing Long-Tail Item Recommendations with Biased Sampling | ACM Transactions on Recommender Systems
Recommendation systems drive user engagement across social media, streaming platforms, and e-commerce by learning from past interactions. The relevance of a recommended item depends on the quality of ...
dl.acm.org
Sumit
@reachsumit.com
· 2h
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
Agentic search leverages large language models (LLMs) to interpret complex user information needs and execute a multi-step process of planning, searching, and synthesizing information to provide answe...
arxiv.org
Sumit
@reachsumit.com
· 2h
Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models
Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses t...
arxiv.org
Sumit
@reachsumit.com
· 2h
LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding
Question answering over visually rich documents (VRDs) requires reasoning not only over isolated content but also over documents' structural organization and cross-page dependencies. However, conventi...
arxiv.org
Sumit
@reachsumit.com
· 2h
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Current evaluations of sentence embedding models typically rely on static test beds such as the Massive Text Embedding Benchmark (MTEB). While invaluable, repeated tuning on a fixed suite can inflate ...
arxiv.org
Sumit
@reachsumit.com
· 2h
LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations
This paper presents a case study on deploying Large Language Models (LLMs) as an advanced "annotation" mechanism to achieve nuanced content understanding (e.g., discerning content "vibe") at scale wit...
arxiv.org
Sumit
@reachsumit.com
· 2h
Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization
Large language models (LLMs) are increasingly used as rerankers in information retrieval, yet their ranking behavior can be steered by small, natural-sounding prompts. To expose this vulnerability, we...
arxiv.org
Sumit
@reachsumit.com
· 23h
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics
RAG (Retrieval-Augmented Generation) systems and web agents are increasingly evaluated on multi-hop deep search tasks, yet current practice suffers from two major limitations. First, most benchmarks l...
arxiv.org
Sumit
@reachsumit.com
· 23h
Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
Large language model (LLM) agents increasingly rely on external tools such as search engines to solve complex, multi-step problems, and reinforcement learning (RL) has become a key paradigm for traini...
arxiv.org
Sumit
@reachsumit.com
· 23h
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
With the increasing adoption of large language models (LLMs), ensuring the safety of LLM systems has become a pressing concern. External LLM-based guardrail models have emerged as a popular solution t...
arxiv.org
Sumit
@reachsumit.com
· 23h
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts
Adapting Foundation Models to new domains with limited training data is challenging and computationally expensive. While prior work has demonstrated the effectiveness of using domain-specific exemplar...
arxiv.org
Sumit
@reachsumit.com
· 23h
Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation
While collaborative filtering delivers predictive accuracy and efficiency, and Large Language Models (LLMs) enable expressive and generalizable reasoning, modern recommendation systems must bring thes...
arxiv.org
Sumit
@reachsumit.com
· 23h
AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents
Recent agent-based recommendation frameworks aim to simulate user behaviors by incorporating memory mechanisms and prompting strategies, but they struggle with hallucinating non-existent items and ful...
arxiv.org
Sumit
@reachsumit.com
· 23h
Limitations of Current Evaluation Practices for Conversational Recommender Systems and the Potential of User Simulation
Research and development on conversational recommender systems (CRSs) critically depends on sound and reliable evaluation methodologies. However, the interactive nature of these systems poses signific...
arxiv.org
Sumit
@reachsumit.com
· 23h
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Agentic Retrieval-Augmented Generation (Agentic RAG) enhances the processing capability for complex tasks through dynamic retrieval and adaptive workflows. Recent advances (e.g., Search-R1) have shown...
arxiv.org
Sumit
@reachsumit.com
· 2d
Think Then Embed: Generative Context Improves Multimodal Embedding
There is a growing interest in Universal Multimodal Embeddings (UME), where models are required to generate task-specific representations. While recent studies show that Multimodal Large Language Mode...
arxiv.org
Sumit
@reachsumit.com
· 2d
Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents
Enabling large language models (LLMs) to utilize search tools offers a promising path to overcoming fundamental limitations such as knowledge cutoffs and hallucinations. Recent work has explored reinf...
arxiv.org
Sumit
@reachsumit.com
· 2d
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
Multimodal retrieval-augmented generation (MM-RAG) is a key approach for applying large language models (LLMs) and agents to real-world knowledge bases, yet current evaluations are fragmented, focusin...
arxiv.org
Sumit
@reachsumit.com
· 2d
Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video
We present Omni-Embed-Nemotron, a unified multimodal retrieval embedding model developed to handle the increasing complexity of real-world information needs. While Retrieval-Augmented Generation (RAG)...
arxiv.org
Sumit
@reachsumit.com
· 2d
GitHub - XuLingnan/RDR2: Code, data and model for the paper "Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness" (EMNLP 2025 Findings).
github.com
Sumit
@reachsumit.com
· 2d
Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards
RAG systems are increasingly deployed in high-stakes domains where users expect outputs to be consistent across semantically equivalent queries. However, existing systems often exhibit significant inc...
arxiv.org