Sumit
@reachsumit.com
190 followers · 36 following · 1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain! Blog: https://blog.reachsumit.com/ Newsletter: https://recsys.substack.com/
reachsumit.com
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Generates complex QA pairs from web sources and uses dynamic sliding-window management to sustain interactions of nearly 100 turns within a 32k context window.

📝 arxiv.org/abs/2510.08276
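The sliding-window idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual policy: when the transcript exceeds the token budget, the oldest turns are evicted first, while the most recent turns are always retained (word count stands in for real token counting, and a 32-word budget stands in for 32k tokens).

```python
BUDGET = 32  # toy stand-in for a 32k-token context window

def fit_context(task_prompt, turns, budget=BUDGET, keep_recent=2):
    """Return the turns that fit in `budget`, evicting oldest first.

    The `keep_recent` newest turns are kept even if they overflow the
    budget, so the agent never loses its immediate working context.
    """
    cost = lambda text: len(text.split())  # crude token estimate
    used = cost(task_prompt)
    kept = []
    # Walk from newest to oldest so recency wins under a tight budget.
    for i, turn in enumerate(reversed(turns)):
        if i < keep_recent or used + cost(turn) <= budget:
            kept.append(turn)
            used += cost(turn)
    kept.reverse()
    return kept
```

With a 20-word budget, old search results fall out of the window while the latest observations survive, which is what lets an agent keep going for many turns.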
reachsumit.com
QAgent: A modular Search Agent with Interactive Query Understanding

Alibaba introduces a modular search agent that optimizes retrieval through interactive query understanding and multi-round reasoning.

📝 arxiv.org/abs/2510.08383
👨🏽‍💻 github.com/OpenStellarT...
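The interactive query-understanding loop described above can be sketched as follows. Everything here is a toy stand-in, not QAgent's API: the agent retrieves, scores the results, and rewrites the query with retrieval feedback until the results look good enough or rounds run out.

```python
def search_loop(query, rewrite, retrieve, score, max_rounds=3, threshold=0.8):
    """Multi-round retrieval with interactive query rewriting."""
    best_docs, best_score = [], 0.0
    for _ in range(max_rounds):
        docs = retrieve(query)
        s = score(query, docs)
        if s > best_score:
            best_docs, best_score = docs, s
        if s >= threshold:            # good enough: stop early
            break
        query = rewrite(query, docs)  # refine the query using feedback
    return best_docs, best_score
```

Plugging in real retriever, reranker, and LLM-rewriter components turns this skeleton into the modular agent the summary describes.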
reachsumit.com
Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems

Meta introduces a survey-based measure that captures users' intent to return for similar content, outperforming engagement signals in predicting next-day retention.

📝 arxiv.org/abs/2510.07621
reachsumit.com
PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations

Google adapts pre-trained LLMs for large-scale recommendation through Semantic ID tokenization, continued pre-training, and generative retrieval without embedding tables.

📝 arxiv.org/abs/2510.07784
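Semantic ID tokenization can be illustrated with a minimal residual-quantization sketch: pick the nearest codeword at each level, subtract it, and repeat, so each item embedding becomes a short sequence of discrete codes an LLM can generate. This is a hypothetical simplification; PLUM's actual tokenizer (codebook sizes, training procedure) is more involved.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to `vec` (squared Euclidean)."""
    dist = lambda c: sum((v - x) ** 2 for v, x in zip(vec, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def semantic_id(embedding, codebooks):
    """Map an embedding to one code index per quantization level."""
    residual, codes = list(embedding), []
    for book in codebooks:
        idx = nearest(residual, book)
        codes.append(idx)
        # Subtract the chosen codeword; the next level quantizes the rest.
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes
```

The resulting code sequences replace per-item embedding-table rows, which is what makes generative retrieval feasible at industrial scale.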
reachsumit.com
TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance

Taobao presents an adaptive guided RL framework that addresses reward sparsity in e-commerce search through rule-aware reward shaping and replay mechanisms.

📝 arxiv.org/abs/2510.08048
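Rule-aware reward shaping can be sketched like this: instead of only a sparse end-of-episode correctness reward, intermediate rule checks add dense shaping terms. The rules and weights below are purely illustrative, not Taobao's.

```python
def shaped_reward(prediction, label, rationale, query):
    """Sparse correctness reward plus dense rule-based bonuses."""
    reward = 1.0 if prediction == label else 0.0   # sparse terminal term
    rules = [
        # (name, passed?, bonus) -- hypothetical rules for illustration
        ("mentions_query", query.lower() in rationale.lower(), 0.2),
        ("gives_verdict",  prediction in {"relevant", "irrelevant"}, 0.1),
    ]
    reward += sum(w for _, ok, w in rules if ok)   # dense shaping terms
    return reward
```

The dense terms give the policy gradient signal even on episodes where the final relevance verdict is wrong, which is the point of shaping under reward sparsity.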
reachsumit.com
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them

Identifies four key reasoning behaviors for agentic search and proposes Behavior Priming, which instills these behaviors via supervised fine-tuning before RL.

📝 arxiv.org/abs/2510.06534
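The data-selection step behind an approach like Behavior Priming can be sketched as: keep only trajectories whose reasoning traces exhibit the target behaviors, then fine-tune on those before RL. The keyword cues below are a crude stand-in; the paper's four behaviors and their detectors are richer than this.

```python
# Hypothetical keyword detectors for two example behaviors.
CUES = {
    "verification": ("let me verify", "double-check"),
    "planning":     ("first,", "plan:"),
}

def exhibits(trace, behavior):
    return any(cue in trace.lower() for cue in CUES[behavior])

def select_for_sft(trajectories, required=("verification", "planning")):
    """Return trajectories showing every required behavior."""
    return [t for t in trajectories
            if all(exhibits(t["trace"], b) for b in required)]
```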
reachsumit.com
LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding

Oracle AI constructs symbolic document graphs to capture layout structure and cross-page dependencies, enabling dynamic retrieval over visually-rich documents.

📝 arxiv.org/abs/2510.07233
reachsumit.com
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs

Introduces a dynamic evaluation protocol that generates meaning-preserving paraphrases at evaluation time to assess embedding model robustness.

📝 arxiv.org/abs/2510.06730
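The core idea can be sketched as: score an embedding model not on one fixed sentence but across meaning-preserving paraphrases, and report the spread. The toy bag-of-words "embedder" and hand-supplied paraphrases below stand in for a real model and for LLM-generated paraphrases at evaluation time.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (a real model goes here)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def robustness(anchor, paraphrases):
    """Mean and range of anchor-paraphrase similarity.

    A large range signals that the model's scores are sensitive to
    surface form rather than meaning.
    """
    sims = [cosine(embed(anchor), embed(p)) for p in paraphrases]
    return sum(sims) / len(sims), max(sims) - min(sims)
```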
reachsumit.com
LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Google uses LLMs as annotators to achieve nuanced content understanding at scale for video recommendations, significantly improving user participation and satisfaction in production systems.

📝 arxiv.org/abs/2510.06657
reachsumit.com
Probing LLMs' Knowledge Boundary: Adaptive RAG, Part 3

We examine approaches ranging from simple prompt engineering to sophisticated internal state analysis, and establish that each method has distinct trade-offs between accuracy, cost, latency, complexity, and model access requirements.

blog.reachsumit.com/posts/2025/0...
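One family of techniques in this space, consistency-based uncertainty estimation, is easy to sketch: sample several answers to the same question and use their agreement as a confidence signal, routing low-agreement queries to retrieval. `sample_answer` below is a stand-in for a stochastic LLM call; the threshold is illustrative.

```python
from collections import Counter

def consistency_confidence(question, sample_answer, n=5):
    """Majority answer and its agreement rate over n samples."""
    answers = [sample_answer(question) for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n

def adaptive_route(question, sample_answer, threshold=0.6):
    """Answer directly when confident; otherwise fall back to retrieval."""
    answer, conf = consistency_confidence(question, sample_answer)
    return answer if conf >= threshold else "RETRIEVE"
```

The appeal of this method is that it needs no access to model internals, at the cost of n extra inference calls per query.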
reachsumit.com
Instead of just analyzing the query, what if we could probe the model's own confidence? Part 3 of the Adaptive RAG series explores methods that directly assess LLM confidence and knowledge gaps.
reachsumit.com
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-Style Contexts

Investigates how LLM-based guardrails handle RAG-style contexts with retrieved documents, finding that adding benign documents changes guardrail judgments in ~11% of input cases.

📝 arxiv.org/abs/2510.05310
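The robustness probe behind a finding like this can be sketched as: run a guardrail on the bare prompt and again with benign retrieved documents prepended, then count how often the verdict flips. The toy length-based `guardrail` below stands in for an LLM judge; the ~11% figure comes from the paper, not from this sketch.

```python
def flip_rate(prompts, docs, guardrail):
    """Fraction of prompts whose guardrail verdict changes with context."""
    flips = 0
    for p in prompts:
        bare = guardrail(p)
        with_ctx = guardrail("\n".join(docs) + "\n" + p)  # RAG-style input
        flips += bare != with_ctx
    return flips / len(prompts)
```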