Sumit
@reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Pinned
I published Vol. 138 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/training-f...
Training-Free Text Embeddings from Frozen LLMs, Scalable Relevance Labeling for Enterprise Search, and More!
Vol.138 for Jan 05 - Jan 11, 2026
recsys.substack.com
RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation

Introduces a Python package that wraps retrieval models as HTTP APIs with automatic query batching and caching for dynamic RAG pipelines.

📝 arxiv.org/abs/2601.10644
👨🏽‍💻 github.com/hltcoe/routir
RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation
Retrieval models are key components of Retrieval-Augmented Generation (RAG) systems, which generate search queries, process the documents returned, and generate a response. RAG systems are often dynam...
arxiv.org
January 16, 2026 at 3:17 AM
M3Searcher: Modular Multimodal Information Seeking Agency with Retrieval-Oriented Reasoning

Huawei presents a modular multimodal information-seeking agent that decouples retrieval from answer generation, optimized with retrieval-oriented rewards.

📝 arxiv.org/abs/2601.09278
M$^3$Searcher: Modular Multimodal Information Seeking Agency with Retrieval-Oriented Reasoning
Recent advances in DeepResearch-style agents have demonstrated strong capabilities in autonomous information acquisition and synthesize from real-world web environments. However, existing approaches r...
arxiv.org
January 15, 2026 at 7:39 AM
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Introduces a structured self-evolving framework that models deep research as a Finite State Machine, enabling controllable agent adaptation.

📝 arxiv.org/abs/2601.09465
👨🏽‍💻 github.com/QuantaAlpha/...
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores s...
arxiv.org
January 15, 2026 at 7:38 AM
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

Proposes modifying LLM attention mechanisms with explicit relevance signals from retrieved documents, making RAG systems more robust to noise.

📝 arxiv.org/abs/2601.09028
👨🏽‍💻 github.com/fengranMark/...
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG
The development of large language models (LLMs) has achieved superior performance in a range of downstream tasks, including LLM-based retrieval-augmented generation (RAG). The quality of generated con...
arxiv.org
January 15, 2026 at 7:36 AM
LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval

Introduces a learning-free method that transforms LLM embeddings into binary codes using Isolation Kernel, achieving up to 16.7x faster retrieval and 16x lower memory.

📝 arxiv.org/abs/2601.09159
LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval
Large language models (LLMs) have recently enabled remarkable progress in text representation. However, their embeddings are typically high-dimensional, leading to substantial storage and retrieval ov...
arxiv.org
January 15, 2026 at 7:34 AM
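
To make the speed and memory argument for binary codes concrete, here is a minimal sketch of retrieval over binarized embeddings. It uses plain sign thresholding as a deliberately simple stand-in and is not the Isolation Kernel transform from the paper; the function names and toy vectors are hypothetical. Each vector collapses to one bit per dimension, and ranking needs only XOR and a bit count.

```python
def binarize(vec):
    """Pack the sign bits of a dense vector into one Python int (1 bit per dimension)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming(a, b):
    """Count the bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")


def search(query_vec, doc_codes, k=2):
    """Return indices of the k documents closest to the query in Hamming distance."""
    q = binarize(query_vec)
    ranked = sorted(range(len(doc_codes)), key=lambda i: hamming(q, doc_codes[i]))
    return ranked[:k]


# Toy 4-dimensional "embeddings" (made up for illustration only).
docs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.1, -0.3, -0.9],
]
doc_codes = [binarize(d) for d in docs]
top = search([0.8, -0.3, 0.5, -0.6], doc_codes)
print(top)
```

Sign thresholding discards magnitude information, so a learned or kernel-based binarization would normally be expected to preserve ranking quality better than this sketch does.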
Why not Collaborative Filtering in Dual View? Bridging Sparse and Dense Models

Presents a plug-and-play framework that aligns sparse and dense collaborative filtering views to improve recommendation accuracy, especially for long-tail items.

📝 arxiv.org/abs/2601.09286
👨🏽‍💻 github.com/harris26-G/SaD
Why not Collaborative Filtering in Dual View? Bridging Sparse and Dense Models
Collaborative Filtering (CF) remains the cornerstone of modern recommender systems, with dense embedding--based methods dominating current practice. However, these approaches suffer from a critical li...
arxiv.org
January 15, 2026 at 7:33 AM
Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation

Constructs structured "pages" with cognitive outlines and iteratively fills knowledge slots via retrieval, improving RAG performance.

📝 arxiv.org/abs/2601.09402
👨🏽‍💻 github.com/OpenBMB/PAGER
Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge. Recently, some works have incorporated iterative knowledge accumulation processes into R...
arxiv.org
January 15, 2026 at 7:32 AM
Unifying Search and Recommendation in LLMs via Gradient Multi-Subspace Tuning

Unifies search and recommendation in LLMs using multi-subspace decomposition to mitigate gradient conflicts and null-space projection to preserve general-domain knowledge.

📝 arxiv.org/abs/2601.09496
Unifying Search and Recommendation in LLMs via Gradient Multi-Subspace Tuning
Search and recommendation (S&R) are core to online platforms, addressing explicit intent through queries and modeling implicit intent from behaviors, respectively. Their complementary roles motivate a...
arxiv.org
January 15, 2026 at 7:30 AM
TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval

Presents a benchmark combining temporal reasoning with reasoning-intensive retrieval, featuring 1730 queries across 13 domains.

📝 arxiv.org/abs/2601.09523
👨🏽‍💻 tempo-bench.github.io
TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval
Existing temporal QA benchmarks focus on simple fact-seeking queries from news corpora, while reasoning-intensive retrieval benchmarks lack temporal grounding. However, real-world information needs of...
arxiv.org
January 15, 2026 at 7:29 AM
MM-BRIGHT: A Multi-Task Multimodal Benchmark for Reasoning-Intensive Retrieval

Presents a multimodal benchmark for reasoning-intensive retrieval with 2803 queries across 29 technical domains & 4 tasks of increasing complexity.

📝 arxiv.org/abs/2601.09562
👨🏽‍💻 mm-bright.github.io
MM-BRIGHT: A Multi-Task Multimodal Benchmark for Reasoning-Intensive Retrieval
Existing retrieval benchmarks primarily consist of text-based queries where keyword or semantic matching is usually sufficient. Many real-world queries contain multimodal elements, particularly, image...
arxiv.org
January 15, 2026 at 7:28 AM
Markovian Pre-Trained Transformer for Next-Item Recommendation

Introduces a Transformer pre-trained entirely on synthetic Markov chains that achieves SOTA recommender performance by fine-tuning only a lightweight input adaptor.

📝 arxiv.org/abs/2601.08275
👨🏽‍💻 github.com/BDML-lab/MPT
GitHub - BDML-lab/MPT: Markovian Pre-Trained Transformer for Next-Item Recommendation
Markovian Pre-Trained Transformer for Next-Item Recommendation - BDML-lab/MPT
github.com
January 14, 2026 at 3:56 AM
SVFusion: A CPU-GPU Co-Processing Architecture for Large-Scale Real-Time Vector Search

Presents a GPU-CPU-disk collaborative framework for streaming vector search with hierarchical indexing and workload-aware caching, achieving 20.9× higher throughput.

📝 arxiv.org/abs/2601.08528
SVFusion: A CPU-GPU Co-Processing Architecture for Large-Scale Real-Time Vector Search
Approximate Nearest Neighbor Search (ANNS) underpins modern applications such as information retrieval and recommendation. With the rapid growth of vector data, efficient indexing for real-time vector...
arxiv.org
January 14, 2026 at 3:55 AM
Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models

Introduces a retrieval-free framework that injects private domain knowledge into frozen LLMs via a single-token interface.

📝 arxiv.org/abs/2601.08209
Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models
In domains such as biomedicine, materials, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving,...
arxiv.org
January 14, 2026 at 3:54 AM
EmbeddingRWKV: State-Centric Retrieval with Reusable States

@houhaowen et al. propose a unified retrieval paradigm using RWKV's matrix-valued states to bridge embedding and reranking stages, achieving 5.4×–44.8× speedup.

📝 arxiv.org/abs/2601.07861
👨🏽‍💻 github.com/howard-hou/E...
EmbeddingRWKV: State-Centric Retrieval with Reusable States
Current Retrieval-Augmented Generation (RAG) systems typically employ a traditional two-stage pipeline: an embedding model for initial retrieval followed by a reranker for refinement. However, this pa...
arxiv.org
January 14, 2026 at 3:53 AM
Query Suggestion for Retrieval-Augmented Generation via Dynamic In-Context Learning

Presents a self-learning query suggestion method for agentic RAG that uses dynamic few-shot retrieval to suggest answerable alternatives when user queries fail.

📝 arxiv.org/abs/2601.08105
Query Suggestion for Retrieval-Augmented Generation via Dynamic In-Context Learning
Retrieval-augmented generation with tool-calling agents (agentic RAG) has become increasingly powerful in understanding, processing, and responding to user queries. However, the scope of the grounding...
arxiv.org
January 14, 2026 at 3:51 AM
D²PLAN: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning

Introduces a dual-agent paradigm where a Reasoner constructs and adapts global plans while a Purifier filters retrieval noise, improving multi-hop QA performance.

📝 arxiv.org/abs/2601.08282
D$^2$Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning
Recent search-augmented LLMs trained with reinforcement learning (RL) can interleave searching and reasoning for multi-hop reasoning tasks. However, they face two critical failure modes as the accumul...
arxiv.org
January 14, 2026 at 3:50 AM
POSIR: Position-Aware Heterogeneous Information Retrieval Benchmark

Introduces a benchmark with 310 datasets across 10 languages to diagnose position bias in retrieval models, revealing that most models exhibit primacy bias.

📝 arxiv.org/abs/2601.08363
👨🏽‍💻 github.com/Ziyang1060/P...
PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
While dense retrieval models have achieved remarkable success, rigorous evaluation of their sensitivity to the position of relevant information (i.e., position bias) remains largely unexplored. Existi...
arxiv.org
January 14, 2026 at 3:49 AM
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Presents a training-free RAG framework that treats retrieved documents as parallel experts, aggregating evidence at decode time via contrastive decoding rather than long-context attention.

📝 arxiv.org/abs/2601.08670
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Retrieval Augmented Generation faces a trade-off: concatenating documents in a long prompt enables multi-document reasoning but creates prefill bottlenecks, while encoding document KV caches separatel...
arxiv.org
January 14, 2026 at 3:48 AM
MemRec: Collaborative Memory-Augmented Agentic Recommender System

Decouples reasoning from memory management in LLM-based recommender systems, enabling collaborative signals from user-item graphs rather than isolated memory.

📝 arxiv.org/abs/2601.08816
👨🏽‍💻 github.com/rutgerswisel...
MemRec: Collaborative Memory-Augmented Agentic Recommender System
The evolution of recommender systems has shifted preference storage from rating matrices and dense embeddings to semantic memory in the agentic era. Yet existing agents rely on isolated memory, overlo...
arxiv.org
January 14, 2026 at 3:47 AM
Dr. Zero: Self-Evolving Search Agents without Training Data

Meta introduces a data-free self-evolution framework where a proposer generates diverse questions to train a solver, matching or surpassing supervised search agents on QA benchmarks.

📝 arxiv.org/abs/2601.07055
👨🏽‍💻 github.com/facebookrese...
GitHub - facebookresearch/drzero: Dr. Zero Self-Evolving Search Agents without Training Data
Dr. Zero Self-Evolving Search Agents without Training Data - facebookresearch/drzero
github.com
January 13, 2026 at 5:31 AM
TREEPS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG

Introduces a tree-structured RL framework for agentic RAG that enables step-wise credit assignment via Monte Carlo estimation over descendant outcomes.

📝 arxiv.org/abs/2601.06922
TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently been advanced by reinforcement lear...
arxiv.org
January 13, 2026 at 5:30 AM
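
One way to picture step-wise credit assignment via Monte Carlo estimation over descendant outcomes is to score each intermediate step by the average final reward of the leaves beneath it. The sketch below assumes exactly that reading; the post does not give the paper's formulation, so the `Node` class, the averaging rule, and the toy rewards are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One step in a rollout tree; only leaves carry a final outcome reward."""
    name: str
    reward: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node):
    """Collect the outcome rewards of every leaf below (or at) this node."""
    if not node.children:
        return [node.reward]
    rewards = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def value(node):
    """Monte Carlo estimate of a step's value: mean outcome over its descendant leaves."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


# Toy tree: step A leads to one correct and one wrong answer, step B only to wrong ones.
step_a = Node("A", children=[Node("A1", reward=1.0), Node("A2", reward=0.0)])
step_b = Node("B", children=[Node("B1", reward=0.0), Node("B2", reward=0.0)])
root = Node("root", children=[step_a, step_b])

print(value(root), value(step_a), value(step_b))
```

In this toy tree, step A scores higher than step B because half of its continuations end in a correct answer, which is the kind of per-step signal that an outcome-only reward would not provide.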
Is Agentic RAG worth it? An experimental comparison of RAG approaches

Compares Enhanced RAG (fixed pipelines with modules like rerankers) vs Agentic RAG (LLM-orchestrated) across multiple dimensions, finding neither universally superior but Agentic costs up to 3.6x more.

📝 arxiv.org/abs/2601.07711
Is Agentic RAG worth it? An experimental comparison of RAG approaches
Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries....
arxiv.org
January 13, 2026 at 5:28 AM
Towards Building efficient Routed systems for Retrieval

Introduces a routing-based approach that dynamically selects the most informative query representation in late-interaction models, achieving up to 30x speedup while maintaining performance.

📝 arxiv.org/abs/2601.06389
Towards Building efficient Routed systems for Retrieval
Late-interaction retrieval models like ColBERT achieve superior accuracy by enabling token-level interactions, but their computational cost hinders scalability and integration with Approximate Nearest...
arxiv.org
January 13, 2026 at 5:27 AM
Unleashing the Native Recommendation Potential: LLM-Based Generative Recommendation via Structured Term Identifiers

Kuaishou uses structured textual keywords as item identifiers to enable generative recommendation.

📝 arxiv.org/abs/2601.06798
👨🏽‍💻 github.com/ZY0025/GRLM
Unleashing the Native Recommendation Potential: LLM-Based Generative Recommendation via Structured Term Identifiers
Leveraging the vast open-world knowledge and understanding capabilities of Large Language Models (LLMs) to develop general-purpose, semantically-aware recommender systems has emerged as a pivotal rese...
arxiv.org
January 13, 2026 at 5:26 AM
CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering

Uses iterative construction-integration to retrieve core knowledge triples and adaptively expands context granularity for multi-hop reasoning.

📝 arxiv.org/abs/2601.06799
CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering
Triple-based Iterative Retrieval-Augmented Generation (iRAG) mitigates document-level noise for multi-hop question answering. However, existing methods still face limitations: (i) greedy single-path e...
arxiv.org
January 13, 2026 at 5:24 AM