Sumit
@reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Pinned
I published Vol. 131 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/a-gpu-nati...
A GPU-Native Framework for Billion-Scale Recommendation, Efficient Document Reranking Through Group-Level Scoring, and More!
Vol.131 for Nov 17 - Nov 23, 2025
recsys.substack.com
Generative Early Stage Ranking

Meta presents an early-stage ranking framework with Mixture of Attention modules that capture explicit cross-signals via Hard Matching Attention and implicit signals through target-aware self-attention and cross-attention mechanisms.

📝 arxiv.org/abs/2511.21095
Generative Early Stage Ranking
Large-scale recommendations commonly adopt a multi-stage cascading ranking system paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems utilize the "user-item decoupling"...
arxiv.org
November 27, 2025 at 4:18 AM
Semantics Meet Signals: Dual Codebook Representation Learning for Generative Recommendation

Adaptively allocates tokens between collaborative filtering and semantic codebooks using mixture-of-experts to balance memorization and generalization.

📝 arxiv.org/abs/2511.20673
Semantics Meet Signals: Dual Codebook Representation Learning for Generative Recommendation
Generative recommendation has recently emerged as a powerful paradigm that unifies retrieval and generation, representing items as discrete semantic tokens and enabling flexible sequence modeling with...
arxiv.org
November 27, 2025 at 4:17 AM
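A toy sketch of the adaptive-allocation idea above: quantize each item against both a collaborative and a semantic codebook, then mix the two codes with a per-item gate. All sizes, the nearest-neighbor quantization, and the linear gate are illustrative assumptions, not the paper's actual mixture-of-experts architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two codebooks (collaborative vs. semantic), shared dim.
num_codes, dim = 64, 16
cf_codebook = rng.normal(size=(num_codes, dim))   # collaborative-filtering codes
sem_codebook = rng.normal(size=(num_codes, dim))  # semantic codes
gate_w = rng.normal(size=(dim, 2))                # tiny stand-in for the expert router

def encode_item(item_vec: np.ndarray) -> np.ndarray:
    """Quantize an item against both codebooks, then mix by a learned gate."""
    cf_code = cf_codebook[np.argmin(((cf_codebook - item_vec) ** 2).sum(1))]
    sem_code = sem_codebook[np.argmin(((sem_codebook - item_vec) ** 2).sum(1))]
    logits = item_vec @ gate_w                     # per-item routing scores
    g = np.exp(logits - logits.max())
    g /= g.sum()                                   # softmax gate over the two codebooks
    return g[0] * cf_code + g[1] * sem_code

item = rng.normal(size=dim)
z = encode_item(item)
print(z.shape)  # (16,)
```

The gate is what lets memorization-heavy items lean on the collaborative codebook while long-tail items lean on the semantic one.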
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce

Presents a benchmark for e-commerce generative engine optimization with 7000+ realistic product queries, showing that optimization-based rewriting strategies substantially outperform heuristic methods.

📝 arxiv.org/abs/2511.20867
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopp...
arxiv.org
November 27, 2025 at 4:15 AM
Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval

Presents an OCR-free multimodal retrieval system using pyramid indexing that achieves strong performance with only 17-27 vectors per page compared to 1024 for patch-based approaches.

📝 arxiv.org/abs/2511.21121
Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval
Document centric RAG pipelines usually begin with OCR, followed by brittle heuristics for chunking, table parsing, and layout reconstruction. These text first workflows are costly to maintain, sensiti...
arxiv.org
November 27, 2025 at 4:14 AM
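The 17-27-vectors-per-page figure above suggests hierarchical pooling rather than storing every patch. A minimal sketch of that idea, assuming mean-pooling over a 3-level spatial pyramid (the grid size, pooling operator, and level layout are illustrative, not the paper's 3-pass method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical page: a 32x32 grid of patch embeddings (1024 vectors), dim 8.
grid = rng.normal(size=(32, 32, 8))

def pool_level(g: np.ndarray, cells: int) -> np.ndarray:
    """Mean-pool the patch grid into cells x cells region vectors."""
    h = g.shape[0] // cells
    return np.stack([
        g[i * h:(i + 1) * h, j * h:(j + 1) * h].mean(axis=(0, 1))
        for i in range(cells) for j in range(cells)
    ])

# 3-level pyramid: 1 page vector + 4 quadrants + 16 regions = 21 vectors,
# versus 1024 raw patch vectors for the same page.
index_vectors = np.concatenate([pool_level(grid, c) for c in (1, 2, 4)])
print(index_vectors.shape)  # (21, 8)
```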
RIA: A Ranking-Infused Approach for Optimized Listwise CTR Prediction

Meituan introduces a framework that integrates pointwise and listwise evaluation for click-through rate prediction, combining fine-grained modeling with hierarchical item dependencies.

📝 arxiv.org/abs/2511.21394
RIA: A Ranking-Infused Approach for Optimized Listwise CTR Prediction
Reranking improves recommendation quality by modeling item interactions. However, existing methods often decouple ranking and reranking, leading to weak listwise evaluation models that suffer from com...
arxiv.org
November 27, 2025 at 4:13 AM
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design

Proposes a speculation-based framework that reduces latency in LLM search agents through adaptive two-phase speculation and two-level scheduling mechanisms.

📝 arxiv.org/abs/2511.20048
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
LLM-based search agents achieve strong performance but suffer from severe latency, as each step requires serialized LLM reasoning followed by action of tool execution. We revisit this bottleneck throu...
arxiv.org
November 26, 2025 at 4:07 AM
E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems

Introduces a framework for end-to-end joint training of graph neural networks with recommender systems, rather than learning graph embeddings in a separate, decoupled stage.

📝 arxiv.org/abs/2511.20564
E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems
Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, such as for capturing complex user-item and item-item r...
arxiv.org
November 26, 2025 at 4:05 AM
NVIDIA Nemotron Parse 1.1

NVIDIA introduces a lightweight document parsing and OCR model with improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts and diagrams.

📝 arxiv.org/abs/2511.20478
👨🏽‍💻 huggingface.co/nvidia/NVIDI...
NVIDIA Nemotron Parse 1.1
We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilit...
arxiv.org
November 26, 2025 at 4:04 AM
SCoTER: Structured Chain-of-Thought Transfer for Enhanced Recommendation

Tencent proposes a framework that transfers LLM reasoning capabilities to recommender systems through automated pattern discovery and structure-preserving integration.

📝 arxiv.org/abs/2511.19514
SCoTER: Structured Chain-of-Thought Transfer for Enhanced Recommendation
Harnessing the reasoning power of Large Language Models (LLMs) for recommender systems is hindered by two fundamental challenges. First, current approaches lack a mechanism for automated, data-driven ...
arxiv.org
November 26, 2025 at 4:01 AM
R2R: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers

Introduces a domain-adaptive reranking framework combining dynamic expert routing with Entity Abstraction for Generalization to enhance decoder-only rerankers.

📝 arxiv.org/abs/2511.19987
R2R: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers
Decoder-only rerankers are central to Retrieval-Augmented Generation (RAG). However, generalist models miss domain-specific nuances in high-stakes fields like finance and law, and naive fine-tuning ca...
arxiv.org
November 26, 2025 at 4:00 AM
Enhancing Sequential Recommendation with World Knowledge from Large Language Models

Alibaba integrates LLM-derived world knowledge with sequential recommendation models through generation augmented retrieval.

📝 arxiv.org/abs/2511.20177
👨🏽‍💻 anonymous.4open.science/r/GRASP-SRS/
Enhancing Sequential Recommendation with World Knowledge from Large Language Models
Sequential Recommendation System (SRS) has become pivotal in modern society, which predicts subsequent actions based on the user's historical behavior. However, traditional collaborative filtering-bas...
arxiv.org
November 26, 2025 at 3:58 AM
DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations

Meta introduces a self-supervised framework that dynamically segments users and selectively removes or boosts features, improving inference throughput by 4.2%.

📝 arxiv.org/abs/2511.18331
DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations
For online ad-recommendation systems, processing complete user-ad-engagement histories is both computationally intensive and noise-prone. We introduce Dynamix, a scalable, personalized sequence explor...
arxiv.org
November 25, 2025 at 7:12 AM
NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations

Alibaba introduces a speculative decoding architecture for generative recommendation that integrates self-drafting with model-free verification, achieving 2.6x speedup.

📝 arxiv.org/abs/2511.18793
NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations
Generative Recommendation (GR), powered by Large Language Models (LLMs), represents a promising new paradigm for industrial recommender systems. However, their practical application is severely hinder...
arxiv.org
November 25, 2025 at 7:12 AM
Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction

Introduces conformal prediction for RAG systems to filter irrelevant context while preserving relevant evidence with statistical guarantees, reducing context size by 2-3x.

📝 arxiv.org/abs/2511.17908
Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction
Retrieval-Augmented Generation (RAG) enhances factual grounding in large language models (LLMs) by incorporating retrieved evidence, but LLM accuracy declines when long or noisy contexts exceed the mo...
arxiv.org
November 25, 2025 at 7:10 AM
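The statistical guarantee above can be sketched with split conformal prediction: calibrate a score threshold on known-relevant passages so that, at query time, filtering keeps relevant evidence with probability at least 1 - alpha. The Gaussian scores and calibration setup here are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal cutoff: keeping passages scoring >= tau retains
    relevant evidence with probability >= 1 - alpha."""
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))        # conformal rank in the lower tail
    return float(np.sort(cal_scores)[max(k - 1, 0)])

# Hypothetical calibration set: retriever scores of known-relevant passages.
cal = rng.normal(loc=1.0, size=200)
tau = conformal_threshold(cal, alpha=0.1)

# At query time, drop retrieved context scoring below tau.
retrieved = rng.normal(loc=0.0, size=50)
kept = retrieved[retrieved >= tau]
print(f"threshold={tau:.3f}, kept {len(kept)}/50 passages")
```

The 2-3x context reduction reported above would come from exactly this kind of filtering: most low-scoring passages fall below tau and are discarded before generation.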
Token-Controlled Re-ranking for Sequential Recommendation via LLMs

Introduces a token-augmented re-ranking framework that empowers users to steer recommendations with precise, attribute-based control while maintaining competitive ranking performance.

📝 arxiv.org/abs/2511.17913
Token-Controlled Re-ranking for Sequential Recommendation via LLMs
The widespread adoption of Large Language Models (LLMs) as re-rankers is shifting recommender systems towards a user-centric paradigm. However, a significant gap remains: current re-rankers often lack...
arxiv.org
November 25, 2025 at 7:08 AM
Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender Systems

Pinterest introduces a lightweight framework for modeling user revisitation behavior that led to a 0.1% lift in active users.

📝 arxiv.org/abs/2511.18013
Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender Systems
User retention is a critical objective for online platforms like Pinterest, as it strengthens user loyalty and drives growth through repeated engagement. A key indicator of retention is revisitation, ...
arxiv.org
November 25, 2025 at 7:08 AM
LLM Reasoning for Cold-Start Item Recommendation

Netflix proposes reasoning strategies that leverage LLMs to address cold-start recommendation challenges, outperforming their production ranking model by up to 8% in certain cases.

📝 arxiv.org/abs/2511.18261
LLM Reasoning for Cold-Start Item Recommendation
Large Language Models (LLMs) have shown significant potential for improving recommendation systems through their inherent reasoning capabilities and extensive knowledge base. Yet, existing studies pre...
arxiv.org
November 25, 2025 at 7:07 AM
Multi-Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recommendations

Introduces a framework that instantiates similar users and relevant items as LLM agents with unique profiles for collaborative filtering in agentic systems.

📝 arxiv.org/abs/2511.18413
Multi-Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recommendations
Agentic recommendations cast recommenders as large language model (LLM) agents that can plan, reason, use tools, and interact with users of varying preferences in web applications. However, most exist...
arxiv.org
November 25, 2025 at 7:06 AM
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Apple presents a unified framework that performs embedding-based compression and joint optimization in a shared continuous space for retrieval-augmented generation.

📝 arxiv.org/abs/2511.18659
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we...
arxiv.org
November 25, 2025 at 7:05 AM
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models

Alibaba introduces a unified token-based ranking framework that tackles representation and computational bottlenecks in recommendation systems.

📝 arxiv.org/abs/2511.18805
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Ranking models have become an important part of modern personalized recommendation systems. However, significant challenges persist in handling high-cardinality, heterogeneous, and sparse feature spac...
arxiv.org
November 25, 2025 at 7:04 AM
What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models

Evaluates cross-lingual retrieval interventions, finding that multilingual dense retrieval models outperform lexical methods and contrastive learning improves encoders' alignment.

📝 arxiv.org/abs/2511.19324
What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models
Cross-lingual information retrieval (CLIR) enables access to multilingual knowledge but remains challenging due to disparities in resources, scripts, and weak cross-lingual semantic alignment in embed...
arxiv.org
November 25, 2025 at 7:03 AM
Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval

Evaluates multilingual LLMs for cross-lingual query expansion, finding that query length determines effective prompting techniques and fine-tuning benefits depend on data similarity.

📝 arxiv.org/abs/2511.19325
Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval
Query expansion is the reformulation of a user query by adding semantically related information, and is an essential component of monolingual and cross-lingual information retrieval used to ensure tha...
arxiv.org
November 25, 2025 at 7:01 AM
Revisiting Feedback Models for HyDE

Improves HyDE's effectiveness by applying traditional feedback models like Rocchio to weight expansion terms from LLM-generated hypothetical documents for BM25 retrieval.

📝 arxiv.org/abs/2511.19349
👨🏽‍💻 github.com/nourj98/hyde...
Revisiting Feedback Models for HyDE
Recent approaches that leverage large language models (LLMs) for pseudo-relevance feedback (PRF) have generally not utilized well-established feedback models like Rocchio and RM3 when expanding querie...
arxiv.org
November 25, 2025 at 7:00 AM
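A minimal sketch of the Rocchio-over-HyDE idea above: treat the LLM-generated hypothetical documents as pseudo-relevant feedback, weight candidate expansion terms by their centroid frequency, and blend them with the original query terms. The whitespace tokenization, weights, and example documents are illustrative assumptions, not the paper's setup.

```python
from collections import Counter

def rocchio_expand(query_terms, hypothetical_docs, beta=0.5, top_k=10):
    """Weight expansion terms by mean relative frequency across the
    LLM-generated hypothetical documents (Rocchio's feedback term)."""
    centroid = Counter()
    for doc in hypothetical_docs:
        tf = Counter(doc.lower().split())
        total = sum(tf.values())
        for term, count in tf.items():
            centroid[term] += count / total / len(hypothetical_docs)
    weights = {t: 1.0 for t in query_terms}          # original query, weight 1
    for term, w in centroid.most_common(top_k):
        weights[term] = weights.get(term, 0.0) + beta * w   # feedback terms
    return weights

hyde_docs = ["neural rankers rerank candidate documents using transformers",
             "transformers rerank documents by semantic relevance"]
print(rocchio_expand(["rerank", "documents"], hyde_docs))
```

The resulting weighted term set can then be fed to a weighted-query BM25 run, which is the retrieval setting the post describes.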
Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search

Dynamically adapts retrieval resolution per query, reducing memory usage by 1.8x and accelerating search by 5.7x.

📝 arxiv.org/abs/2511.16681
👨🏽‍💻 github.com/FastLM/SPI_V...
Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Retrieval-Augmented Generation (RAG) systems have become a dominant approach to augment large language models (LLMs) with external knowledge. However, existing vector database (VecDB) retrieval pipeli...
arxiv.org
November 24, 2025 at 4:24 AM
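A toy sketch of multi-resolution search as described above: score candidates cheaply on truncated dimensions, then rescore a shortlist at full resolution. Whether the paper uses dimension truncation or another coarsening is an assumption here; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical database: 10k unit-norm vectors of dim 128.
db = rng.normal(size=(10_000, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

def multires_search(q, d_coarse=16, shortlist=100, k=5):
    """Coarse pass on truncated dimensions, exact rescoring on a shortlist."""
    coarse = db[:, :d_coarse] @ q[:d_coarse]            # cheap low-res scores
    cand = np.argpartition(-coarse, shortlist)[:shortlist]
    fine = db[cand] @ q                                 # full-resolution rescore
    return cand[np.argsort(-fine)[:k]]

q = db[42] + 0.01 * rng.normal(size=128)                # near-duplicate of item 42
q /= np.linalg.norm(q)
print(multires_search(q))
```

The speed/memory wins reported above come from the coarse pass touching only a fraction of each vector; per-query adaptivity would vary `d_coarse` and `shortlist` by query difficulty.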
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers

Dynamically prunes less informative semantic tokens in generative recommender systems, reducing training time by 26.7%.

📝 arxiv.org/abs/2511.16943
👨🏽‍💻 github.com/Yuzt-zju/RASTP
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers
Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens that encode semantic information. However, representing item ID with...
arxiv.org
November 24, 2025 at 4:22 AM
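A minimal sketch of semantic-token pruning as described above: score each token's salience and drop the least informative ones while preserving sequence order. Using embedding norm as the salience signal is a stand-in assumption for the paper's representation-aware criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item history: 20 semantic-ID tokens with dim-8 embeddings.
tokens = rng.normal(size=(20, 8))

def prune_tokens(tok: np.ndarray, keep_ratio: float = 0.75) -> np.ndarray:
    """Drop the lowest-salience tokens; salience here is embedding norm."""
    k = max(1, int(round(keep_ratio * len(tok))))
    salience = np.linalg.norm(tok, axis=1)
    keep = np.sort(np.argsort(-salience)[:k])   # keep original sequence order
    return tok[keep]

pruned = prune_tokens(tokens)
print(pruned.shape)  # (15, 8)
```

Shorter token sequences shrink the attention cost per training step, which is where a reduction like the 26.7% training-time saving above would come from.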