Sumit
@reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Pinned
I published Vol. 131 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/a-gpu-nati...
A GPU-Native Framework for Billion-Scale Recommendation, Efficient Document Reranking Through Group-Level Scoring, and More!
Vol.131 for Nov 17 - Nov 23, 2025
recsys.substack.com
Generative Early Stage Ranking

Meta presents an early-stage ranking framework with Mixture of Attention modules that capture explicit cross-signals via Hard Matching Attention and implicit signals through target-aware self-attention and cross-attention mechanisms.

📝 arxiv.org/abs/2511.21095
Generative Early Stage Ranking
Large-scale recommendations commonly adopt a multi-stage cascading ranking system paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems utilize the "user-item decoupling"...
arxiv.org
November 27, 2025 at 4:18 AM
Semantics Meet Signals: Dual Codebook Representation Learning for Generative Recommendation

Adaptively allocates tokens between collaborative filtering and semantic codebooks using mixture-of-experts to balance memorization and generalization.

📝 arxiv.org/abs/2511.20673
Semantics Meet Signals: Dual Codebook Representation Learning for Generative Recommendation
Generative recommendation has recently emerged as a powerful paradigm that unifies retrieval and generation, representing items as discrete semantic tokens and enabling flexible sequence modeling with...
arxiv.org
November 27, 2025 at 4:17 AM
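A toy sketch of the adaptive-allocation idea above: quantize each item against both a collaborative and a semantic codebook, then mix the two codes with a per-item gate. All sizes, the nearest-neighbor quantization, and the linear gate are illustrative assumptions, not the paper's actual mixture-of-experts architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two codebooks (collaborative vs. semantic), shared dim.
num_codes, dim = 64, 16
cf_codebook = rng.normal(size=(num_codes, dim))   # collaborative-filtering codes
sem_codebook = rng.normal(size=(num_codes, dim))  # semantic codes
gate_w = rng.normal(size=(dim, 2))                # tiny stand-in for the expert router

def encode_item(item_vec: np.ndarray) -> np.ndarray:
    """Quantize an item against both codebooks, then mix by a learned gate."""
    cf_code = cf_codebook[np.argmin(((cf_codebook - item_vec) ** 2).sum(1))]
    sem_code = sem_codebook[np.argmin(((sem_codebook - item_vec) ** 2).sum(1))]
    logits = item_vec @ gate_w                     # per-item routing scores
    g = np.exp(logits - logits.max())
    g /= g.sum()                                   # softmax gate over the two codebooks
    return g[0] * cf_code + g[1] * sem_code

item = rng.normal(size=dim)
z = encode_item(item)
print(z.shape)  # (16,)
```

The gate is what lets memorization-heavy items lean on the collaborative codebook while long-tail items lean on the semantic one.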
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce

Presents a benchmark for e-commerce generative engine optimization with 7000+ realistic product queries, showing that optimization-based rewriting strategies substantially outperform heuristic methods.

📝 arxiv.org/abs/2511.20867
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopp...
arxiv.org
November 27, 2025 at 4:15 AM
Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval

Presents an OCR-free multimodal retrieval system using pyramid indexing that achieves strong performance with only 17-27 vectors per page compared to 1024 for patch-based approaches.

📝 arxiv.org/abs/2511.21121
Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval
Document centric RAG pipelines usually begin with OCR, followed by brittle heuristics for chunking, table parsing, and layout reconstruction. These text first workflows are costly to maintain, sensiti...
arxiv.org
November 27, 2025 at 4:14 AM
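The 17-27-vectors-per-page figure above suggests hierarchical pooling rather than storing every patch. A minimal sketch of that idea, assuming mean-pooling over a 3-level spatial pyramid (the grid size, pooling operator, and level layout are illustrative, not the paper's 3-pass method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical page: a 32x32 grid of patch embeddings (1024 vectors), dim 8.
grid = rng.normal(size=(32, 32, 8))

def pool_level(g: np.ndarray, cells: int) -> np.ndarray:
    """Mean-pool the patch grid into cells x cells region vectors."""
    h = g.shape[0] // cells
    return np.stack([
        g[i * h:(i + 1) * h, j * h:(j + 1) * h].mean(axis=(0, 1))
        for i in range(cells) for j in range(cells)
    ])

# 3-level pyramid: 1 page vector + 4 quadrants + 16 regions = 21 vectors,
# versus 1024 raw patch vectors for the same page.
index_vectors = np.concatenate([pool_level(grid, c) for c in (1, 2, 4)])
print(index_vectors.shape)  # (21, 8)
```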
RIA: A Ranking-Infused Approach for Optimized Listwise CTR Prediction

Meituan introduces a framework that integrates pointwise and listwise evaluation for click-through rate prediction, combining fine-grained modeling with hierarchical item dependencies.

📝 arxiv.org/abs/2511.21394
RIA: A Ranking-Infused Approach for Optimized Listwise CTR Prediction
Reranking improves recommendation quality by modeling item interactions. However, existing methods often decouple ranking and reranking, leading to weak listwise evaluation models that suffer from com...
arxiv.org
November 27, 2025 at 4:13 AM
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design

Proposes a speculation-based framework that reduces latency in LLM search agents through adaptive two-phase speculation and two-level scheduling mechanisms.

📝 arxiv.org/abs/2511.20048
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
LLM-based search agents achieve strong performance but suffer from severe latency, as each step requires serialized LLM reasoning followed by action of tool execution. We revisit this bottleneck throu...
arxiv.org
November 26, 2025 at 4:07 AM
E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems

Introduces a framework for end-to-end joint training of graph neural networks with recommender systems, rather than learning graph embeddings in a separate, decoupled stage.

📝 arxiv.org/abs/2511.20564
E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems
Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, such as for capturing complex user-item and item-item r...
arxiv.org
November 26, 2025 at 4:05 AM
NVIDIA Nemotron Parse 1.1

NVIDIA introduces a lightweight document parsing and OCR model with improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts and diagrams.

📝 arxiv.org/abs/2511.20478
👨🏽‍💻 huggingface.co/nvidia/NVIDI...
NVIDIA Nemotron Parse 1.1
We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilit...
arxiv.org
November 26, 2025 at 4:04 AM
SCoTER: Structured Chain-of-Thought Transfer for Enhanced Recommendation

Tencent proposes a framework that transfers LLM reasoning capabilities to recommender systems through automated pattern discovery and structure-preserving integration.

📝 arxiv.org/abs/2511.19514
SCoTER: Structured Chain-of-Thought Transfer for Enhanced Recommendation
Harnessing the reasoning power of Large Language Models (LLMs) for recommender systems is hindered by two fundamental challenges. First, current approaches lack a mechanism for automated, data-driven ...
arxiv.org
November 26, 2025 at 4:01 AM
R2R: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers

Introduces a domain-adaptive reranking framework combining dynamic expert routing with Entity Abstraction for Generalization to enhance decoder-only rerankers.

📝 arxiv.org/abs/2511.19987
R2R: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers
Decoder-only rerankers are central to Retrieval-Augmented Generation (RAG). However, generalist models miss domain-specific nuances in high-stakes fields like finance and law, and naive fine-tuning ca...
arxiv.org
November 26, 2025 at 4:00 AM
Enhancing Sequential Recommendation with World Knowledge from Large Language Models

Alibaba integrates LLM-derived world knowledge with sequential recommendation models through generation augmented retrieval.

📝 arxiv.org/abs/2511.20177
👨🏽‍💻 anonymous.4open.science/r/GRASP-SRS/
Enhancing Sequential Recommendation with World Knowledge from Large Language Models
Sequential Recommendation System (SRS) has become pivotal in modern society, which predicts subsequent actions based on the user's historical behavior. However, traditional collaborative filtering-bas...
arxiv.org
November 26, 2025 at 3:58 AM
DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations

Meta introduces a self-supervised framework that dynamically segments users and selectively removes or boosts features, improving inference throughput by 4.2%.

📝 arxiv.org/abs/2511.18331
DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations
For online ad-recommendation systems, processing complete user-ad-engagement histories is both computationally intensive and noise-prone. We introduce Dynamix, a scalable, personalized sequence explor...
arxiv.org
November 25, 2025 at 7:12 AM
NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations

Alibaba introduces a speculative decoding architecture for generative recommendation that integrates self-drafting with model-free verification, achieving 2.6x speedup.

📝 arxiv.org/abs/2511.18793
NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations
Generative Recommendation (GR), powered by Large Language Models (LLMs), represents a promising new paradigm for industrial recommender systems. However, their practical application is severely hinder...
arxiv.org
November 25, 2025 at 7:12 AM
Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction

Introduces conformal prediction for RAG systems to filter irrelevant context while preserving relevant evidence with statistical guarantees, reducing context size by 2-3x.

📝 arxiv.org/abs/2511.17908
Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction
Retrieval-Augmented Generation (RAG) enhances factual grounding in large language models (LLMs) by incorporating retrieved evidence, but LLM accuracy declines when long or noisy contexts exceed the mo...
arxiv.org
November 25, 2025 at 7:10 AM
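The statistical guarantee above can be sketched with split conformal prediction: calibrate a score threshold on known-relevant passages so that, at query time, filtering keeps relevant evidence with probability at least 1 - alpha. The Gaussian scores and calibration setup here are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal cutoff: keeping passages scoring >= tau retains
    relevant evidence with probability >= 1 - alpha."""
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))        # conformal rank in the lower tail
    return float(np.sort(cal_scores)[max(k - 1, 0)])

# Hypothetical calibration set: retriever scores of known-relevant passages.
cal = rng.normal(loc=1.0, size=200)
tau = conformal_threshold(cal, alpha=0.1)

# At query time, drop retrieved context scoring below tau.
retrieved = rng.normal(loc=0.0, size=50)
kept = retrieved[retrieved >= tau]
print(f"threshold={tau:.3f}, kept {len(kept)}/50 passages")
```

The 2-3x context reduction reported above would come from exactly this kind of filtering: most low-scoring passages fall below tau and are discarded before generation.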
Token-Controlled Re-ranking for Sequential Recommendation via LLMs

Introduces a token-augmented re-ranking framework that empowers users to steer recommendations with precise, attribute-based control while maintaining competitive ranking performance.

📝 arxiv.org/abs/2511.17913
Token-Controlled Re-ranking for Sequential Recommendation via LLMs
The widespread adoption of Large Language Models (LLMs) as re-rankers is shifting recommender systems towards a user-centric paradigm. However, a significant gap remains: current re-rankers often lack...
arxiv.org
November 25, 2025 at 7:08 AM
Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender Systems

Pinterest introduces a lightweight framework for modeling user revisitation behavior that led to a 0.1% lift in active users.

📝 arxiv.org/abs/2511.18013
Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender Systems
User retention is a critical objective for online platforms like Pinterest, as it strengthens user loyalty and drives growth through repeated engagement. A key indicator of retention is revisitation, ...
arxiv.org
November 25, 2025 at 7:08 AM
LLM Reasoning for Cold-Start Item Recommendation

Netflix proposes reasoning strategies that leverage LLMs to address cold-start recommendation challenges, outperforming their production ranking model by up to 8% in certain cases.

📝 arxiv.org/abs/2511.18261
LLM Reasoning for Cold-Start Item Recommendation
Large Language Models (LLMs) have shown significant potential for improving recommendation systems through their inherent reasoning capabilities and extensive knowledge base. Yet, existing studies pre...
arxiv.org
November 25, 2025 at 7:07 AM
Multi-Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recommendations

Introduces a framework that instantiates similar users and relevant items as LLM agents with unique profiles for collaborative filtering in agentic systems.

📝 arxiv.org/abs/2511.18413
Multi-Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recommendations
Agentic recommendations cast recommenders as large language model (LLM) agents that can plan, reason, use tools, and interact with users of varying preferences in web applications. However, most exist...
arxiv.org
November 25, 2025 at 7:06 AM
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Apple presents a unified framework that performs embedding-based compression and joint optimization in a shared continuous space for retrieval-augmented generation.

📝 arxiv.org/abs/2511.18659
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we...
arxiv.org
November 25, 2025 at 7:05 AM
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models

Alibaba introduces a unified token-based ranking framework that tackles representation and computational bottlenecks in recommendation systems.

📝 arxiv.org/abs/2511.18805
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Ranking models have become an important part of modern personalized recommendation systems. However, significant challenges persist in handling high-cardinality, heterogeneous, and sparse feature spac...
arxiv.org
November 25, 2025 at 7:04 AM
What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models

Evaluates cross-lingual retrieval interventions, finding that multilingual dense retrieval models outperform lexical methods and contrastive learning improves encoders' alignment.

📝 arxiv.org/abs/2511.19324
What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models
Cross-lingual information retrieval (CLIR) enables access to multilingual knowledge but remains challenging due to disparities in resources, scripts, and weak cross-lingual semantic alignment in embed...
arxiv.org
November 25, 2025 at 7:03 AM
Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval

Evaluates multilingual LLMs for cross-lingual query expansion, finding that query length determines effective prompting techniques and fine-tuning benefits depend on data similarity.

📝 arxiv.org/abs/2511.19325
Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval
Query expansion is the reformulation of a user query by adding semantically related information, and is an essential component of monolingual and cross-lingual information retrieval used to ensure tha...
arxiv.org
November 25, 2025 at 7:01 AM
Revisiting Feedback Models for HyDE

Improves HyDE's effectiveness by applying traditional feedback models like Rocchio to weight expansion terms from LLM-generated hypothetical documents for BM25 retrieval.

📝 arxiv.org/abs/2511.19349
👨🏽‍💻 github.com/nourj98/hyde...
Revisiting Feedback Models for HyDE
Recent approaches that leverage large language models (LLMs) for pseudo-relevance feedback (PRF) have generally not utilized well-established feedback models like Rocchio and RM3 when expanding querie...
arxiv.org
November 25, 2025 at 7:00 AM
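A minimal sketch of the Rocchio-over-HyDE idea above: treat the LLM-generated hypothetical documents as pseudo-relevant feedback, weight candidate expansion terms by their centroid frequency, and blend them with the original query terms. The whitespace tokenization, weights, and example documents are illustrative assumptions, not the paper's setup.

```python
from collections import Counter

def rocchio_expand(query_terms, hypothetical_docs, beta=0.5, top_k=10):
    """Weight expansion terms by mean relative frequency across the
    LLM-generated hypothetical documents (Rocchio's feedback term)."""
    centroid = Counter()
    for doc in hypothetical_docs:
        tf = Counter(doc.lower().split())
        total = sum(tf.values())
        for term, count in tf.items():
            centroid[term] += count / total / len(hypothetical_docs)
    weights = {t: 1.0 for t in query_terms}          # original query, weight 1
    for term, w in centroid.most_common(top_k):
        weights[term] = weights.get(term, 0.0) + beta * w   # feedback terms
    return weights

hyde_docs = ["neural rankers rerank candidate documents using transformers",
             "transformers rerank documents by semantic relevance"]
print(rocchio_expand(["rerank", "documents"], hyde_docs))
```

The resulting weighted term set can then be fed to a weighted-query BM25 run, which is the retrieval setting the post describes.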
Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search

Dynamically adapts retrieval resolution per query, reducing memory usage by 1.8x and accelerating search by 5.7x.

📝 arxiv.org/abs/2511.16681
👨🏽‍💻 github.com/FastLM/SPI_V...
Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Retrieval-Augmented Generation (RAG) systems have become a dominant approach to augment large language models (LLMs) with external knowledge. However, existing vector database (VecDB) retrieval pipeli...
arxiv.org
November 24, 2025 at 4:24 AM
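A toy sketch of multi-resolution search as described above: score candidates cheaply on truncated dimensions, then rescore a shortlist at full resolution. Whether the paper uses dimension truncation or another coarsening is an assumption here; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical database: 10k unit-norm vectors of dim 128.
db = rng.normal(size=(10_000, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

def multires_search(q, d_coarse=16, shortlist=100, k=5):
    """Coarse pass on truncated dimensions, exact rescoring on a shortlist."""
    coarse = db[:, :d_coarse] @ q[:d_coarse]            # cheap low-res scores
    cand = np.argpartition(-coarse, shortlist)[:shortlist]
    fine = db[cand] @ q                                 # full-resolution rescore
    return cand[np.argsort(-fine)[:k]]

q = db[42] + 0.01 * rng.normal(size=128)                # near-duplicate of item 42
q /= np.linalg.norm(q)
print(multires_search(q))
```

The speed/memory wins reported above come from the coarse pass touching only a fraction of each vector; per-query adaptivity would vary `d_coarse` and `shortlist` by query difficulty.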
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers

Dynamically prunes less informative semantic tokens in generative recommender systems, reducing training time by 26.7%.

📝 arxiv.org/abs/2511.16943
👨🏽‍💻 github.com/Yuzt-zju/RASTP
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers
Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens that encode semantic information. However, representing item ID with...
arxiv.org
November 24, 2025 at 4:22 AM
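A minimal sketch of semantic-token pruning as described above: score each token's salience and drop the least informative ones while preserving sequence order. Using embedding norm as the salience signal is a stand-in assumption for the paper's representation-aware criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item history: 20 semantic-ID tokens with dim-8 embeddings.
tokens = rng.normal(size=(20, 8))

def prune_tokens(tok: np.ndarray, keep_ratio: float = 0.75) -> np.ndarray:
    """Drop the lowest-salience tokens; salience here is embedding norm."""
    k = max(1, int(round(keep_ratio * len(tok))))
    salience = np.linalg.norm(tok, axis=1)
    keep = np.sort(np.argsort(-salience)[:k])   # keep original sequence order
    return tok[keep]

pruned = prune_tokens(tokens)
print(pruned.shape)  # (15, 8)
```

Shorter token sequences shrink the attention cost per training step, which is where a reduction like the 26.7% training-time saving above would come from.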