llm-d
llm-d.ai
@llm-d.ai
llm-d is a Kubernetes-native distributed inference serving stack providing well-lit paths for anyone to serve large generative AI models at scale.

Learn more at: https://llm-d.ai
Standardizing high-performance inference requires deep ecosystem collaboration. 🚀

Huge shoutout to @vllm_project and @IBMResearch on the new KV Offloading Connector. We’re seeing up to 9x throughput gains on H100s and massive TTFT reductions. 🧵

blog.vllm.ai/2026/01/08/k...
Inside vLLM’s New KV Offloading Connector: Smarter Memory Transfer for Maximizing Inference Throughput
In this post, we will describe the new KV cache offloading feature that was introduced in vLLM 0.11.0. We will focus on offloading to CPU memory (DRAM) and its benefits for improving overall inference…
blog.vllm.ai
January 9, 2026 at 6:45 PM
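
For anyone who wants to try it, here is a minimal sketch of switching on CPU offloading through vLLM's KVTransferConfig. The connector name, the extra-config key, and the model are assumptions based on the post above, not the definitive setup; see the linked blog for the exact configuration.

# Minimal sketch: offload KV cache blocks to CPU memory (DRAM) in vLLM.
# Assumes vLLM >= 0.11.0; the connector name and extra-config key below are
# illustrative assumptions, check the blog post above for the exact settings.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model, any vLLM-supported model works
    kv_transfer_config=KVTransferConfig(
        kv_connector="OffloadingConnector",    # assumed connector name
        kv_role="kv_both",                     # both save and load offloaded KV blocks
        kv_connector_extra_config={"num_cpu_blocks": 8192},  # assumed knob: size of the DRAM pool
    ),
)

out = llm.generate(
    ["Explain KV cache offloading in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)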
AI inference is like a busy airport: without a controller, you get gridlock. ✈️

Check out this breakdown by Cedric Clyburn from Red Hat on how llm-d intelligently routes distributed LLM requests.

🔹 Solves "round robin" congestion
🔹 Disaggregates prefill/decode (P/D) to save costs (sketch after this post)

www.youtube.com/watch?v=CNKG...
LLM‑D Explained: Building Next‑Gen AI with LLMs, RAG & Kubernetes
YouTube video by IBM Technology
www.youtube.com
January 8, 2026 at 7:21 PM
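
To make the P/D point concrete, here is a tiny conceptual sketch (not llm-d code, all names are made up for illustration) of why disaggregation beats plain round-robin: prefill is compute-bound and handles the whole prompt at once, decode is memory-bound and emits one token at a time, so each phase gets its own worker pool and the KV cache is handed off between them.

# Conceptual sketch only, NOT llm-d's implementation: prefill/decode disaggregation.
# Requests are split into a prompt-processing stage on a prefill pool and a
# token-generation stage on a decode pool, instead of round-robining whole
# requests across identical replicas.
import itertools

class PrefillWorker:
    def __init__(self, name):
        self.name = name
    def prefill(self, prompt):
        # Pretend to build the KV cache for the prompt and return a handle to it.
        return {"kv_from": self.name, "prompt_len": len(prompt.split())}

class DecodeWorker:
    def __init__(self, name):
        self.name = name
    def decode(self, kv_handle, max_new_tokens):
        # Pretend to stream tokens using the transferred KV cache.
        return f"[{self.name}] generated {max_new_tokens} tokens (KV from {kv_handle['kv_from']})"

prefill_pool = itertools.cycle([PrefillWorker("prefill-0"), PrefillWorker("prefill-1")])
decode_pool = itertools.cycle([DecodeWorker("decode-0"), DecodeWorker("decode-1")])

def serve(prompt, max_new_tokens):
    kv = next(prefill_pool).prefill(prompt)               # stage 1: compute-bound prompt processing
    return next(decode_pool).decode(kv, max_new_tokens)   # stage 2: memory-bound token generation

print(serve("Explain why disaggregating prefill and decode saves cost.", 64))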