llm-d (@llm-d.ai)
llm-d is a Kubernetes-native distributed inference serving stack providing well-lit paths for anyone to serve large generative AI models at scale.

Learn more at: https://llm-d.ai
How we’re using it:

⚫️ Tiered-Prefix-Cache: We use the new connector to bridge GPU HBM and CPU RAM, creating a massive, multi-tier cache hierarchy.

⚫️ Intelligent Scheduling: Our scheduler now routes requests to pods where the needed KV blocks are already warm, whether in GPU HBM or CPU RAM (see the sketch below).
January 9, 2026 at 6:45 PM
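To make the tiered-cache and KV-aware scheduling ideas in the post above concrete, here is a minimal, purely illustrative Python sketch of a two-tier KV-block index. It is not the llm-d or vLLM connector API; the `TieredKVIndex` class, its tier names, and the block-hash keys are assumptions made for illustration only.

```python
# Illustrative sketch only; not the actual llm-d/vLLM connector API.
# Models a two-tier KV-block index: check GPU HBM first, then CPU RAM,
# and recompute a block only when neither tier holds it.
from dataclasses import dataclass, field


@dataclass
class TieredKVIndex:
    gpu_blocks: set = field(default_factory=set)  # block hashes resident in GPU HBM
    cpu_blocks: set = field(default_factory=set)  # block hashes offloaded to CPU RAM

    def locate(self, block_hash: str) -> str:
        """Return which tier (if any) already holds this KV block."""
        if block_hash in self.gpu_blocks:
            return "gpu"   # fastest path: reuse directly from HBM
        if block_hash in self.cpu_blocks:
            return "cpu"   # copy back to HBM, still far cheaper than recomputing
        return "miss"      # full prefill needed for this block

    def offload(self, block_hash: str) -> None:
        """On HBM pressure, demote a block to CPU RAM instead of dropping it."""
        self.gpu_blocks.discard(block_hash)
        self.cpu_blocks.add(block_hash)


index = TieredKVIndex(gpu_blocks={"blk-a"}, cpu_blocks={"blk-b"})
print([index.locate(h) for h in ("blk-a", "blk-b", "blk-c")])  # ['gpu', 'cpu', 'miss']
```

A KV-aware scheduler can run the same kind of lookup against each pod's advertised blocks and prefer a pod that reports "gpu" or "cpu" hits for a request's prefix over one that would have to recompute everything.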
🚀 Announcing llm-d v0.4! This release focuses on achieving SOTA inference performance across accelerators. From ultra-low latency for MoE models to new auto-scaling capabilities, we’re pushing the boundaries of open-source inference. Blog: https://t.co/qlQnzcT9O3 🧵👇
January 12, 2026 at 3:16 PM
🚀 llm-d v0.3.1 is LIVE! 🚀 This patch release is packed with key follow-ups from v0.3.0, including new hardware support, expanded cloud provider integration, and streamlined image builds. Dive into the full changelog: https://t.co/Wh6OGJ0KdO #llmd #OpenSource #vLLM #Release
January 12, 2026 at 3:15 PM
🚀 Evolving for Impact! We're updating our llm-d SIG meeting schedule to a bi-weekly cadence. This gives our community more time for deep work between calls, making our sessions even more focused and productive. Here are the details 👇
January 12, 2026 at 3:15 PM
We are thrilled to announce the release of llm-d v0.3! 🚀 This release is a huge milestone, powered by our incredible community, as we continue to build wider, well-lit paths for high-performance, hardware-agnostic, and scalable inference. 🧵Let's dive into what's new!
January 12, 2026 at 3:15 PM
Running LLMs on Kubernetes? You've likely felt the pain of re-processing the same context tokens over and over (think RAG system prompts). This is a huge source of inefficiency in distributed inference. Let's break down how we're solving this with llm-d. 🧵
January 12, 2026 at 3:15 PM
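As a rough illustration of the reuse opportunity described in the post above (this is not llm-d's actual hashing scheme; the block size and chained-hash layout are assumptions), the sketch below shows why two requests that share the same RAG system prompt produce identical leading KV-block hashes, so those blocks can be served from cache instead of being recomputed.

```python
# Illustrative sketch only; not llm-d's or vLLM's actual prefix-hashing scheme.
# Two requests that start with the same system prompt yield the same leading
# block hashes, so the KV state for those blocks can be reused, not recomputed.
import hashlib

BLOCK_SIZE = 16  # tokens per KV block; value assumed for illustration


def prefix_block_hashes(token_ids):
    """Chained hashes of each full block, so a block hash identifies its whole prefix."""
    hashes, running = [], hashlib.sha256()
    full_blocks_end = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for i in range(0, full_blocks_end, BLOCK_SIZE):
        running.update(str(token_ids[i:i + BLOCK_SIZE]).encode())
        hashes.append(running.copy().hexdigest()[:12])
    return hashes


system_prompt = list(range(64))           # the same RAG system prompt on every request
req_a = system_prompt + [901, 902, 903]   # user question A appended
req_b = system_prompt + [777, 778]        # user question B appended

# The four leading block hashes match, so those KV blocks are shared across requests.
print(prefix_block_hashes(req_a)[:4] == prefix_block_hashes(req_b)[:4])  # True
```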
In production LLM inference, this metric matters: KV-Cache hit rate. Why? A cached token is up to 10x cheaper to process than an uncached one. But when you scale out, naive load balancing creates a costly disaster: the "heartbreaking KV-cache miss." https://red.ht/46A4ynW
January 12, 2026 at 3:15 PM
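Taking the post's "up to 10x cheaper" figure at face value, a back-of-the-envelope sketch (illustrative numbers only, not a benchmark) shows how strongly the hit rate moves the average prefill cost, and why cache-blind load balancing that scatters repeated prefixes across pods is so costly.

```python
# Back-of-the-envelope sketch using the post's "up to 10x cheaper" figure.
CACHED_COST_RATIO = 0.1  # relative compute cost of a cached token vs. an uncached one


def relative_prefill_cost(hit_rate: float) -> float:
    """Average cost per prompt token, normalized so a 0% hit rate costs 1.0."""
    return hit_rate * CACHED_COST_RATIO + (1.0 - hit_rate)


for hit_rate in (0.0, 0.5, 0.9):
    print(f"hit rate {hit_rate:.0%}: {relative_prefill_cost(hit_rate):.2f}x baseline prefill cost")
# Prints 1.00x, 0.55x, and 0.19x. Naive round-robin spreads identical prefixes
# across pods and drags the hit rate toward 0%; prefix-aware routing keeps
# requests on pods that already hold the warm blocks.
```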
The llm-d community is building incredible things! 🚀 Shout-out to Ernest Wong & Sachi Desai from Microsoft for their new blog post pairing llm-d with Retrieval-Augmented Generation (RAG) on Azure Kubernetes Service (AKS)! This is a must-read guide! 👇 https://t.co/DPfRUdTLJB
January 12, 2026 at 3:15 PM
Getting started with llm-d v0.2 is now easier than ever! We've launched a full set of quick start guides to walk you through our most powerful features, including P/D disaggregation and deploying large MoE models on Kubernetes. Start here: https://llm-d.ai/docs/guide
January 12, 2026 at 3:14 PM
The llm-d community is proud to announce the release of v0.2! Our focus has been on building well-lit paths for large-scale inference on Kubernetes. This release delivers major advancements in performance, scheduling, and support for massive models. https://red.ht/4l4u9uD
January 12, 2026 at 3:14 PM
Big news from the llm-d project! Your input on our 5-min survey will define our future roadmap. Plus, we've just launched our YouTube channel with meeting recordings & tutorials. Subscribe and help us build the future of LLM serving! https://llm-d.ai/blog/llm-d-community-update-june-2025
January 12, 2026 at 3:13 PM
Two new ways to get involved with the llm-d project!
✅ Help shape our roadmap by taking our 5-min survey on your LLM use cases.
✅ Subscribe to our new YouTube channel for tutorials & SIG meetings! Details in our latest community update: https://llm-d.ai/blog/llm-d-community-update-june-2025
January 12, 2026 at 3:13 PM