Learn more at: https://llm-d.ai
⚫️ Tiered-Prefix-Cache: We use the new connector to bridge GPU HBM and CPU RAM, creating a massive, multi-tier cache hierarchy.
⚫️ Intelligent Scheduling: Our scheduler now routes requests to pods where KV blocks are already warm (in GPU or CPU).
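To make the tiering idea concrete, here is a minimal sketch of a two-tier prefix cache with LRU demotion and promotion. This is an illustration only, not the llm-d connector API: the class, capacities, and string "KV blocks" are all hypothetical stand-ins for GPU HBM and CPU RAM tiers.

```python
# Hypothetical sketch of a two-tier prefix cache (NOT the llm-d API):
# tier 1 stands in for GPU HBM (small, fast), tier 2 for CPU RAM (large, slower).
from collections import OrderedDict

class TieredPrefixCache:
    def __init__(self, gpu_capacity=2, cpu_capacity=8):
        self.gpu = OrderedDict()  # prefix-hash -> KV blocks (HBM stand-in)
        self.cpu = OrderedDict()  # prefix-hash -> KV blocks (RAM stand-in)
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity

    def put(self, prefix_hash, kv_blocks):
        self.gpu[prefix_hash] = kv_blocks
        self.gpu.move_to_end(prefix_hash)
        # Demote least-recently-used GPU entries to the CPU tier.
        while len(self.gpu) > self.gpu_capacity:
            evicted_hash, evicted_kv = self.gpu.popitem(last=False)
            self.cpu[evicted_hash] = evicted_kv
            while len(self.cpu) > self.cpu_capacity:
                self.cpu.popitem(last=False)  # drop entirely when RAM is full

    def get(self, prefix_hash):
        if prefix_hash in self.gpu:           # warm in GPU: fastest path
            self.gpu.move_to_end(prefix_hash)
            return self.gpu[prefix_hash], "gpu"
        if prefix_hash in self.cpu:           # warm in CPU: promote to GPU
            kv = self.cpu.pop(prefix_hash)
            self.put(prefix_hash, kv)
            return kv, "cpu"
        return None, "miss"                   # cold: prefill must recompute

cache = TieredPrefixCache()
for i in range(3):
    cache.put(f"prefix-{i}", f"kv-{i}")
# prefix-0 was demoted from the GPU tier into the CPU tier:
print(cache.get("prefix-0")[1])  # -> cpu (then promoted back to GPU)
print(cache.get("prefix-0")[1])  # -> gpu
```

A prefix-aware scheduler plays the same game one level up: given a request's prefix hash, it prefers pods whose `get` would return "gpu" or "cpu" over ones that would miss and pay full prefill cost.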
✅ Subscribe to our new YouTube channel for tutorials & SIG meetings! Details in our latest community update: https://llm-d.ai/blog/llm-d-community-update-june-2025