Looking at it, I did need some finer multi-GPU configuration, including the ability to run multiple models on different GPUs at the same time, and that turned out to be fairly easy to add.
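For context, here's roughly what that looks like done by hand: a minimal sketch that pins two vLLM servers to separate GPUs with CUDA_VISIBLE_DEVICES. It assumes the standard `vllm serve` CLI; the model names and ports are just examples.

```python
# Sketch: run two models side by side by pinning each vLLM server to its own GPU.
# Model names and ports below are illustrative, not defaults.
import os
import subprocess

def launch(model: str, gpu: int, port: int) -> subprocess.Popen:
    # CUDA_VISIBLE_DEVICES restricts this server to a single GPU.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    return subprocess.Popen(["vllm", "serve", model, "--port", str(port)], env=env)

servers = [
    launch("meta-llama/Llama-3.1-8B-Instruct", gpu=0, port=8000),
    launch("Qwen/Qwen2.5-7B-Instruct", gpu=1, port=8001),
]
for p in servers:
    p.wait()
```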
If you run vLLM locally (or want to), I’d love feedback on what would make this a daily driver for you: smarter “keep warm”, routing rules, observability, etc.
A single endpoint that can serve multiple models and handle switching for you - so model changes feel like switching tabs, not redeploying infrastructure.
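From the client side, switching is just changing the `model` field on one base URL. A hedged sketch using the standard OpenAI Python client; the base URL and model names are placeholders, not the project's defaults.

```python
# Sketch: one OpenAI-compatible endpoint, model switching via the `model` field.
# Base URL and model names are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

for model in ("llama-3.1-8b", "qwen2.5-7b"):
    reply = client.chat.completions.create(
        model=model,  # the endpoint loads/switches the backend as needed
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)
```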
Release v0.2.0: github.com/erans/lclq/r...
More info: github.com/erans/lclq/r...
github.com/erans/pgsqli...
Wondering what LLMs your computer can handle?
Check out the new guide - see what runs on PCs with NPUs, GPUs, or plain CPUs.
➡️ selfhostllm.org
Run it yourself.
See what your hardware can really do 💪
🌐 selfhostllm.org
✅ Privacy-first (no data leaves your device)
✅ Clear compatibility charts
✅ Fast local inference
✅ Simple install guides for GPU, Mac, & Windows
• K2 Thinking – great for structured reasoning
• IBM Granite – runs on both GPUs & Apple Silicon
Explore what fits your hardware 👇
🔗 selfhostllm.org
Keep your agents running, even when your provider says “limit reached.”
👉 Learn more at github.com/erans/lunaro...
Your agent stays active. You stay in control. ⚡
LunaRoute detects rate or quota limits and routes requests to another model, provider, or even account - automatically.
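Not LunaRoute's actual implementation, just a minimal sketch of the pattern it automates: on a rate-limit response, fail over to the next provider in a list. The endpoints and keys below are placeholders.

```python
# Sketch of rate-limit failover: try each provider in order, skip on HTTP 429.
# Provider URLs and keys are placeholders.
import httpx

PROVIDERS = [
    {"url": "https://api.provider-a.example/v1/chat/completions", "key": "KEY_A"},
    {"url": "https://api.provider-b.example/v1/chat/completions", "key": "KEY_B"},
]

def complete(payload: dict) -> dict:
    for p in PROVIDERS:
        resp = httpx.post(
            p["url"],
            headers={"Authorization": f"Bearer {p['key']}"},
            json=payload,
            timeout=60,
        )
        if resp.status_code == 429:  # rate/quota limit hit: route to the next provider
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("all providers rate-limited")
```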