Eran Sandler
@esandler.bsky.social
Builder, operator and investor. Infra, AI, and product nerd. Trying to make powerful things simple. Opinions are my own.
I wasn't aware of this project. Thanks for sharing.

Looking at it, I needed finer-grained multi-GPU configuration, including the ability to run multiple models on different GPUs at the same time, so it was fairly easy for me to build that myself.
December 24, 2025 at 11:35 PM
5/ It’s open source: github.com/erans/vllm-j...
If you run vLLM locally (or want to), I’d love feedback on what would make this a daily driver for you: smarter “keep warm”, routing rules, observability, etc.
GitHub - erans/vllm-jukebox: Server that multiplexes multiple LLM models through vLLM backends with automatic model swapping, multi-GPU scheduling, and graceful request draining
github.com
December 24, 2025 at 5:14 PM
4/ If this sounds useful (or you just like the idea), please ⭐ the repo - it helps others find it and keeps me shipping improvements:
December 24, 2025 at 5:14 PM
3/ The goal: make model ops boring. Keep your apps/tools pointed at one URL while you experiment freely on a single GPU box, workstation, or small multi-GPU rig - without the “who’s on which port?” chaos.
December 24, 2025 at 5:14 PM
2/ So I built vLLM Jukebox 🎛️
A single endpoint that can serve multiple models and handle switching for you - so model changes feel like switching tabs, not redeploying infrastructure.
December 24, 2025 at 5:14 PM
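Here's a rough sketch of what that looks like from the client side, assuming Jukebox exposes an OpenAI-compatible endpoint on localhost:8000 (the port and model names below are just placeholders, not the project's defaults). Switching models is just changing the model field; the jukebox handles loading and swapping behind the single URL.

```python
# Minimal sketch, assuming vLLM Jukebox exposes an OpenAI-compatible
# endpoint on localhost:8000 (port and model names are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Same URL, different models: the jukebox handles loading/swapping.
for model in ["meta-llama/Llama-3.1-8B-Instruct", "Qwen/Qwen2.5-7B-Instruct"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", resp.choices[0].message.content)
```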
5/ If you like the direction lclq is going, please star the repo and share your feedback.

Release v0.2.0: github.com/erans/lclq/r...
December 3, 2025 at 2:57 AM
4/ The worker model in lclq is now lighter and easier to tune. The default is 2 workers, and you can change it with the LCLQ_PUSH_WORKERS environment variable.
December 3, 2025 at 2:57 AM
3/ lclq now supports exponential-backoff retries, dead-letter topics, and GCP-compatible JSON payloads. A solid upgrade for event-driven development.
December 3, 2025 at 2:57 AM
2/ New in lclq v0.2.0: automatic webhook delivery for GCP Pub/Sub messages. Return 2xx to ack and retries happen automatically on failures.

More info: github.com/erans/lclq/r...
Release v0.2.0 · erans/lclq
🎉 lclq v0.2.0 - Push Subscriptions Release 🚀 Major New Feature: GCP Pub/Sub Push Subscriptions lclq now supports automatic HTTP webhook delivery for Pub/Sub messages! Create push subscriptions and ...
github.com
December 3, 2025 at 2:57 AM
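For anyone curious what the receiving side of a push subscription looks like, here's a minimal stdlib-only sketch of a webhook handler. It assumes lclq POSTs a GCP Pub/Sub-style JSON envelope; the exact payload shape and port are assumptions, not taken from the release notes.

```python
# Minimal sketch of a Pub/Sub-style push endpoint. Assumes lclq POSTs a
# GCP-compatible JSON body; returning 2xx acks, anything else triggers a retry.
import base64, json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        envelope = json.loads(body)
        data = base64.b64decode(envelope["message"]["data"]).decode()
        print("got message:", data)
        self.send_response(204)  # 2xx => ack; retries with backoff otherwise
        self.end_headers()

HTTPServer(("0.0.0.0", 8080), PushHandler).serve_forever()
```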
5/ If you want PostgreSQL-like behavior with SQLite’s simplicity (embedded, microservices, local dev), pgsqlite is getting closer every release. Try v0.0.19!

github.com/erans/pgsqli...
Release v0.0.19 · erans/pgsqlite
Adds 7 missing PostgreSQL catalog tables to improve protocol completeness: pg_collation - Static handler returning 3 standard collations (default, C, POSIX) pg_replication_slots - Empty stub (SQLi...
github.com
November 26, 2025 at 12:02 AM
4/ Added pg_settings with 41 commonly used PostgreSQL config values. More compatibility, fewer surprises when connecting PG-aware clients.
November 26, 2025 at 12:02 AM
3/ Dynamic handlers now power sequences + triggers, pulling from SQLite’s own metadata. More PG tools and ORMs “just work” with pgsqlite.
November 26, 2025 at 12:02 AM
2/ v0.0.19 adds new catalog tables: pg_collation, pg_sequence, pg_trigger, plus stubs for replication + stats. Huge step toward smoother PG wire-protocol support on SQLite.
November 26, 2025 at 12:02 AM
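A quick way to poke at the new catalog tables is to connect with an ordinary PostgreSQL driver. The host, port, database name, and user below are placeholders for however you run pgsqlite locally, not documented defaults.

```python
# Minimal sketch: connect to a local pgsqlite instance with a normal
# PostgreSQL driver and query the new catalog tables. Connection
# parameters are placeholders; adjust to your setup.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, dbname="main", user="postgres")
cur = conn.cursor()

cur.execute("SELECT collname FROM pg_collation")
print("collations:", [r[0] for r in cur.fetchall()])   # default, C, POSIX

cur.execute("SELECT name, setting FROM pg_settings LIMIT 5")
for name, setting in cur.fetchall():
    print(name, "=", setting)
```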
3/💻 New PC page:
Wondering what LLMs your computer can handle?
Check out the new guide - see what runs on PCs with NPUs, GPUs, or plain CPUs.
➡️ selfhostllm.org
SelfHostLLM - GPU Memory Calculator for LLM Inference
Calculate GPU memory requirements and max concurrent requests for self-hosted LLM inference. Support for Llama, Qwen, DeepSeek, Mistral and more.
selfhostllm.org
November 11, 2025 at 6:41 PM
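The underlying math is worth having in your head even without the calculator. A rough back-of-envelope estimate (not the site's exact formula) is: weight memory ≈ parameter count × bytes per parameter, plus headroom for KV cache and runtime overhead.

```python
# Back-of-envelope VRAM estimate for model weights (not the calculator's
# exact formula): params * bytes-per-param, plus ~20% headroom for
# KV cache, activations, and runtime overhead.
def rough_vram_gb(params_billion: float, bits_per_param: int = 16, overhead: float = 1.2) -> float:
    weights_gb = params_billion * (bits_per_param / 8)  # 1B params @ 16-bit ≈ 2 GB
    return weights_gb * overhead

print(rough_vram_gb(8))      # an 8B model at FP16: ~19 GB
print(rough_vram_gb(8, 4))   # the same model at 4-bit quantization: ~5 GB
```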
5/ AI doesn’t have to live in the cloud.
Run it yourself.
See what your hardware can really do 💪
🌐 selfhostllm.org
November 11, 2025 at 6:41 PM
4/ Why SelfHostLLM?
✅ Privacy-first (no data leaves your device)
✅ Clear compatibility charts
✅ Fast local inference
✅ Simple install guides for GPU, Mac, & Windows
November 11, 2025 at 6:41 PM
2/🧠 New models added:
• K2 Thinking – great for structured reasoning
• IBM Granite – runs on both GPUs & Apple Silicon
Explore what fits your hardware 👇
🔗 selfhostllm.org
November 11, 2025 at 6:41 PM
5/ AI resilience made easy.
Keep your agents running, even when your provider says “limit reached.”
👉 Learn more at github.com/erans/lunaro...
GitHub - erans/lunaroute: LunaRoute is a high-performance local proxy for AI coding assistants like Claude Code, OpenAI Codex CLI, and OpenCode. Get complete visibility into every LLM interaction with...
github.com
October 29, 2025 at 4:04 PM
4/ You can even failover across different model dialects (e.g., GPT → Claude → Gemini).
Your agent stays active. You stay in control. ⚡
October 29, 2025 at 4:04 PM
3/ No more downtime. No more manual switching.
LunaRoute detects rate or quota limits and routes requests to another model, provider, or even account - automatically.
October 29, 2025 at 4:04 PM
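To make the failover idea concrete, here's a generic sketch of the pattern: try the primary, and on a rate-limit response fall through to the next provider. This illustrates provider failover in general; it is not LunaRoute's actual code or configuration, and the endpoints and model names are hypothetical.

```python
# Generic sketch of rate-limit failover (illustrative only; not LunaRoute's
# implementation). Each entry is an OpenAI-compatible endpoint.
from openai import OpenAI, RateLimitError

PROVIDERS = [  # hypothetical endpoints, in priority order
    {"base_url": "https://api.openai.com/v1", "api_key": "sk-...", "model": "gpt-4o"},
    {"base_url": "https://fallback.example/v1", "api_key": "key", "model": "claude-sonnet"},
]

def complete_with_failover(prompt: str) -> str:
    for p in PROVIDERS:
        client = OpenAI(base_url=p["base_url"], api_key=p["api_key"])
        try:
            resp = client.chat.completions.create(
                model=p["model"],
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except RateLimitError:
            continue  # quota/rate limit hit: fall through to the next provider
    raise RuntimeError("all providers rate-limited")
```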