Looking at it, I did need some finer multi-GPU configuration, including the ability to run multiple models on different GPUs at the same time, and that turned out to be fairly easy to add.
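For context, here's roughly what that looks like done by hand: a minimal sketch that pins two vLLM servers to separate GPUs with CUDA_VISIBLE_DEVICES. It assumes the standard `vllm serve` CLI; the model names and ports are just examples.

```python
# Sketch: run two models side by side by pinning each vLLM server to its own GPU.
# Model names and ports below are illustrative, not defaults.
import os
import subprocess

def launch(model: str, gpu: int, port: int) -> subprocess.Popen:
    # CUDA_VISIBLE_DEVICES restricts this server to a single GPU.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    return subprocess.Popen(["vllm", "serve", model, "--port", str(port)], env=env)

servers = [
    launch("meta-llama/Llama-3.1-8B-Instruct", gpu=0, port=8000),
    launch("Qwen/Qwen2.5-7B-Instruct", gpu=1, port=8001),
]
for p in servers:
    p.wait()
```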
If you run vLLM locally (or want to), I’d love feedback on what would make this a daily driver for you: smarter “keep warm”, routing rules, observability, etc.
A single endpoint that can serve multiple models and handle switching for you - so model changes feel like switching tabs, not redeploying infrastructure.
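From the client side, switching is just changing the `model` field on one base URL. A hedged sketch using the standard OpenAI Python client; the base URL and model names are placeholders, not the project's defaults.

```python
# Sketch: one OpenAI-compatible endpoint, model switching via the `model` field.
# Base URL and model names are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

for model in ("llama-3.1-8b", "qwen2.5-7b"):
    reply = client.chat.completions.create(
        model=model,  # the endpoint loads/switches the backend as needed
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)
```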
Release v0.2.0: github.com/erans/lclq/r...
More info: github.com/erans/lclq/r...
github.com/erans/pgsqli...
Wondering what LLMs your computer can handle?
Check out the new guide - see what runs on PCs with NPUs, GPUs, or plain CPUs.
➡️ selfhostllm.org
Run it yourself.
See what your hardware can really do 💪
🌐 selfhostllm.org
✅ Privacy-first (no data leaves your device)
✅ Clear compatibility charts
✅ Fast local inference
✅ Simple install guides for GPU, Mac, & Windows
• K2 Thinking – great for structured reasoning
• IBM Granite – runs on both GPUs & Apple Silicon
Explore what fits your hardware 👇
🔗 selfhostllm.org
Keep your agents running, even when your provider says “limit reached.”
👉 Learn more at github.com/erans/lunaro...
Your agent stays active. You stay in control. ⚡
LunaRoute detects rate or quota limits and routes requests to another model, provider, or even account - automatically.
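Not LunaRoute's actual implementation, just a minimal sketch of the pattern it automates: on a rate-limit response, fail over to the next provider in a list. The endpoints and keys below are placeholders.

```python
# Sketch of rate-limit failover: try each provider in order, skip on HTTP 429.
# Provider URLs and keys are placeholders.
import httpx

PROVIDERS = [
    {"url": "https://api.provider-a.example/v1/chat/completions", "key": "KEY_A"},
    {"url": "https://api.provider-b.example/v1/chat/completions", "key": "KEY_B"},
]

def complete(payload: dict) -> dict:
    for p in PROVIDERS:
        resp = httpx.post(
            p["url"],
            headers={"Authorization": f"Bearer {p['key']}"},
            json=payload,
            timeout=60,
        )
        if resp.status_code == 429:  # rate/quota limit hit: route to the next provider
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("all providers rate-limited")
```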