@saganite.bsky.social
170 followers 67 following 200 posts
llm tinkerer. entropy cowboy. iconoclast.
saganite.bsky.social
Thanks Tim! I would also mention, though, that even if you are rewriting 90% of the code, you can still get a 10% speed improvement just by using this feature. People slave away in the CUDA mines for a 10% speedup, and here it is, sitting right in front of you, with 10% even in the WORST case.
saganite.bsky.social
I don't do social media so nobody is going to read this, so I'll just @ some of my favorite LLM bsky accounts begging for some reskeets.
@timkellogg.me @cameron.pfiffer.org @timfduffy.com @natolambert.bsky.social @howard.fm
saganite.bsky.social
The change is in our experimental vLLM fork vLLMx for the moment, but we will be submitting a PR to vLLM main shortly.
saganite.bsky.social
There is already support for this in the OpenAI API specification, and this change brings it to vLLM in a much better form. OpenAI is actually the only other provider I'm aware of offering this feature, and theirs can result in SLOWER performance, while ours is much faster.
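For the curious, through an OpenAI-style client it looks roughly like this. The prediction field is part of the published OpenAI chat completions spec; the endpoint and model name below are placeholders, and whether our fork mirrors this exact shape is an assumption on my part, a sketch rather than our docs:

# Sketch: ask for a code edit and pass the original file as the prediction,
# since most of the output will match it verbatim.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # placeholder endpoint

original = open("app.py").read()

resp = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Rename function foo to bar in this file:\n" + original,
    }],
    # most of the output will match the input file, so pass it as the prediction
    prediction={"type": "content", "content": original},
)
print(resp.choices[0].message.content)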
saganite.bsky.social
Blog Post: cascadetech.ai/blog/vllm-pr...
Demo: app.cascadetech.ai

Think: Speculative decoding, but instead of a draft model (slow, complicated, wrong) you have a static text prediction of the output, and a diff algorithm to keep it aligned when it diverges.
vLLM Predicted Outputs
cascadetech.ai
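In rough Python, the loop is something like this (a minimal sketch of the idea, not our actual implementation; model.verify and model.done are hypothetical helpers):

# Use the static prediction as the draft, verify it in one forward pass like
# speculative decoding, and diff-realign the cursor whenever output diverges.
import difflib

def generate_with_prediction(model, prompt, prediction, k=8):
    out, pos = [], 0
    while not model.done(prompt + out):
        draft = prediction[pos:pos + k]        # draft tokens, for free
        accepted, next_tok = model.verify(prompt + out, draft)
        out += accepted + [next_tok]           # standard accept-then-correct rule
        pos += len(accepted) + 1
        if len(accepted) < len(draft):         # we diverged from the prediction
            # re-anchor: find where the recent output reappears in the prediction
            tail = out[-32:]
            m = difflib.SequenceMatcher(None, prediction, tail) \
                    .find_longest_match(0, len(prediction), 0, len(tail))
            if m.size:
                pos = m.a + m.size
    return out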
saganite.bsky.social
I would like to share some work we've been doing at cascadetech.ai: Predicted Outputs in vLLM. If you aren't familiar with PO, it allows you to dramatically speed up generation when you know something about the contents of the output (think: code modification).
saganite.bsky.social
These are INCREDIBLY complex simulations, including multithreading, physics, and many billions of floating point operations that have to be deterministic down to the last bit of the mantissa over the course of hours of play. And they are.
saganite.bsky.social
As a former videogame developer, I can tell you that you can definitely build deterministic software on CPUs! One really efficient way to do multiplayer is to replicate input across all nodes and then run a fully deterministic simulation on each node.
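The core loop is dead simple; something like this toy lockstep sketch (illustrative only, the network and input helpers are made up):

# Every node broadcasts its local input, waits for everyone's input for the
# tick, then advances an identical deterministic simulation. Only inputs ever
# cross the wire; state stays bit-identical on all nodes.
def run_lockstep(sim, network, player_id, ticks):
    for tick in range(ticks):
        network.send(tick, player_id, read_local_input())  # hypothetical I/O
        inputs = network.receive_all(tick)                 # block for all players
        sim = sim.step(sorted(inputs))                     # deterministic step
    return sim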
saganite.bsky.social
It's hilarious how many accounts on Bluesky are just literally an Onion article about accounts on Bluesky.
saganite.bsky.social
Not to mention the Linux tradition of everything being a text stream is very conducive to LLM integration. I just installed desktop Linux on my new computer, pretty happy with it so far. Much smoother experience than the last time I tried.
saganite.bsky.social
But there isn't some OTHER festival down the road for people in their 20s. ALL music festivals are for Gen X and older millennials. Rock and roll is dying, and even live music is dying with it.
saganite.bsky.social
There is no genre of music where you see a crowd aged under 30 at live shows. I just went to a music festival, La Route du Rock, which would have been full of 20-year-olds 20 years ago. Now it was mostly people over 40.
saganite.bsky.social
Had a long discussion about this with my ethnomusicologist friend last week who teaches history of rock to 18-year-olds. Apparently not only are they not forming bands, but they also aren't even attending live music events at all.
saganite.bsky.social
This HAS to be AI slop that humans didn't catch right? Like, this is bullish for gpt-5?
saganite.bsky.social
Wow, amazing. Are you going to explore beyond Ulaanbaatar? The country is incredible but the capital city is not remotely representative.
saganite.bsky.social
What they are telling those employees is the honest truth, irrespective of their company's business practices.
saganite.bsky.social
or "Don't bother learning how to effectively use AI, we will still keep employing you even when other prospective employees will work more efficiently for the same salary"?
saganite.bsky.social
or "Don't worry, even as the employees of our competitors adopt AI to be more efficient, we'll just keep doing things the old way so that we don't have to lay anybody off?"
saganite.bsky.social
Sorry, what exactly would YOU say if you were those CEOs? "Don't worry, AI isn't going to affect employment in our industry?"
saganite.bsky.social
have you tried using Gemini instead of Anthropic? in my experience you can get better quality for a TINY fraction of the price. Gemini 2.0 Flash-Lite is 10x cheaper than Haiku, and Flash 2.0 is like 8x cheaper.
saganite.bsky.social
Meta has a pretty big Zurich office too
saganite.bsky.social
I have had such miserable results with anything cooking related. We did a cocktail night where we drank LLM-generated cocktails and they were so very bad. I feel like LLMs are in letter-counting territory with recipes.