narsilou.bsky.social
@narsilou.bsky.social
Please, the world needs more videos like the color one. I really enjoyed it, learned a ton of stuff from it.

Rust is just a tool. Don't care too much about the views; the algorithm sucks and all platforms are filled with bots anyway.
July 2, 2025 at 7:56 AM
That's for sure, and I think it aligns with my thinking. LLMs are good at getting the "unspecified" part of what we ask of them. But we need the compiler/type checker, which is mathematical and rigorous, alongside to guide them.
May 9, 2025 at 3:49 PM
Cursor with Claude has been surprisingly good at getting things right in 2-3 shots. I don't have enough experience with others to judge.
May 9, 2025 at 3:47 PM
I had the same observation.

Even in Rust, if I don't carefully hand-hold the LLM, it will tend to spiral out of control, producing random crap.

But not unlike a junior, if you carefully explain what you want, it tends to get it right, or self-correct relatively OK. Just don't ask for too much at once.
May 8, 2025 at 2:51 PM
Well, TS only goes so far: any use of `any` (which is unfortunately quite common) and you lose all the benefits.
Also, there are no runtime checks for the types. It happened too many times to me that the culprit was not my codebase but my sanitization of browser data (which TS doesn't protect against).
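A minimal sketch of that failure mode (the `User` shape and the payload are made up for illustration): the compiler is satisfied because `JSON.parse` returns `any`, so nothing verifies the data actually arriving from the browser unless you add a runtime guard yourself.

```typescript
// Compile-time type says `User`, but nothing checks the runtime data.
interface User {
  id: number;
  name: string;
}

// `JSON.parse` returns `any`, so this cast compiles no matter what the
// payload really contains.
function parseUserUnchecked(json: string): User {
  return JSON.parse(json) as User;
}

// A hand-written runtime guard closes the gap.
function isUser(value: unknown): value is User {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).id === "number" &&
    typeof (value as Record<string, unknown>).name === "string"
  );
}

function parseUserChecked(json: string): User {
  const value: unknown = JSON.parse(json);
  if (!isUser(value)) throw new Error("payload is not a User");
  return value;
}

// Compiles fine, corrupts silently: `id` is a string here, not a number.
const bad = parseUserUnchecked('{"id": "42", "name": "Ada"}');
console.log(bad.id + 1); // "421" — no type error anywhere

// The checked version fails loudly at the boundary instead.
try {
  parseUserChecked('{"id": "42", "name": "Ada"}');
} catch (e) {
  console.log((e as Error).message); // "payload is not a User"
}
```

Runtime validation libraries like zod exist precisely to make guards like `isUser` less tedious to write.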
May 8, 2025 at 2:49 PM
Zero config

That’s it. Remove all the flags you’re using and you’re likely to get the best performance. By evaluating the hardware and the model, TGI automatically selects values that give the best performance. In production, we don’t use any flags anymore in our deployments.
December 10, 2024 at 10:10 AM
13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead is ~5µs. Thanks Daniel de Kok for the beast of a data structure.
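The underlying idea is prefix caching: the already-processed conversation is kept around, and a follow-up request that shares that prefix only pays prefill cost for its new tokens. A rough conceptual sketch of the idea (a plain token trie that only tracks which prefixes were seen; it is not TGI's actual Rust data structure, and the real per-prefix KV-cache state is elided):

```typescript
// Conceptual sketch of prefix caching: keep processed conversations in a
// token trie so a follow-up request only has to "prefill" unseen tokens.

class TrieNode {
  children = new Map<number, TrieNode>();
}

class PrefixCache {
  private root = new TrieNode();

  // Walk the trie to find how many leading tokens are already cached.
  longestCachedPrefix(tokens: number[]): number {
    let node = this.root;
    let length = 0;
    for (const t of tokens) {
      const next = node.children.get(t);
      if (next === undefined) break;
      node = next;
      length++;
    }
    return length;
  }

  // Record the whole token sequence (and implicitly all its prefixes).
  insert(tokens: number[]): void {
    let node = this.root;
    for (const t of tokens) {
      let next = node.children.get(t);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(t, next);
      }
      node = next;
    }
  }
}

// Prefill cost is proportional only to the tokens not already cached.
function prefill(cache: PrefixCache, tokens: number[]): number {
  const newTokens = tokens.length - cache.longestCachedPrefix(tokens);
  cache.insert(tokens);
  return newTokens;
}

// Usage: a long conversation, then a short follow-up reply.
const cache = new PrefixCache();
const conversation = Array.from({ length: 200_000 }, (_, i) => i % 32_000);
console.log(prefill(cache, conversation));             // 200000 (cold start)
console.log(prefill(cache, [...conversation, 7, 11])); // 2 (only the new tokens)
```

The 27.5s vs 2s and ~5µs figures above come from TGI itself; the sketch only shows why the follow-up request prefills 2 tokens instead of 200k+.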
December 10, 2024 at 10:10 AM
3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely gets 10k. A lot of work went into reducing the footprint of the runtime.
December 10, 2024 at 10:09 AM