Robert Knight
robertknight2.bsky.social
Frontend Developer. Machine learning in Rust. https://github.com/robertknight.
There are a number of different models that this project runs. I ported inference for one of them manually to create a template, then asked Claude to do the remainder, using the first commit as a guide. This worked quite well, although I did need to make some small corrections for readability.
December 29, 2025 at 2:30 PM
CPU performance is broadly similar to ORT (it may be better or worse depending on the system), but it has the upsides of fewer dependencies and a smaller binary size, and it avoids some of the complications/limitations that come with integrating a large precompiled C++ binary.
December 29, 2025 at 2:30 PM
What was the motivation for the investigation?
December 29, 2025 at 2:17 PM
Though it is unfortunate that the granularity of opt-out is "everything done through the terminal".
December 24, 2025 at 10:03 PM
I can confirm it makes the orange blobs in `cargo build --timings` reports go away.
December 24, 2025 at 10:00 PM
The other main new feature is support for int4 block-quantized models (a.k.a. "q4" models on Hugging Face), which is a necessity for running small LLMs (1-8B parameters) at a reasonable speed on CPUs.
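The idea behind block quantization can be sketched in a few lines of Rust. This is a hypothetical layout for illustration only, not this project's actual on-disk format: 4-bit values packed two per byte in blocks of 32, with one f32 scale and one zero point per block.

```rust
// Dequantize one block of 32 int4 weights (hypothetical layout, for
// illustration). `packed` holds two 4-bit values per byte; each block
// shares a single scale and zero point.
fn dequantize_q4_block(packed: &[u8; 16], scale: f32, zero_point: i8) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (i, byte) in packed.iter().enumerate() {
        let b = *byte;
        let lo = (b & 0x0F) as i8; // low nibble
        let hi = (b >> 4) as i8;   // high nibble
        out[2 * i] = (lo - zero_point) as f32 * scale;
        out[2 * i + 1] = (hi - zero_point) as f32 * scale;
    }
    out
}

fn main() {
    // One packed byte: low nibble 5, high nibble 9.
    let mut packed = [0u8; 16];
    packed[0] = 5 | (9 << 4);
    let out = dequantize_q4_block(&packed, 0.5, 8);
    // (5 - 8) * 0.5 = -1.5 and (9 - 8) * 0.5 = 0.5
    assert_eq!(out[0], -1.5);
    assert_eq!(out[1], 0.5);
    println!("out[0] = {}, out[1] = {}", out[0], out[1]);
}
```

The appeal for CPU inference is that weights stay packed in memory (4 bits each instead of 32), cutting memory bandwidth, and are dequantized on the fly inside the matmul kernel.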
December 13, 2025 at 12:39 PM
This sent me down a rabbit hole looking up the etymology of `flatMap`. It appears to be because it originated as a convenient composition of "flatten" and "map" functions (i.e. `flat(map(x, f))`) - mitp-content-server.mit.edu/books/conten...
https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-15.html#%_sec_2.2.3
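The equivalence is easy to check in Rust, where `Iterator::flat_map` is likewise just `map` followed by `flatten`:

```rust
fn main() {
    let words = ["hello", "world"];

    // flat_map(f) is the composition flatten ∘ map:
    let a: Vec<char> = words.iter().flat_map(|w| w.chars()).collect();
    let b: Vec<char> = words.iter().map(|w| w.chars()).flatten().collect();

    assert_eq!(a, b);
    println!("{:?}", a); // the 10 characters of "helloworld"
}
```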
December 11, 2025 at 10:38 PM
So much less embarrassing than my Spotify wrapped.
December 3, 2025 at 9:08 PM
Yes, indeed. I would expect WebNN to generally outperform WebGPU for this reason. An occasional exception might be if you can combine operations in a WebGPU shader but the WebNN backend doesn't know how to do this for the higher-level ops.
December 3, 2025 at 4:44 PM
I think I read somewhere that there is an in-development implementation of github.com/gpuweb/gpuwe... that ORT's WebGPU backend can use, which should enable it to use tensor cores, but I haven't attempted it.
Subgroup matrix · Issue #4195 · gpuweb/gpuweb
December 3, 2025 at 3:22 PM
Seems worth considering. A few users did report issues with pnpm after a similar change: a Docker image build succeeded after a pnpm update but no longer installed the expected scripts. npm would probably need to find a way to fail more loudly if a post-install script gets added.
November 26, 2025 at 3:33 PM