And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder
[1/3]
We introduce cache-aware routing that reduces expert cache misses in MoEs, improving token generation throughput by 2×. Perfect for memory-constrained devices.
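To make the idea concrete, here is a toy sketch of what cache-aware routing could look like. It is not the authors' method: it assumes an LRU cache of experts and a hypothetical `bonus` added to the router logits of experts already in the cache, and it uses random logits instead of a real model. All names (`route`, `simulate`, `bonus`, `capacity`) are made up for illustration.

```python
from collections import OrderedDict
import random


def top_k(scores, k):
    """Return the indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def route(logits, cache, k, bonus):
    """Pick k experts; `bonus` (a hypothetical knob) favors cached experts."""
    adjusted = [s + (bonus if i in cache else 0.0) for i, s in enumerate(logits)]
    return top_k(adjusted, k)


def simulate(num_tokens, num_experts, k, capacity, bonus, seed=0):
    """Count expert-cache misses over a stream of random router logits."""
    rng = random.Random(seed)
    cache = OrderedDict()  # LRU order: least recently used entry first
    misses = 0
    for _ in range(num_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        for expert in route(logits, cache, k, bonus):
            if expert in cache:
                cache.move_to_end(expert)  # mark as recently used
            else:
                misses += 1  # expert weights would have to be loaded
                cache[expert] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses


baseline = simulate(1000, 16, 2, 4, bonus=0.0)
cache_aware = simulate(1000, 16, 2, 4, bonus=1.0)
print("misses without bonus:", baseline)
print("misses with bonus:   ", cache_aware)
```

In this sketch a larger `bonus` means fewer misses but routing that strays further from the router's original choice, so any real system would have to balance that against model quality.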