simonpcouch.com, @simonpcouch elsewhere
for source code: github.com/skaltman/mod...
my feeling is "we'll benchmark whatever the default on the API is; don't make me swim through all of your reasoning settings," but we don't strictly do that at the moment
Crunching the ~2,000-token prompt (and thus the time to first token) seems to take much longer than the streaming of tokens once generation begins! I've been surprised at the tok/s I've been seeing, even without speculative decoding.
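The split between prompt processing (prefill) and per-token streaming (decode) can be measured directly from any streaming generator. A minimal sketch below, where `measure_stream` and `fake_stream` are hypothetical helpers for illustration (a real run would iterate over an actual model's token stream instead):

```python
import time

def measure_stream(token_stream):
    """Time a streaming generation: returns time to first token (TTFT)
    and steady-state tokens per second once streaming has begun."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_at is None:
            first_at = now  # first token marks the end of prefill
        count += 1
    end = time.perf_counter()
    ttft = first_at - start
    decode_time = end - first_at
    # Rate over the decode phase only (first token excluded from the count)
    tok_per_s = (count - 1) / decode_time if count > 1 and decode_time > 0 else float("nan")
    return ttft, tok_per_s

def fake_stream(prefill_s=0.2, n_tokens=50, per_token_s=0.005):
    """Stand-in for a real model stream: slow prefill, then fast decode."""
    time.sleep(prefill_s)
    for _ in range(n_tokens):
        time.sleep(per_token_s)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft:.2f}s, decode: {tps:.0f} tok/s")
```

With these toy numbers the prefill dominates TTFT while the decode phase runs at a much higher token rate, which is the same shape of result described above.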