arylwen.bsky.social
@arylwen.bsky.social
50,000-100,000 tokens? I am using Linux and Nvidia with LM Studio. I can fit about 64k tokens with a 14B model on a 3090. With the Q4_K_M quant, inference takes anywhere from a few seconds to about a minute. For longer contexts the model would offload to the CPU and the inference time balloons by an order of magnitude.
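The 64k ceiling on a 24 GB card checks out with a back-of-the-envelope KV-cache calculation. This is a rough sketch, not LM Studio's actual accounting; the layer/head numbers assume a Qwen2.5-14B-style config (48 layers, 8 KV heads via GQA, head dim 128) and an fp16 KV cache, all of which are illustrative assumptions:

```python
# Back-of-the-envelope KV-cache sizing for a ~14B model at 64k context.
# Config values below are assumptions (Qwen2.5-14B-like), not measured
# from LM Studio.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers the separate K and V tensors per layer;
    # bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

cache = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                       n_tokens=64 * 1024)
print(f"{cache / 2**30:.1f} GiB")  # prints "12.0 GiB"
```

Add roughly 8-9 GB for the Q4_K_M weights of a 14B model and the total lands just under the 3090's 24 GB, which is consistent with ~64k being the point where layers start spilling to the CPU.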
March 9, 2025 at 6:24 PM
Reposted
I just tried it on deepseek-r1:32b and the full 671B, and both stabilize at a consistent 85% confidence

Going bigger/smarter doesn't seem to make it any more or less confident beyond a certain point. The sweet spot seems to be 7B-8B
February 9, 2025 at 6:28 PM