giyaibo.bsky.social
@giyaibo.bsky.social
Reposted
DeepSeek-OCR

a tiny 3B-A0.5B MoE OCR model that runs fast on a single A100 40GB with very high precision and excellent compression

why it’s cool: they use images as a way to compress text and get around the O(n^2) cost of attention over long contexts

huggingface.co/deepseek-ai/...
October 20, 2025 at 11:12 AM
Reposted
Mistral shared what look like the most detailed numbers yet for the environmental impact of training a frontier LLM - their Mistral Large 2 used 20.4 ktCO₂e and 281,000 m³ of water

I'd love to see numbers like that provided in context though, hard to evaluate alone simonwillison.net/2025/Jul/22/...
Our contribution to a global environmental standard for AI
Mistral have released environmental impact numbers for their largest model, Mistral Large 2, in more detail than I have seen from any of the other large AI labs. The methodology …
simonwillison.net
July 22, 2025 at 9:20 PM