palema.bsky.social
@palema.bsky.social
Data Scientist - LLMs, NLP & AWS
Reposted
DeepSeek, an LLM trained for a fraction of the cost of GPT-Xx models, in 2 months for $6 million, on limited GPUs due to export restrictions, and competing head to head. This is crazy.

It's not the AI part I'm excited about, it's the level of efficiency. github.com/deepseek-ai/...
GitHub - deepseek-ai/DeepSeek-V3
December 31, 2024 at 5:07 PM
Reposted
A new paper, "Let Me Speak Freely", has been spreading rumors that structured generation hurts LLM evaluation performance.

Well, we've taken a look and found serious issues in this paper, and shown, once again, that structured generation *improves* evaluation performance!
November 21, 2024 at 6:33 PM
I really don’t want to talk about that other app. But is anyone else insanely worried about how incredibly dumb people there have gotten? There’s a mass mental decline happening right in front of our eyes. #BringBackCriticalThinkers
November 18, 2024 at 8:59 AM
This reminds me of OG twitter, before that weird man came to ruin it.
November 15, 2024 at 3:01 PM