brianarbuckle.bsky.social
@brianarbuckle.bsky.social
It is my understanding that DeepSeek R1 training was done without humans in the loop, using pure reinforcement learning (RL), while the breakthrough with ChatGPT was reinforcement learning from human feedback (RLHF). So if that’s the case, there were fewer man-hours involved in grading performance.
January 25, 2025 at 3:41 AM
Congrats!
December 2, 2024 at 9:05 PM