Lightnews — Scholar-powered news

@brianarbuckle.bsky.social

13 followers 24 following 3 posts

brianarbuckle.bsky.social

@brianarbuckle.bsky.social

It is my understanding that DeepSeek R1 was trained with pure reinforcement learning (RL), without humans in the loop, whereas the breakthrough with ChatGPT was reinforcement learning from human feedback (RLHF). If that's the case, fewer man-hours were spent grading performance.

January 25, 2025 at 3:41 AM

brianarbuckle.bsky.social

@brianarbuckle.bsky.social

Congrats!

December 2, 2024 at 9:05 PM
