A good read for building secure AI!
arxiv.org/pdf/2503.18813
Link: developer.nvidia.com/blog/how-to-...
Surprisingly, when I used 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', the output contained tokens/words introduced in the S1 paper. LOL! 🤣
Code: github.com/ducnh279/grp...
- Base model: Qwen2.5-1.5B
- Dataset: GSM8K (math)
- Reward functions: Format, Accuracy
- Objective: GRPO
- PEFT: LoRA
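A rough sketch of how the two reward functions and the group-relative advantage in GRPO could fit together. This is not the code from the linked repo: the `<think>`/`<answer>` template, the function names, and the 0/1 reward values are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Assumed output template (hypothetical): <think>...</think><answer>...</answer>
FORMAT_RE = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted answer equals the gold answer string, else 0.0."""
    m = FORMAT_RE.match(completion.strip())
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == gold.strip() else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    GRPO scores each sample relative to its group: (r - mean) / std.
    eps avoids division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of completions sampled for one GSM8K-style question.
completions = [
    "<think>3 * 4 = 12</think><answer>12</answer>",
    "<think>3 + 4 = 7</think><answer>7</answer>",
    "The answer is 12",
]
rewards = [format_reward(c) + accuracy_reward(c, "12") for c in completions]
advantages = group_advantages(rewards)
```

Samples that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed.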
To prevent reward hacking, a repetition penalty is added to the cosine reward.
Paper: arxiv.org/pdf/2502.03373
Hackable Code: github.com/eddycmu/demy...
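A minimal sketch of the idea, not the paper's or the repo's implementation: the reward is interpolated along a half cosine as the output gets longer, and an n-gram repetition term is added on top. All names, endpoint values, the n-gram size, and the penalty weight below are illustrative assumptions.

```python
import math


def cosine_interp(t, T, start, end):
    """Move from `start` at t=0 to `end` at t=T along a half cosine."""
    t = min(max(t, 0), T)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t / T))


def cosine_reward(correct, length, max_len,
                  r_correct=(2.0, 1.0), r_wrong=(-10.0, 0.0)):
    """Length-dependent reward (endpoint values are assumptions).

    Correct answers earn more when short; wrong answers are punished
    less when long, which nudges the model to keep reasoning.
    """
    start, end = r_correct if correct else r_wrong
    return cosine_interp(length, max_len, start, end)


def repetition_penalty(tokens, n=3, weight=-0.5):
    """Non-positive penalty proportional to the share of repeated n-grams."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    repeated_share = 1.0 - len(set(ngrams)) / len(ngrams)
    return weight * repeated_share


def total_reward(correct, tokens, max_len):
    """Cosine reward plus repetition penalty."""
    return cosine_reward(correct, len(tokens), max_len) + repetition_penalty(tokens)


# Usage: two wrong answers of equal length, one padded with repeated tokens.
padded = ["wait"] * 50
varied = [f"tok{i}" for i in range(50)]
r_padded = total_reward(False, padded, max_len=100)
r_varied = total_reward(False, varied, max_len=100)
```

Without the penalty term, both outputs would score the same, so padding with repeated text would be a cheap way to climb the length-based reward on wrong answers.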
🎁 Sebastian Raschka: "reasoning models"