Duc Nguyen Huu
ducnh279.bsky.social
Data Science in ♥️ Home in 🇻🇳
I find making your agents safe is just as important as making them smart. 🔒

A good read for building secure AI!

arxiv.org/pdf/2503.18813
March 31, 2025 at 12:47 PM
There will be one day ... in 🇺🇸 or 🇻🇳
March 30, 2025 at 7:37 PM
Have you ever wondered what the differences are between the two types of knowledge distillation used for LLMs these days?

Link: developer.nvidia.com/blog/how-to-...
February 11, 2025 at 8:17 PM
Thanks for sharing with us, Seb!

Surprisingly, when I used 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', the output contained tokens/words introduced in the S1 paper. LOL! 🤣
February 11, 2025 at 2:03 PM
My learning project: Implement a Reasoning Model with GRPO from scratch

Code: github.com/ducnh279/grp...

- Base model: Qwen2.5-1.5B
- Dataset: GSM8K (math)
- Reward functions: Format, Accuracy
- Objective: GRPO
- PEFT: LoRA
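
For readers new to GRPO, the "group-relative" part can be sketched in a few lines: each sampled completion's reward is normalized against the other completions for the same prompt. This is a minimal illustration, not the code from the linked repo. The function name, the `eps` value, and the use of the population standard deviation are assumptions (some implementations use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: `eps` and the population std are assumptions
    made for this sketch, not values taken from the post or the repo.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions better than the group average get a positive advantage,
    # worse ones a negative advantage; eps avoids division by zero when
    # all rewards in the group are identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, each scored as format + accuracy.
advantages = group_relative_advantages([2.0, 0.5, 0.0, 2.0])
```

The advantages are then used to weight the policy-gradient term for every token of the corresponding completion.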
February 10, 2025 at 5:36 PM
𝐂𝐨𝐬𝐢𝐧𝐞 𝐫𝐞𝐰𝐚𝐫𝐝 stabilizes training & incentivizes the model to generate responses that are both accurate and reasonably lengthy.

To prevent reward hacking, a repetition penalty is added to the cosine reward.

Paper: arxiv.org/pdf/2502.03373
Hackable Code: github.com/eddycmu/demy...
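
A rough sketch of the idea, assuming the reward is a cosine interpolation between a value at length 0 and a value at the maximum length, with separate endpoints for correct and wrong answers. The function names and the endpoint values below are illustrative placeholders, not the paper's exact formula or hyperparameters, and the repetition penalty is left out.

```python
import math


def cosine_interp(length, max_length, value_at_zero, value_at_max):
    """Cosine interpolation from value_at_zero (length 0) to value_at_max (max_length)."""
    progress = min(max(length, 0), max_length) / max_length
    weight = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 at length 0, 0 at max_length
    return value_at_max + (value_at_zero - value_at_max) * weight


def cosine_reward(is_correct, length, max_length,
                  correct_short=2.0, correct_long=1.0,
                  wrong_short=-10.0, wrong_long=0.0):
    """Length-aware reward; all four endpoint values are assumed for illustration."""
    if is_correct:
        # Correct answers always score above wrong ones; shorter is slightly better.
        return cosine_interp(length, max_length, correct_short, correct_long)
    # Wrong answers are penalized less when the model has used more of its budget.
    return cosine_interp(length, max_length, wrong_short, wrong_long)
```

With these placeholder endpoints, a wrong answer that stops early is punished hardest, which is one way to push the model toward longer attempts without rewarding length on its own.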
February 8, 2025 at 11:36 AM
The best way to understand GRPO is to code it from scratch without HF Trainer. 😂
February 5, 2025 at 11:37 AM
Profile Summary on Twitter is so interesting! Can't agree more on this.
January 14, 2025 at 2:47 PM
🎁 Jason Wei: "agents"

🎁 Sebastian Raschka: "reasoning models"
December 31, 2024 at 6:54 AM
The Microsoft Phi team avoids overfitting on benchmarks with a decontamination algorithm that removes overlapping data based on 7-gram and 13-gram counts. I want to implement this in future work, so I need to research an efficient way to do it. scikit-learn's CountVectorizer implementation may be a good place to start!
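
As a starting point, here is a minimal pure-Python sketch of n-gram overlap filtering. It is not the Phi team's algorithm: the function names, whitespace tokenization, lowercasing, and the "drop on any shared n-gram" rule are all assumptions made for illustration.

```python
def word_ngrams(text, n):
    """Return the set of word-level n-grams in `text` (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_samples, benchmark_samples, n=13):
    """Drop training samples that share at least one n-gram with any benchmark sample.

    Hypothetical helper: a real pipeline would likely use overlap-count
    thresholds and more careful tokenization.
    """
    benchmark_ngrams = set()
    for text in benchmark_samples:
        benchmark_ngrams |= word_ngrams(text, n)
    # Keep a sample only if none of its n-grams appear in the benchmark set.
    return [
        sample for sample in train_samples
        if word_ngrams(sample, n).isdisjoint(benchmark_ngrams)
    ]


# Example with a small n so the overlap is easy to see.
clean = decontaminate(
    ["what is two plus two equal to", "the sky is blue today"],
    ["Question: what is two plus two"],
    n=3,
)
```

For larger corpora, `CountVectorizer` with `ngram_range=(13, 13)` could build the same n-gram vocabulary as a sparse matrix, which may scale better than Python sets.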
December 25, 2024 at 1:53 PM
The Phi-4 technical report is a bag of techniques for synthetic data in pre- and post-training. The 𝗣𝗶𝘃𝗼𝘁𝗮𝗹 𝗧𝗼𝗸𝗲𝗻 𝗦𝗲𝗮𝗿𝗰𝗵 (PTS) method for generating DPO pairs is fascinating but computationally expensive, because it has to sample completions to calculate p(success).
December 24, 2024 at 7:17 PM
Interestingly, when a new model is released these days, checking whether it is compared to Qwen models can be a good indicator of its quality. 😂
November 26, 2024 at 7:32 PM