Lightnews — Scholar-powered news

Duc Nguyen Huu

@ducnh279.bsky.social

41 followers, 18 following, 54 posts

Data Science in ♥️ Home in 🇻🇳


I find that making your agents safe is just as important as making them smart. 🔒

A good read for building secure AI!

arxiv.org/pdf/2503.18813

March 31, 2025 at 12:47 PM

There will be one day ... in 🇺🇸 or 🇻🇳

March 30, 2025 at 7:37 PM

Have you ever wondered what the differences are between the two types of knowledge distillation for LLMs used these days?

Link: developer.nvidia.com/blog/how-to-...

February 11, 2025 at 8:17 PM
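The post does not name the two types, and the link is truncated. A common way to split LLM distillation, assumed here, is logit-based distillation (the student matches the teacher's softened next-token distribution) versus data-based distillation (the student is fine-tuned on text the teacher generated). The plain-Python sketch below contrasts the two losses for a single token position; the function names and the temperature default are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Logit-based KD for one position: KL(teacher || student) between
    temperature-softened distributions, scaled by T^2 (Hinton et al., 2015)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


def data_distillation_loss(student_logits, teacher_token_id):
    """Data-based KD for one position: plain cross-entropy on the token the
    teacher actually generated (i.e. SFT on teacher-written text)."""
    p_student = softmax(student_logits)
    return -math.log(p_student[teacher_token_id])


# Toy 3-token vocabulary for a single position.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(logit_distillation_loss(student, teacher))        # uses the full teacher distribution
print(data_distillation_loss(student, teacher_token_id=0))  # uses only the teacher's chosen token
```

The practical difference: logit-based distillation needs the teacher's logits (and therefore a compatible vocabulary) at training time, while data-based distillation only needs the teacher's generated text.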

Thanks for sharing with us, Seb!

Surprisingly, when I used 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', the output contained tokens/words introduced in the S1 paper. LOL! 🤣

February 11, 2025 at 2:03 PM

My learning project: Implement a Reasoning Model with GRPO from scratch

Code: github.com/ducnh279/grp...

- Base model: Qwen2.5-1.5B
- Dataset: GSM8K (math)
- Reward functions: Format, Accuracy
- Objective: GRPO
- PEFT: LoRA

February 10, 2025 at 5:36 PM
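The two reward functions in the project list (Format, Accuracy) can be sketched as below. This is a hypothetical example, not the repository's code: it assumes the model is prompted to reply in a DeepSeek-R1-style `<think>...</think><answer>...</answer>` template and that both rewards are binary. The only dataset-specific detail, that GSM8K reference solutions end with `#### <answer>`, comes from the GSM8K format itself.

```python
import re

# Hypothetical response template, assumed for illustration:
#   <think> reasoning </think> <answer> final number </answer>
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the whole completion matches the assumed template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def extract_gsm8k_answer(solution: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the number inside <answer> equals the gold answer, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().replace(",", "")
    try:
        return 1.0 if float(predicted) == float(gold) else 0.0
    except ValueError:  # non-numeric answer
        return 0.0


completion = "<think>12 * 6 = 72</think> <answer>72</answer>"
gold = extract_gsm8k_answer("She sold 12 * 6 = 72 clips.\n#### 72")
print(format_reward(completion), accuracy_reward(completion, gold))  # 1.0 1.0
```

Keeping the two rewards separate lets the format signal arrive early in training, before the model gets many answers right.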

𝐂𝐨𝐬𝐢𝐧𝐞 𝐫𝐞𝐰𝐚𝐫𝐝 stabilizes training & incentivizes the model to generate responses that are both accurate and reasonably lengthy.

To prevent reward hacking, a repetition penalty is added to the cosine reward.

Paper: arxiv.org/pdf/2502.03373
Hackable Code: github.com/eddycmu/demy...

February 8, 2025 at 11:36 AM
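A plain-Python sketch of a length-aware cosine reward plus an n-gram repetition penalty. As we read the idea, correct answers earn more when they are short, wrong answers are penalized less when they are long (leaving room to keep thinking), and hitting the length limit gets a fixed penalty. The parameter names, the default values, and the simple "fraction of repeated n-grams" penalty are illustrative assumptions; the paper and the linked code define the actual formulation.

```python
import math


def cosine_interp(t, t_max, value_at_0, value_at_max):
    """Cosine curve from value_at_0 (t = 0) to value_at_max (t = t_max)."""
    return value_at_max + 0.5 * (value_at_0 - value_at_max) * (1 + math.cos(math.pi * t / t_max))


def cosine_reward(is_correct, gen_len, max_len,
                  r_correct_0=2.0, r_correct_max=1.0,  # correct: shorter is better
                  r_wrong_0=-10.0, r_wrong_max=0.0,    # wrong: longer is penalized less
                  r_exceed=-10.0):                     # hit the length limit
    """Length-aware outcome reward; all default values are illustrative."""
    if gen_len >= max_len:
        return r_exceed
    if is_correct:
        return cosine_interp(gen_len, max_len, r_correct_0, r_correct_max)
    return cosine_interp(gen_len, max_len, r_wrong_0, r_wrong_max)


def repetition_penalty(tokens, n=3, weight=-1.0):
    """Simplified penalty: weight * (fraction of n-grams that repeat an earlier one)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    repeated = len(ngrams) - len(set(ngrams))
    return weight * repeated / len(ngrams)


def total_reward(is_correct, tokens, max_len):
    """Cosine reward with the repetition penalty added on top."""
    return cosine_reward(is_correct, len(tokens), max_len) + repetition_penalty(tokens)
```

Because the repetition term is added on top, a model cannot cheaply climb the wrong-answer curve by padding its output with repeated text, which is the reward-hacking route the penalty is meant to close.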

The best way to understand GRPO is to code it from scratch without HF Trainer. 😂

February 5, 2025 at 11:37 AM
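In that spirit, here is a minimal sketch of the GRPO objective in plain Python, written for reading rather than training: it computes the loss value only (no gradients), so a real implementation would express the same arithmetic in an autograd framework such as PyTorch. It follows the GRPO formulation from the DeepSeekMath paper as we understand it: rewards are normalized within each group of completions for the same prompt, every token of a completion shares that completion's advantage, each token gets a PPO-style clipped surrogate, and a KL penalty against a frozen reference model is estimated with pi_ref/pi - log(pi_ref/pi) - 1. The function names, the use of the population standard deviation, and the defaults `clip_eps=0.2` and `beta=0.04` are illustrative assumptions, not the code in the linked repository.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(new_logps, old_logps, ref_logps, rewards, clip_eps=0.2, beta=0.04):
    """GRPO loss value for one group.

    new_logps / old_logps / ref_logps: one list of per-token log-probs per
    completion, under the current policy, the sampling policy, and the frozen
    reference model respectively.
    """
    advantages = group_advantages(rewards)
    per_completion = []
    for new, old, ref, adv in zip(new_logps, old_logps, ref_logps, advantages):
        terms = []
        for lp_new, lp_old, lp_ref in zip(new, old, ref):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
            surrogate = min(ratio * adv, clipped * adv)  # PPO-style clipping
            log_r = lp_ref - lp_new
            kl = math.exp(log_r) - log_r - 1  # pi_ref/pi - log(pi_ref/pi) - 1
            terms.append(surrogate - beta * kl)
        per_completion.append(sum(terms) / len(terms))  # average over tokens
    # The objective is maximized, so the loss is its negative mean over the group.
    return -sum(per_completion) / len(per_completion)
```

When only one gradient step is taken per sampled batch, the old and current log-probs are numerically equal, so the ratio is 1 and clipping never activates; it matters once a batch is reused for several updates.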

The Profile Summary on Twitter is so interesting! Couldn't agree more with it.

January 14, 2025 at 2:47 PM

🎁 Jason Wei: "agents"

🎁 Sebastian Raschka: "reasoning models"

December 31, 2024 at 6:54 AM

The Microsoft Phi team avoids overfitting to benchmarks with a decontamination algorithm that removes overlapping training data based on 7- and 13-gram counts. I want to implement this in my future work, so I need to research an efficient approach. scikit-learn's CountVectorizer implementation may be a good place to start!

December 25, 2024 at 1:53 PM
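A minimal sketch of n-gram decontamination in plain Python: index every word n-gram in the benchmark items, then drop any training document that shares at least one n-gram with that index. The single-threshold rule and the regex word tokenizer are simplifying assumptions; the Phi-4 report's exact combination of 7- and 13-gram counts is not reproduced here, and all function names are hypothetical.

```python
import re


def word_ngrams(text, n):
    """Set of lowercase word n-grams, using a simple regex word tokenizer."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_benchmark_index(benchmark_texts, n):
    """Union of all word n-grams that appear in any benchmark item."""
    index = set()
    for text in benchmark_texts:
        index |= word_ngrams(text, n)
    return index


def decontaminate(train_texts, benchmark_texts, n=13):
    """Keep only training texts that share no n-gram with the benchmark."""
    index = build_benchmark_index(benchmark_texts, n)
    return [t for t in train_texts if word_ngrams(t, n).isdisjoint(index)]


benchmark = ["The quick brown fox jumps over the lazy dog"]
train = ["A quick brown fox appeared at dawn", "An unrelated training sentence"]
print(decontaminate(train, benchmark, n=3))  # ['An unrelated training sentence']
```

For large corpora, scikit-learn's `CountVectorizer(ngram_range=(13, 13))` can build the same kind of word n-gram vocabulary, though its default `token_pattern` ignores single-character tokens, so its n-grams can differ slightly from the regex above.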

The Phi-4 technical report is a bag of techniques for synthetic data in pre- and post-training. The 𝗣𝗶𝘃𝗼𝘁𝗮𝗹 𝗧𝗼𝗸𝗲𝗻 𝗦𝗲𝗮𝗿𝗰𝗵 (PTS) method for generating DPO pairs is fascinating, but it is computationally expensive because many completions must be sampled to estimate p(success).

December 24, 2024 at 7:17 PM
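A simplified sketch of the search idea, as we understand it from the report: split a completion into segments, compare p(success) before and after each segment, and keep bisecting only the segments whose change exceeds a threshold, until single pivotal tokens are isolated. `estimate_p_success` shows where the sampling cost comes from; `sample_completion` and `is_correct` are hypothetical callables standing in for an LLM sampler and an answer checker. Building the actual DPO pairs from the pivotal tokens is not shown, and the pruning means a segment whose gains and losses cancel out is skipped, so some pivotal tokens can be missed.

```python
def estimate_p_success(prefix, sample_completion, is_correct, num_samples=32):
    """Monte Carlo estimate of p(success | prefix). This sampling is the costly
    part: every call runs num_samples full generations."""
    hits = sum(bool(is_correct(sample_completion(prefix))) for _ in range(num_samples))
    return hits / num_samples


def pivotal_token_search(tokens, p_success, threshold=0.2):
    """Return (index, p_before, p_after) for tokens whose addition shifts
    p(success) by at least `threshold`, found by recursive bisection."""
    cache = {}

    def p(k):  # p(success) given the first k tokens, computed at most once per k
        if k not in cache:
            cache[k] = p_success(tokens[:k])
        return cache[k]

    pivotal = []

    def search(lo, hi):
        p_lo, p_hi = p(lo), p(hi)
        if abs(p_hi - p_lo) < threshold:
            return  # segment does not move p(success) enough: prune it
        if hi - lo == 1:
            pivotal.append((lo, p_lo, p_hi))
            return
        mid = (lo + hi) // 2
        search(lo, mid)
        search(mid, hi)

    search(0, len(tokens))
    return pivotal


def toy_p(prefix):
    """Toy oracle: success becomes likely only once the third token is emitted."""
    return 0.9 if len(prefix) >= 3 else 0.2


print(pivotal_token_search(list("abcde"), toy_p))  # [(2, 0.2, 0.9)]
```

Caching by prefix length matters here because each p(success) call would cost `num_samples` generations in practice.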

Interestingly, when a new model is released these days, checking whether it is compared to Qwen models can be a good indicator of its quality. 😂

November 26, 2024 at 7:32 PM
