Kazuki Fujii
@kazukifujii.bsky.social
Tokyo Tech CS Master's student (Rio Yokota Lab → Jun Sakuma Lab). Distributed Training, Systems for Machine Learning
📢 New findings on FP8 training for Continual Pre-Training! 🚀
Our experiments on Llama-3-70B show that FP8 significantly boosts training throughput (415 → 570 TFLOP/s) but induces loss spikes, leading to downstream performance drops. FP8 isn't always the best choice—it depends! (1/n)
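The thread itself includes no code, so here is a minimal sketch of how FP8 training is commonly enabled, assuming NVIDIA Transformer Engine with a delayed-scaling recipe; the library choice, recipe settings, and layer sizes are illustrative assumptions, not details from the post:

```python
# Minimal FP8 training sketch with NVIDIA Transformer Engine (assumed stack;
# the post does not say which FP8 implementation was used).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID = E4M3 for the forward pass, E5M2 for backward gradients.
# amax_history_len / amax_compute_algo are illustrative, not the authors' settings.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

# A single TE layer stands in for a full Llama-3-style transformer block.
layer = te.Linear(4096, 4096, bias=False).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Matmuls inside this context run in FP8; master weights and optimizer
# state stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
loss = y.float().pow(2).mean()
loss.backward()
optimizer.step()
```

The throughput gain the post reports comes from FP8 matmuls like the one above; the loss spikes are the flip side, since the narrower dynamic range of FP8 makes the scaling recipe a stability-critical knob.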
November 25, 2024 at 12:43 AM