Kazuki Fujii
@kazukifujii.bsky.social
Tokyo Tech CS Master (Rio Yokota Lab → Jun Sakuma Lab). Distributed Training, Systems for Machine Learning
We've also observed that the impact of FP8 varies depending on model size and training scenarios (e.g., continual pre-training, from-scratch training, SFT). A comprehensive evaluation requires significant computational resources—this is not a trivial issue. (2/n)
November 25, 2024 at 12:44 AM