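bfloat16 tends to work better during training because its 8 exponent bits give it the same dynamic range as float32, which makes overflow and underflow less likely. float16 keeps more mantissa bits (10 versus 7), so it represents values more precisely at inference, which can make MMLU scores more reliable.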
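To make the precision versus range trade-off concrete, here is a minimal sketch that rounds a Python float to each format using only the standard library. The helper names `to_float16` and `to_bfloat16` are hypothetical, and the bfloat16 conversion assumes round-to-nearest-even on the upper 16 bits of the float32 encoding. It illustrates the number formats only, not any effect on benchmark scores.

```python
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the float16 range; treat as infinity.
        return float("inf") if x > 0 else float("-inf")


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 (8 exponent bits, 7 mantissa bits) via its float32 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits, then drop them.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Precision: float16 resolves a small offset from 1.0 that bfloat16 loses.
print(to_float16(1.001), to_bfloat16(1.001))

# Range: float16 overflows just above 65504, while bfloat16 still holds the value.
print(to_float16(70000.0), to_bfloat16(70000.0))
```

Near 1.0 the spacing between representable values is 2^-10 for float16 and 2^-7 for bfloat16, which is why the first comparison differs. The second comparison shows the opposite side of the trade-off.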