5 Comments
NirajPandkar:

Wow, Lightning does make it super easy to employ mixed precision and distributed training! When I wrote about mixed precision in 2020, right after PyTorch released its AMP (automatic mixed precision) module, it was a mess trying to autocast the layers and keep track of their precisions.

Enjoyed the read! Thanks!

Sebastian Raschka, PhD:

Glad to hear this is useful! And yeah, internally it's using PyTorch's AMP, but it makes the API more user-friendly :)
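
For reference, a minimal sketch of that difference, assuming a CUDA-capable GPU and a recent Lightning release (the Fabric import path and the "16-mixed" precision string can vary between versions):

```python
import torch

# Plain PyTorch AMP: you manage the autocast context and gradient scaler yourself.
scaler = torch.cuda.amp.GradScaler()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    ...  # forward pass and loss computation go here

# Lightning Fabric: a single precision argument, and the autocasting
# and gradient scaling are handled internally.
from lightning.fabric import Fabric
fabric = Fabric(accelerator="cuda", precision="16-mixed")
```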

Ahmed Besbes:

Clear and straight to the point, thanks a lot!

Rabin Adhikari:

Any specific reasons to prefer bfloat16 to float16?

Sebastian Raschka, PhD:

Good question. In many situations I find that both of them work well. However, when I finetuned LLaMA models, for example, float16 gave really poor performance. I think that has to do with poorly normalized activations or gradients, because bfloat16 can represent a much larger range of values (at lower precision). In other words, there may have been values that exceeded float16's representable range of roughly -65,504 to 65,504, and that would then cause overflow problems in float16 but not in bfloat16.
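
For reference, that range/precision trade-off is easy to inspect directly in PyTorch via torch.finfo — a minimal sketch:

```python
import torch

# float16 overflows just past ~65k, while bfloat16 covers roughly the
# same range as float32, trading away mantissa bits (precision) instead.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38
print(torch.finfo(torch.float32).max)   # ~3.40e+38

print(torch.finfo(torch.float16).eps)   # ~9.8e-04  (finer precision)
print(torch.finfo(torch.bfloat16).eps)  # ~7.8e-03  (coarser precision)
```

Any activation or gradient larger than 65,504 becomes inf in float16, which matches the overflow issue described above.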
