5 Comments

Wow, Lightning does make it super easy to employ mixed precision and distributed training! When I wrote about mixed precision in 2020, right after PyTorch released its AMP (automatic mixed precision) module, it was a mess trying to autocast the layers and keep track of their precisions.

Enjoyed the read! Thanks!

author

Glad to hear this is useful! And yeah, internally it's using PyTorch's AMP, but it's making the API more user-friendly :)
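
For anyone curious what that saves you from writing by hand, here is a minimal sketch contrasting raw PyTorch AMP with Lightning's precision flag. The toy model, random data, and module names are placeholders, and it assumes a CUDA GPU and the `lightning` package imported as `L`:

```
import torch
import torch.nn as nn
import lightning as L

# --- Plain PyTorch AMP: autocast + GradScaler managed by hand (requires a CUDA GPU) ---
model = nn.Linear(32, 2).cuda()                      # toy model, for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 32).cuda()
y = torch.randint(0, 2, (64,)).cuda()

scaler = torch.cuda.amp.GradScaler()
optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = nn.functional.cross_entropy(model(x), y)  # forward pass runs in float16 where safe
scaler.scale(loss).backward()                        # scale the loss so float16 gradients don't underflow
scaler.step(optimizer)
scaler.update()

# --- Lightning: the same mixed-precision behavior via a single Trainer argument ---
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        return nn.functional.cross_entropy(self.model(inputs), targets)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=16)
trainer = L.Trainer(precision="16-mixed", max_epochs=1)  # or "bf16-mixed" for bfloat16
trainer.fit(LitModel(), loader)
```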

Aug 24, 2023 · Liked by Sebastian Raschka, PhD

Clear and straight to the point, thanks a lot!


Any specific reasons to prefer bfloat16 to float16?

author
Jun 28, 2023 · edited Jun 28, 2023

Good question. In many situations, I find that both of them work well. However, when I finetuned LLaMA models, for example, float16 gave really poor performance. I think that might have something to do with activations or gradients that aren't well normalized: bfloat16 can represent a much larger range of values (at the cost of lower precision). In other words, there may have been values exceeding float16's maximum representable value of about ±65,504, and that overflow would then cause problems in float16 but not in bfloat16.
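
To make the range difference concrete, here is a quick check with torch.finfo (a small illustrative snippet, not from the article):

```
import torch

# Compare the representable ranges of the two 16-bit dtypes
print(torch.finfo(torch.float16).max)    # 65504.0  -> values beyond this overflow to inf
print(torch.finfo(torch.bfloat16).max)   # ~3.4e38  -> roughly the same range as float32

# A value like 70000 overflows in float16 but stays finite in bfloat16
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # tensor(inf, dtype=torch.float16)
print(x.to(torch.bfloat16))  # tensor(70144., dtype=torch.bfloat16) -- imprecise, but no overflow
```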
