Due to the extensive length of the regular Ahead of AI #11: New Foundation Models article, I removed some interesting tidbits about the Llama 2 weights from the main newsletter. However, I thought it would be nice to include them as a small bonus for the supporters of Ahead of AI. Thanks again for the kind support!
In this short(er) article, we will briefly examine the Llama 2 weights and the implications of hosting them in different floating-point precisions.
Llama 2 Weights Have Changed?
The original Llama 2 weights are hosted by Meta, but the Hugging Face (HF) model hub also hosts the Llama 2 weights for added convenience. In principle, choosing one over the other should be purely a matter of convenience. However, there was something peculiar about the HF weights: they were stored in float16 precision, whereas the original weights were stored in bfloat16 precision. Both bfloat16 and float16 are low-precision formats that reduce the memory requirements by half compared to regular 32-bit precision models, …
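To make the difference between the two 16-bit formats concrete, here is a minimal sketch using PyTorch's torch.finfo. The example values are my own illustration and are not taken from the Llama 2 checkpoints: float16 spends its bits on mantissa precision but has a small dynamic range, whereas bfloat16 keeps a float32-like exponent range at the cost of precision.

```python
import torch

# Both formats use 16 bits, but split them differently:
# float16  = 1 sign bit + 5 exponent bits + 10 mantissa bits (more precision, small range)
# bfloat16 = 1 sign bit + 8 exponent bits +  7 mantissa bits (less precision, float32-like range)
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38, same order of magnitude as float32

# A value that bfloat16 handles easily overflows to infinity in float16:
x = torch.tensor(70000.0, dtype=torch.bfloat16)
print(x.to(torch.float16))              # tensor(inf, dtype=torch.float16)

# Conversely, float16 preserves small differences that bfloat16 rounds away:
y = torch.tensor(1.0 + 2**-10)          # 1.0009765625 in float32
print(y.to(torch.float16))              # tensor(1.0010, dtype=torch.float16)
print(y.to(torch.bfloat16))             # tensor(1., dtype=torch.bfloat16)
```

In other words, the two formats are not interchangeable: casting from one to the other can change the stored values, which is why the precision used to host the weights matters.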