Due to the extensive length of the regular Ahead of AI #11: New Foundation Models article, I removed some interesting tidbits around the Llama 2 weights from the main newsletter.
So... much ado about nothing -- yet! Another sneaky complexity that could do real damage to operational AI systems of the future. AI developer beware!
Plus... I see yet another textbook emerging from Sebastian. You can take the kid out of the university, but you can't keep the man from teaching ...even if it's just a bit... TY ;)
Haha, thanks Richard! So far it seems like there may be no problem yet, but yes, the differences between the floating-point formats are subtle and can make a huge difference, so I thought it was a good idea to write about it in general :)
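To make that difference concrete, here is a minimal PyTorch sketch of my own (an illustration, not code from the newsletter or from Llama 2): bfloat16 keeps float32's 8 exponent bits, trading precision for range, while float16 has more mantissa bits but only 5 exponent bits, so it tops out at 65504.

```python
import torch

# bfloat16 reuses float32's 8 exponent bits (huge range, coarse precision);
# float16 has 5 exponent bits and 10 mantissa bits (small range, finer precision).
big = torch.tensor(3.0e38)
print(big.to(torch.bfloat16))   # stays finite: ~3e38 is within bfloat16's range
print(big.to(torch.float16))    # inf: float16's largest finite value is 65504

fine = torch.tensor(1.001)
print(fine.to(torch.bfloat16))  # rounds to 1.0: bfloat16 spacing near 1.0 is ~0.0078
print(fine.to(torch.float16))   # ~1.0010: float16 still resolves the small increment
```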
However, this only considers the weights. What if the change from bfloat16 to float16 causes activation overflow or underflow? Is this possible?
Yes, that's totally related, as the weights' data type determines the activation data type.
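As a hypothetical illustration of that point (again a sketch I put together, not anything from the Llama 2 code): an intermediate value that is harmless in bfloat16 can overflow to inf once the same computation runs in float16.

```python
import torch

# Two hypothetical activation values; both fit comfortably in either format.
act = torch.tensor([120.0, 300.0])

# Squaring them stands in for some intermediate step of the forward pass:
print(act.to(torch.bfloat16) * act.to(torch.bfloat16))  # both finite (bfloat16 reaches ~3.4e38)
print(act.to(torch.float16) * act.to(torch.float16))    # 300 * 300 = 90000 > 65504 -> inf
```

This is also why float16 setups typically lean on loss/activation scaling, whereas bfloat16 usually gets by without it.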