4 Comments

However, this only considers the weights. What if the change from bfloat16 to float16 causes activation overflow or underflow? Is this possible?


Yes, that's totally related, as the weights' data type determines the activation data type.
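To make the overflow concern concrete, here's a small sketch (NumPy has no native bfloat16, so it is emulated here by truncating float32 bits, which works because bfloat16 keeps float32's 8 exponent bits; the activation value 70,000 is just an illustrative number): a value in that range is fine in bfloat16 but overflows float16, whose largest finite value is 65,504.

```python
import numpy as np

def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 (with round-toward-zero) by keeping only the
    top 16 bits of the float32 bit pattern. bfloat16 shares float32's
    8 exponent bits, so its dynamic range matches float32."""
    bits = np.array([x], dtype=np.float32).view(np.uint32)
    truncated = bits & np.uint32(0xFFFF0000)
    return float(truncated.view(np.float32)[0])

# A hypothetical large activation value:
act = 70_000.0

# bfloat16 can represent it (with reduced precision) ...
print(to_bfloat16(act))          # finite, close to 70,000

# ... but float16 overflows, since its max finite value is 65,504:
print(np.float16(act))           # inf
print(np.finfo(np.float16).max)  # 65504.0
```

This is why simply casting a bfloat16-trained model's weights to float16 can silently turn large activations into `inf` during the forward pass.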


So... much ado about nothing -- yet! Another sneaky complexity that could do real damage to operational AI systems of the future. AI developer beware!

Plus... I see yet another textbook emerging from Sebastian. You can take the kid out of the university, but you can't keep the man from teaching... even if it's just a bit... TY ;)


Haha, thanks Richard! So far, it seems like there's maybe no problem yet, but yeah, the differences between the floating-point formats are subtle and can make a huge difference, so I thought it was a good idea to write about them in general :)
