Discussion about this post

Kai Liu

Great article, Sebastian! Thank you for your work!

siyu

It looks like you have the same (or very similar) code for both "Gated Attention" and "Gated delta net attention". At least the subroutines have the same names, and I'm having trouble seeing the difference between the two implementations.
