A few months ago, I shared the article Understanding Large Language Models: A Cross-Section of the Most Relevant Literature To Get Up to Speed, and the positive feedback was very motivating! Since then, I have added a few papers here and there to keep the list fresh and relevant.
Very astute. I’m a septuagenarian whose interests evolved from high-school mechanical drafting into CAD and CNC years ago, progressing from logic and truth-tree training to projects on canning retort operations using VAT. Eventually I picked up the R language, and I have been aggressively reading all I can about ML and AI. A fair hobby to stimulate my mind as I mature. Anyhow, your research is inspiring. Keep up the good work. Thank you.
Hi, thanks for the helpful references!
Regarding the [official implementation](https://github.com/tensorflow/tensor2tensor/commit/f5c9b17e617ea9179b7d84d36b1e8162cb369f25): I can see that they set the default to `layer_postprocess_sequence="dan"`, which according to [this comment](https://github.com/tensorflow/tensor2tensor/blob/bafdc1b67730430d38d6ab802cbd51f9d053ba2e/tensor2tensor/layers/common_layers.py#L881) should be interpreted as dropout -> add -> normalize, matching the description in the paper.
Am I missing something?
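For reference, a minimal NumPy sketch of what the `"dan"` (dropout -> add -> normalize) post-processing order corresponds to; the function names here are illustrative, not the tensor2tensor API, and the layer norm omits the learned scale/bias for brevity:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature (last) dimension; no learned gain/bias here.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def postprocess_dan(x, sublayer_out, dropout_rate=0.1, rng=None):
    # "d" dropout on the sublayer output (inverted dropout scaling),
    # "a" add the residual input x,
    # "n" layer-normalize the sum.
    rng = rng or np.random.default_rng(0)
    keep = 1.0 - dropout_rate
    mask = rng.random(sublayer_out.shape) < keep
    dropped = sublayer_out * mask / keep
    return layer_norm(x + dropped)
```

With `dropout_rate=0.0` this reduces to the familiar post-layer-norm residual connection, `LayerNorm(x + Sublayer(x))`, from the original Transformer paper.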