Discussion about this post

Joe F

Hey, I was wondering if you would consider writing a blog post sometime on how you find your papers? Right now, my strategy is to look at the trending papers on the Papers with Code site, the Hugging Face blog, and your blog.

Samuel Flender

Could GeGLU vs. SwiGLU just be a “decoy”, a random change added to make it harder to understand where the gains came from? Not sure, just an idea. Unfortunately, there’s not a lot of science behind these hyperparameter choices.

Great read and keep up the amazing work!
