13 Comments

Hey, I was wondering if you would consider writing a blog post sometime about how you find your papers? Right now my strategy is to look at the trending papers on the Papers with Code site, the Hugging Face blog, and your blog.

Thanks for the suggestion. I can put it on my list of interesting topics to write about one day. I used to be a moderator for the machine learning category (cs.LG) on arXiv, so it's an old habit of mine to scan the submissions, which is what I often (but not daily) do to find interesting papers.

Thanks so much for the reply! I have read every single one of your blog posts and read your book, Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2. I really appreciate all the work you put into this field, and I want to thank you for it!

That's a lot of articles! Thanks so much for the kind words. Knowing that these materials are so well received keeps me motivated to write more :)

GeGLU vs SwiGLU could just be a "decoy": a random change added just to make it harder to understand where the gains came from? Not sure, just an idea. There's not a lot of science behind these hyperparameter choices, unfortunately.

Great read and keep up the amazing work!

Your roundup is always incredible; thanks for sharing!

Fantastic write-up, as usual, thank you!!

Maybe just one super-minor typo about OLMo:

"decay up to the peak learning rate" --> "decay up to a tenth of the peak learning rate"

Thanks on both counts. And yes, this was a typo! Just fixed it!

Very impressive how many papers you manage to get through. Just ordered the book. Thanks!

Ah, yes, it's a lot of work, but when spread out over the month, the amount is actually not that scary. There are also many papers that I only briefly skim, because reading each paper in detail would indeed be a full-time job. Thanks for getting a copy of my book!

Thanks a lot for a great write-up (as always). I just wanted to ask whether you would be covering the training of big models using different sharding methodologies (distributed data parallelism, multi-GPU training, etc.) in your book?

As always, very well written. 😄

Is the move from ReLU to SwiGLU and GeGLU meant to make the activation function smoother, instead of piecewise linear as with ReLU?
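
For reference, here is a minimal PyTorch sketch of what I mean (my own toy illustration, not code from the article; the dimensions and module names are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    """Gated feed-forward block in the SwiGLU/GeGLU style (illustrative only)."""
    def __init__(self, d_model, d_hidden, gate_fn=F.silu):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)
        self.gate_fn = gate_fn  # F.silu -> SwiGLU, F.gelu -> GeGLU

    def forward(self, x):
        # The gate nonlinearity (SiLU or GELU) is smooth everywhere,
        # unlike ReLU, which is piecewise linear with a kink at 0.
        return self.w_down(self.gate_fn(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 8)
swiglu_ffn = GLUFeedForward(8, 32, gate_fn=F.silu)
geglu_ffn = GLUFeedForward(8, 32, gate_fn=F.gelu)
print(swiglu_ffn(x).shape, geglu_ffn(x).shape)
```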

Congratulations on the book launch! I just put in a pre-order for a copy on Amazon.
