This is fantastic. Really looking forward to going through each of these papers. The rate of progress is so fast that collections like these are essential so that people who are not at the core of the field can keep up with the key insights.
I'd love to read additional ML & AI articles from you, outside of your existing newsletter format! So you've got my vote ✅
Awesome, I am glad to hear. And thanks for the feedback!!
Mine too!!
Awesome compilation! Really helpful for folks who are starting out. It would be amazing if you could write a similar blog for computer vision to catch up on SOTA like diffusion models, vision transformers, etc.
Glad you liked it! And yes, I would love to do that one day (haha, I have quite a long list of things I'd love to write about :)). In the meantime, I highlighted a few interesting papers from CVPR here: https://magazine.sebastianraschka.com/p/ahead-of-ai-10-state-of-computer
Great work! Congratulations!!
One minor typo: in the BERT section you write "..The BERT paper above introduces the original concept of masked-language modeling, and next-sentence prediction remains an influential decoder-style architecture.." which I think should be "..encoder-style architecture..."
Awesome, thanks for the note. Fixed it!
I vote YES as well. Please keep going on ML & AI topics. Thanks for sharing.
Thanks for the feedback, glad to hear!
Stunning explanations! Thanks
Thanks for the kind words. It's very nice to hear this!!
Thanks for the article!
A minor note: I think there is a typo in "BlenderBot 3: A Deployed Conversational Agent that Continually Learns to Responsibly Rngage".
Thanks! That should have been "Engage" (not "Rngage") of course. Fixed it!
I like your idea of posting some additional articles related to machine learning and AI.
Thanks for the feedback!
Hi Sebastian,
I love your blog. Your blogs are very helpful to keep me updated on emerging technology.
I'd like to request blog posts on:
1) Why/how LLMs learn in-context without training on data (zero-shot learning)
2) Prompt chaining
Thanks for the feedback. Nice, I was indeed planning on writing something on in-context learning vs finetuning this weekend.
This is really good. I also enjoyed reading your book on ML with PyTorch and scikit-learn. I recommend it to everyone.
Really liked the article. Shall be following the links for further reading. Much thanks! One thing: in "Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning" you misspelt Deshpande, one of the authors; the p and h are interchanged. Great read tho! I would love more ML articles. Even historical ones.
Thanks for the feedback! Glad to hear you liked the article! (Also thanks for mentioning the misspelling, just fixed it right away!)
This is a really great and detailed article. It seems like everything is converging towards transformers.
Transformers are literally taking over AI.
Do you think that transformers will be the monad for AI for the coming decade?
Yes, looking at their high performance, they will surely be the monad for the coming century and not just the decade 👍
Glad to hear it was useful. I would say that pretty much everything is converging towards transformers. Even a big field such as computer vision is now heavily driven by attention layers / transformer-based architectures. Whether that's going to last for a whole decade remains to be seen. New methods can always emerge unexpectedly (e.g., the recent trend from GANs to diffusion models).
Love your content and eagerly look forward to it. Keep it going!
For sure, your content is always a great read!
Thanks, glad to hear!
Yes I really enjoyed it
I learned a lot from Dr Sebastian Ras, thx dude
Hi, I just loved reading your article. Due to the nature of my work, I have been reading about and testing small LLMs like TinyLlama or Phi-3 from Microsoft. The latter in particular emphasizes that the success of small models is related to the high-quality data used during training. Do you have any experience with these models? Will you post anything about this? In any case, many thanks for sharing your knowledge in such a clear and concise way.
Glad you liked it! Yes, these are nice, small models. Fun fact: The TinyLlama model and paper were actually based on the LitLlama/LitGPT framework I help develop. I wrote about Phi-3 a bit here (https://magazine.sebastianraschka.com/p/how-good-are-the-latest-open-llms) in Section 1.3 if useful. I should probably also add some of these small-but-good models to this article some time.