30 Comments
Xavier

Great stuff, as always. The inline cards linking to previous articles are very handy. 🙏 👍

Sebastian Raschka, PhD

Thanks!

Zia Khan

Wow. Great article! You saved me hours of reading. Thank you!

mikolysz

How do you deal with finding and sorting through all the papers that keep coming out in this field? I think it's the aspect I struggle with most, and you clearly are doing a very good job here, so hopefully you have some suggestions.

What tools do you use to do this, especially for new papers? How do you easily distinguish papers worth reading from garbage when there are no citations to speak of? How many abstracts do you read compared to full papers? Do you usually skim-read / LLM summarize to get the gist, or do you actually spend the time to go through everything and attempt to thoroughly understand every single equation presented?

Great post btw, as always.

Sebastian Raschka, PhD

I must say, it’s a big challenge to find those diamonds in the rough. I used to be one of the machine learning moderators at arXiv a few years ago; there were three of us taking turns skimming the titles of all submissions in the ML category (cs.LG), and sometimes the abstracts, too. My job was mainly to check that the articles were categorized correctly (submitters choose the categories, and we additionally had classifiers for flagging miscategorized ones, but those weren't perfect). Long story short, I built a habit of skimming arXiv titles (not always, but pretty often). I also bookmark things I stumble upon on social media. I go purely by title first to keep a short list of interesting papers. Then I select a sub-list and read the abstracts. From that, I skim (quickly read through) the ones that seem interesting for a given project. And finally, I read 1-3 papers a week more carefully. But yeah, ultimately it's a lot of work.
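(In case anyone wants to automate the title-skimming step, here is a rough sketch using the third-party arxiv Python package; the keyword filter is hypothetical and purely illustrative, not my actual tooling.)

```python
# Rough sketch: pull recent cs.LG submissions and flag titles by keyword.
# Requires the third-party `arxiv` package (pip install arxiv).
import arxiv

# Hypothetical interests -- adjust to your own.
KEYWORDS = ["reinforcement", "reasoning", "instruction"]

search = arxiv.Search(
    query="cat:cs.LG",
    max_results=50,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

client = arxiv.Client()
for result in client.results(search):
    # Keep only titles that match at least one keyword.
    if any(kw.lower() in result.title.lower() for kw in KEYWORDS):
        print(f"{result.published:%Y-%m-%d}  {result.title}\n  {result.entry_id}")
```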

Ilona Brinkmeier

As always, a perfect article of yours!

Moein Salimi

Great review!

I’m a bit confused—has reinforcement learning actually caused any emergent abilities yet? It may require deeper investigation.

Sebastian Raschka, PhD

According to the DeepSeek-R1 paper, it has; i.e., the "aha moment" that models exhibited after multiple rounds of RL. There are also papers saying that base models already have these emergent abilities. In my opinion, it's not conclusive, though, because nowadays chain-of-thought data is increasingly part of the pre-training data mix.

kevin

Great!

Nimish Sanghi

This is super cool. The explanation of RLHF with PPO takes the cake. Also a super curated list of papers with short explanations.

Szymon Palucha

Great article! Very clear explanations; it's great to have these summaries and not have to spend hours going through the papers.

Sebastian Raschka, PhD

Thanks!

Jassim Moideen

This is a brilliant read. You never fail to impress!

Mario

What a great post!!! Thank you so much. I loved it and it helped me understand a lot of things.

Just a small detail: I think there’s a typo in "the just-realized o3 model". I assume you meant "just-released".

Thanks again for your work!

Sebastian Raschka, PhD

Thanks! And you are totally right, I meant to write "just-released" not "just-realized". This was "just-fixed" :D

nicola leonardi

Regarding this sentence: "And since distillation, in this paper, meant instruction fine-tuning on chain-of-thought data, it's likely that pre-training on data that includes chain-of-thought data induces these abilities as well". I think this is completely analogous to the concept of continued pre-training, where I try to adapt the model to my context with the same next-token prediction approach. I consider it "obvious", in the sense that it does not surprise me at all.

Sebastian Raschka, PhD

Yes, the training task in pre-training and SFT is exactly the next-token prediction task with the same cross-entropy loss.
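As a small illustration, here is what that shared objective looks like in PyTorch (a minimal sketch with dummy tensors standing in for model outputs; real pipelines differ mainly in the data, and SFT implementations often mask the prompt tokens out of the loss):

```python
# Minimal sketch: the next-token prediction objective shared by
# pre-training and SFT. Random logits stand in for a model's output.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8
logits = torch.randn(1, seq_len, vocab_size)         # model output: (batch, seq, vocab)
tokens = torch.randint(0, vocab_size, (1, seq_len))  # input token ids

# Shift by one position: predict token t+1 from tokens up to t.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = tokens[:, 1:].reshape(-1)

# The same cross-entropy loss is used in pre-training and SFT.
loss = F.cross_entropy(pred, target)
print(loss.item())
```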

Btw, another example is that many base models can now follow instructions quite well to some extent (whereas, back in the day, base models were terrible at that). This is mostly because pre-training data now also contains Q&A data. Some of it is coincidental, but it's also often done deliberately now, as I described a while back in

"New LLM Pre-training and Post-training Paradigms" (https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training)

Miguel Conner

I loved the combo of the RL summary and the selected papers. I definitely want to check out the one about the logic puzzles and the one extending RL to other domains; it seems like there’s still a lot to be discovered in these spaces.

Devansh

Sir, you're one of the greatest minds in AI. Another excellent compilation. Your work is genuinely inspiring.

Sebastian Raschka, PhD

Thanks, Devansh!

Dr. Ashish Bamania

Very detailed and interesting. Thanks for it!

Alessandro Pessoa

Hello Raschka, thank you for the articles, always excellent content.

Sanket Gupta

Quite a detailed overview, great work!
