Great stuff, as always. The inline cards linking to previous articles are very handy. 🙏 👍
Thanks!
Wow. Great article! You saved me hours of reading. Thank you!
How do you deal with finding and sorting through all the papers that keep coming out in this field? I think it's the aspect I struggle with most, and you clearly are doing a very good job here, so hopefully you have some suggestions.
What tools do you use to do this, especially for new papers? How do you easily distinguish papers worth reading from garbage when there are no citations to speak of? How many abstracts do you read compared to full papers? Do you usually skim-read / LLM summarize to get the gist, or do you actually spend the time to go through everything and attempt to thoroughly understand every single equation presented?
Great post btw, as always.
I must say it’s a big challenge to find those diamonds in the rough. I used to be one of the machine learning moderators at arXiv a few years ago: there were three of us taking turns skimming through the titles of all submissions in the ML category (cs.LG). Mainly the titles, but sometimes also the abstracts, etc. My job was mainly to check that the articles were categorized correctly (submitters choose categories, and we additionally had classifiers for flagging miscategorized ones), but that wasn’t perfect. Long story short, I built a habit of skimming arXiv titles (not always, but pretty often). I also bookmark things I stumble upon on social media, etc. I go purely by title first to keep a sub-list of interesting papers. Then I select a sub-sub-list and read the abstracts. Then, based on that, I skim (quickly read through) a sub-sub-sub-list of the ones that seem interesting for a given project. And finally, I read 1-3 papers a week more carefully. But yeah, ultimately it’s a lot of work.
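In case it helps anyone reading along: the first title-skimming pass can be partly automated against arXiv's public query API (http://export.arxiv.org/api/query). Below is a minimal sketch, not part of the workflow described above; the keyword filter is a made-up placeholder you would adapt to your own interests.

```python
# Minimal sketch: pull the latest cs.LG titles from the public arXiv API
# for a quick daily skim. The keyword filter below is a hypothetical
# placeholder; adjust it to your own interests.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
URL = ("http://export.arxiv.org/api/query"
       "?search_query=cat:cs.LG&sortBy=submittedDate"
       "&sortOrder=descending&max_results=50")

with urllib.request.urlopen(URL) as resp:
    feed = ET.fromstring(resp.read())

for entry in feed.iter(ATOM + "entry"):
    title = entry.findtext(ATOM + "title", "").replace("\n", " ").strip()
    link = entry.findtext(ATOM + "id", "").strip()
    # First-pass filter: keep only titles matching keywords of interest.
    if any(kw in title.lower() for kw in ("reasoning", "reinforcement")):
        print(f"{title}\n  {link}")
```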
As always, a perfect article of yours!
Great review!
I’m a bit confused—has reinforcement learning actually caused any emergent abilities yet? It may require deeper investigation.
According to the DeepSeek-R1 paper, it has. I.e., the “aha” moment that models exhibited after multiple rounds of RL. There are papers saying that base models already have these emergent abilities as well. In my opinion, it’s not conclusive, though, because nowadays chain-of-thought data is increasingly part of the pre-training data mix.
Great!
This is super cool. The explanation of RLHF with PPO takes the cake. Also a super curated list of papers with short explanations.
Great article! Very clear explanations, it's great to have these summaries and not have to spend hours going through the papers.
Thanks!
This is a brilliant read. You never fail to impress!
What a great post!!! Thank you so much. I loved it and it helped me understand a lot of things.
Just a small detail: I think there’s a typo in "the just-realized o3 model". I assume you meant "just-released".
Thanks again for your work!
Thanks! And you are totally right, I meant to write "just-released" not "just-realized". This was "just-fixed" :D
In relation to this sentence: "And since distillation, in this paper, meant instruction fine-tuning on chain-of-thought data, it's likely that pre-training on data that includes chain-of-thought data induces these abilities as well" — I think it is completely analogous to the concept of continued pre-training, where I try to adapt the model to my context with a next-token prediction approach. I consider it "obvious", in the sense that it does not surprise me at all.
Yes, the training task in pre-training and SFT is exactly the next-token prediction task with the same cross-entropy loss.
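To make that concrete, here is a minimal sketch of that shared objective. The random tensors stand in for a real model's output logits and the input token IDs; the shapes are purely illustrative.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for a model's output logits and the input token IDs.
batch, seq_len, vocab_size = 4, 128, 50_000
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Next-token prediction: the logits at position t predict token t+1,
# so shift by one and apply the standard cross-entropy loss.
# Pre-training and SFT use exactly this objective; only the data differs.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```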
Btw, another example is that many base models can now follow instructions quite well to some extent (whereas, back in the day, base models were terrible at that). This is mostly due to the fact that pre-training data now also contains Q&A data. Some of it is coincidental, but it's also often done deliberately now, as I described a while back in
"New LLM Pre-training and Post-training Paradigms" (https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training)
I loved the combo of the RL summary and the selected papers. I definitely want to check out the one about the logic puzzles and the one extending RL to other domains; it seems like there’s still a lot to be discovered in these spaces.
Sir, you're one of the greatest minds in AI. Another excellent compilation. Your work is genuinely inspiring.
Thanks, Devansh!
Very detailed and interesting. Thanks for it!
Hello Raschka, thank you for the articles, always excellent content.
Quite a detailed overview, great work!