A well-curated list. Another noteworthy paper in October was Microsoft's BitNet. It demonstrated something quite remarkable: running a 100B-parameter language model on a single CPU at human reading speed (5-7 tokens per second) using 1.58-bit quantization. This breakthrough has huge implications for making large language models accessible on local devices without requiring specialized hardware.
https://arxiv.org/abs/2410.16144
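For anyone curious where the "1.58-bit" figure comes from: each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Here is a minimal sketch of the absmean weight quantization described in the BitNet b1.58 paper (function and variable names are mine, not from the authors' code):

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Round weights to {-1, 0, +1} after scaling by the mean absolute
    value, following the absmean scheme from the BitNet b1.58 paper."""
    gamma = w.abs().mean()                         # per-tensor absmean scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_q, gamma                              # dequantize as w_q * gamma

# example: quantize a random weight matrix
w = torch.randn(4, 4)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q)  # every entry is -1.0, 0.0, or 1.0
```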
Thanks! And yes, you are absolutely right regarding BitNet. Had it on my paper bookmark list (https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list) but ultimately decided to pick the scaling laws paper for November because it's a bit more relevant for my work right now. Not saying that BitNet isn't super (!) impressive, but it was a bit tough to pick only one for each month 😅.
So much progress and so many novel papers; it's definitely hard to pick just one.
Can you talk about the MoE structure and how it differs from the standard transformer architecture? Thanks!
I had a more detailed section on that back in January 2024 that might be useful: https://magazine.sebastianraschka.com/i/141130005/mixtral-of-experts
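In a nutshell, an MoE model replaces the transformer's feed-forward block with several expert feed-forward networks plus a small router that sends each token to only a subset of them, so only a fraction of the parameters is active per token. Mixtral, for instance, routes each token to the top 2 of 8 experts. A minimal sketch with simple top-1 routing (sizes and names are illustrative, not from any specific model):

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Drop-in replacement for a transformer FFN block: a router
    picks one expert FFN per token (top-1 routing for simplicity)."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)  # gating network

    def forward(self, x):                    # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)
        gate, idx = probs.max(dim=-1)        # best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                  # tokens routed to expert e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out
```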
Many thanks for the very useful curated summary of the best papers. Apart from those that show ways to make learning or inference more efficient, I am wondering why the following paper, which argues that the emergent abilities of LLMs are a mirage (https://arxiv.org/abs/2304.15004), did not make the list. It seems that a lot of the excitement about LLMs is based on this. Many thanks for answering!
Thanks for asking. It's actually a 2023 paper, which is probably why I didn't even consider it. For these Part 1 and Part 2 articles, I strictly picked exactly one paper for each month of 2024.
Great rundown, Sebastian. Really love the analysis of test-time compute.
Thanks, Mark!
Thank you for the wealth of information!