10 Comments

A well-curated list. Another noteworthy paper in October was Microsoft's BitNet. It demonstrated something quite remarkable: they managed to run a 100B-parameter language model on a single CPU at speeds comparable to human reading (5-7 tokens per second) by using 1.58-bit quantization. This has huge implications for making large language models accessible on local devices without requiring specialized hardware.

https://arxiv.org/abs/2410.16144
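
For readers curious what 1.58-bit quantization looks like in practice, here is a minimal NumPy sketch of the absmean ternary quantization idea behind BitNet b1.58. It is an illustration only: the function names are made up, and the real speedups come from custom CPU kernels (bitnet.cpp) that operate directly on the ternary codes rather than dequantizing them as this reference does.

```python
# Sketch of 1.58-bit (ternary) weight quantization in the spirit of BitNet b1.58.
# Each weight is mapped to {-1, 0, +1} plus one shared per-tensor scale.
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Map full-precision weights to ternary codes plus one absmean scale."""
    gamma = np.mean(np.abs(w)) + eps                  # per-tensor absmean scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)   # codes in {-1, 0, +1}
    return w_ternary.astype(np.int8), gamma

def dequantize(w_ternary: np.ndarray, gamma: float) -> np.ndarray:
    """Recover an approximate weight matrix for a reference matmul."""
    return w_ternary.astype(np.float32) * gamma

# Quantize a random weight matrix and compare the result of a matmul.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
x = rng.normal(size=(1, 256)).astype(np.float32)

w_q, gamma = absmean_ternary_quantize(w)
y_full = x @ w.T
y_quant = x @ dequantize(w_q, gamma).T
print("relative matmul error:", np.linalg.norm(y_quant - y_full) / np.linalg.norm(y_full))
```

Because each weight needs under two bits plus a single shared scale, the model occupies a small fraction of the memory of an FP16 checkpoint, which is what makes CPU-only inference of a 100B model feasible in the first place.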

Thanks! And yes, you are absolutely right regarding BitNet. I had it on my paper bookmark list (https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list) but ultimately decided to pick the scaling-laws paper for November because that's currently a bit more relevant for my work. That's not to say BitNet isn't super (!) impressive, but it was a bit tough to pick only one paper for each month 😅.

So much progress and so many novel papers; it's definitely hard to pick just one.

Can you talk about the MoE structure and how it differs from the standard transformer? Thanks!

I had a more detailed section on those back in January 2024 that might be useful: https://magazine.sebastianraschka.com/i/141130005/mixtral-of-experts
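
For a quick intuition before diving into that article: in a standard transformer block, every token passes through the same dense feed-forward network (FFN); in an MoE block, that single FFN is replaced by several expert FFNs plus a small router that sends each token to only the top-k experts, so total parameters grow while per-token compute stays roughly constant. Below is a rough PyTorch sketch; the module and parameter names are illustrative, not Mixtral's actual implementation.

```python
# Minimal sketch of a Mixture-of-Experts (MoE) feed-forward block.
# A router scores the experts per token and only the top-k experts are evaluated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: this module replaces the dense FFN inside each transformer block.
tokens = torch.randn(4, 512)
print(MoEFeedForward()(tokens).shape)  # torch.Size([4, 512])
```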

Many thanks for the very useful curated summary of the best papers. Apart from those that show ways to make training or inference more efficient, I am wondering why the following paper, which argues that the emergent abilities of LLMs are a mirage (https://arxiv.org/abs/2304.15004), did not make the list? It seems that a lot of the excitement about LLMs is based on these emergent abilities. Many thanks for answering!

Thanks for asking. It's actually a 2023 paper, which is probably why I didn't consider it. For these Part 1 and Part 2 articles, I strictly picked exactly one paper for each month of 2024.

Great rundown, Sebastian; I really love the analysis of test-time compute.

Thanks, Mark!

Thank you for the wealth of information!
