10 Comments
Sahar Mor

A well-curated list. Another noteworthy paper in October was Microsoft's BitNet. It demonstrated something quite remarkable: they managed to run a 100B-parameter language model on a single CPU at human reading speed (5-7 tokens per second) by using 1.58-bit quantization. This breakthrough has huge implications for making large language models accessible on local devices without requiring specialized hardware.

https://arxiv.org/abs/2410.16144
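The core idea behind the 1.58-bit scheme is quantizing weights to the ternary set {-1, 0, +1} with a per-tensor scale. A minimal NumPy sketch of absmean-style ternary quantization, in the spirit of BitNet b1.58 (this is an illustrative approximation, not the paper's actual implementation):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray):
    """Quantize a weight tensor to {-1, 0, +1} plus a scalar scale.

    Sketch of absmean-style ternary quantization: scale by the mean
    absolute weight, round, and clip to the ternary range.
    """
    scale = np.mean(np.abs(w)) + 1e-8           # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)     # ternary weight values
    return q.astype(np.int8), scale

# Usage: the dequantized approximation of w is q * scale
w = np.random.randn(4, 4)
q, scale = absmean_ternary_quantize(w)
w_approx = q.astype(np.float32) * scale
```

Since each weight takes one of three values, the information content per weight is log2(3) ≈ 1.58 bits, which is where the name comes from.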

Sebastian Raschka, PhD

Thanks! And yes, you are absolutely right regarding BitNet. I had it on my paper bookmark list (https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list) but ultimately decided to pick the scaling-laws paper for November because that's a bit more relevant for my work right now. That's not to say BitNet isn't super (!) impressive, but it was a bit tough to pick only one paper for each month 😅.

Sahar Mor

So much progress and so many novel papers; it's definitely hard to pick just one.

Shawn

Can you talk about the MoE structure and how it differs from the standard transformer? Thanks!

Sebastian Raschka, PhD

I had a more detailed section on MoEs back in January 2024 that might be useful: https://magazine.sebastianraschka.com/i/141130005/mixtral-of-experts
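In short, an MoE model replaces the transformer's dense feed-forward layer with several expert feed-forward networks, and a learned router sends each token to only the top-k experts. A minimal NumPy sketch of top-k routing (all names here are illustrative, not taken from the linked article or any specific model):

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Sketch of a top-k mixture-of-experts feed-forward layer.

    x:              (n_tokens, d) token activations
    expert_weights: (n_experts, d, d) one weight matrix per expert
    gate_weights:   (d, n_experts) router projection

    Unlike a dense transformer FFN, where every token passes through
    the same weights, only the top_k experts run for each token.
    """
    logits = x @ gate_weights                        # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()                         # softmax over chosen experts
        for p, e in zip(probs, top[t]):
            out[t] += p * (x[t] @ expert_weights[e]) # weighted expert outputs
    return out

# Usage: 3 tokens, hidden size 8, 4 experts, top-2 routing
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8))
out = moe_layer(x,
                rng.standard_normal((4, 8, 8)),
                rng.standard_normal((8, 4)))
```

The appeal is that total parameter count grows with the number of experts while per-token compute stays close to a single dense FFN.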

Birger Johnson

Many thanks for the very useful curated summary of the best papers. Apart from those that show ways to make learning or inference more efficient, I am wondering why the following paper, which argues that the emergent abilities of LLMs are a mirage (https://arxiv.org/abs/2304.15004), did not make the list. It seems that a lot of the excitement about LLMs is based on this. Many thanks for answering!

Sebastian Raschka, PhD

Thanks for asking. It's actually a 2023 paper, which is probably why I didn't consider it: for these Part 1 and Part 2 articles, I strictly picked exactly one paper for each month of 2024.

Mark Hinkle

Great rundown, Sebastian; I really love the analysis of test-time compute.

Sebastian Raschka, PhD

Thanks, Mark!

Robert Ta

Thank you for the wealth of information!
