Discussion about this post

Sahar Mor

A well-curated list. Another noteworthy paper in October was Microsoft's BitNet. It demonstrated something quite remarkable: running a 100B-parameter language model on a single CPU at human reading speed (5-7 tokens per second) by using 1.58-bit quantization. This has huge implications for making large language models accessible on local devices without requiring specialized hardware.

https://arxiv.org/abs/2410.16144
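
To connect the "1.58-bit" name to the mechanism, here is a minimal sketch of ternary (absmean) weight quantization in the spirit of BitNet b1.58, assuming PyTorch; the function and variable names are illustrative, not the paper's actual code.

```python
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Round a weight tensor to {-1, 0, +1} with a per-tensor absmean scale,
    roughly in the spirit of BitNet b1.58 ("1.58 bits" = log2(3) bits per weight)."""
    scale = w.abs().mean().clamp(min=eps)    # absmean scaling factor
    w_q = (w / scale).round().clamp(-1, 1)   # ternary weights: -1, 0, or +1
    return w_q, scale

# Example: quantize a random weight matrix and inspect the approximation
w = torch.randn(4, 4)
w_q, scale = quantize_ternary(w)
print(w_q)                               # entries are only -1.0, 0.0, or 1.0
print((w - w_q * scale).abs().mean())    # average quantization error
```

With every weight restricted to {-1, 0, +1}, matrix multiplications reduce to additions and subtractions, which is what makes the CPU-friendly inference kernels possible.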

Shawn

Can you talk about the MoE structure and how it differs from the standard transformer architecture? Thanks!
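
As context for the question, here is a minimal sketch (assuming PyTorch; all class and parameter names are illustrative) of how a sparse mixture-of-experts layer replaces the single dense feed-forward block of a standard transformer: a small router scores each token, and only the top-k expert FFNs are run for it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparse MoE layer: a router picks the top-k expert FFNs per token,
    replacing the single dense FFN of a standard transformer block."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (batch, seq, d_model)
        logits = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize the kept scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: drop-in replacement for the dense FFN inside a transformer block
moe = MoEFeedForward(d_model=64, d_ff=256)
tokens = torch.randn(2, 10, 64)
print(moe(tokens).shape)   # torch.Size([2, 10, 64])
```

The attention sublayers stay the same; the difference is that total parameter count grows with the number of experts while per-token compute stays close to that of a dense model, since only top_k experts are active for each token.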

