20 Comments
Kai Liu:

Great article, Sebastian! Thank you for your work!

Daniel Kleine:

I really enjoyed the overview! Just a quick note – from section 3.1 onward, some sentences seem to have line breaks that make the text a bit hard to read (I noticed this especially in sections 3.1 and 3.3). Could you please take a quick look when you get a chance?

Sebastian Raschka, PhD:

Thanks! And I'm not sure how that happened, but I was copying back and forth from my local markdown editor, which may have caused this. The back and forth was because, for the first time, I got a "Your post is too long and can't be saved. Please edit it to make it shorter or split part of it into another post." error in Substack :(

Daniel Kleine:

Thank you!

Chris Wendling:

Fantastic work, Sebastian! Gives me hope that the solution will be found. I strongly encourage you to review some papers I’ve written on the subject, as you certainly have the background to understand them:

Substack Archives —

https://chrispwendling.substack.com/archive

And-

http://www.itrac.com/EGM_Document_Index.htm

Sebastian Raschka, PhD:

Thanks for sharing! I can't promise to get to them soon, but I will add them to my reading list and check them out some time.

Kai Liu:

I came to this and am a little confused:

"we now have a quadratic n_heads × d_head in here..."

Can you explain a little bit? Thanks!

Sebastian Raschka, PhD:

Good catch, I meant to type d_heads × d_head. Does this address your concern?

Daniel Kleine:

It should be "d_head × d_head" (without the s), right?

Could you please update this in the repo file as well?

Sebastian Raschka, PhD:

Thanks & done!

Kai Liu:

Yes, thanks for your clarification!
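
(For context, here is a minimal sketch of why the corrected factor is d_head × d_head: in a linear-attention-style layer, each head carries a recurrent state matrix of that shape, updated with an outer product and read out with the query. The formulation, shapes, and names below are illustrative assumptions, not code from the article.)

```python
# Minimal sketch, assuming the usual linear-attention-style recurrence:
# each head keeps a (d_head x d_head) state matrix S, so the "quadratic"
# factor is d_head x d_head per head (and only linear in n_heads overall).
import torch

n_heads, d_head = 8, 64
S = torch.zeros(n_heads, d_head, d_head)  # one d_head x d_head state per head

def step(S, q, k, v):
    # q, k, v: (n_heads, d_head) for a single time step
    S = S + torch.einsum("hd,he->hde", v, k)  # outer-product state update per head
    o = torch.einsum("hde,he->hd", S, q)      # read-out: o_t = S_t q_t
    return S, o

q = k = v = torch.randn(n_heads, d_head)
S, o = step(S, q, k, v)
print(S.shape)  # torch.Size([8, 64, 64]) -> d_head x d_head state per head
```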

Ai Therapy Solutions:

What are your thoughts on specialized foundation models? An LLM but with specialized training, guardrails, and a human in the loop?

Sebastian Raschka, PhD:

Regarding specialization: I think that's in some sense what's already happening with code models, right?

Could you describe a bit more how the humans interact during training, and what the motivation here is?

Ruben Hassid:

We’re entering the post-LLM era. Not “smarter models.” Smarter stacks.

The winners won’t chase bigger benchmarks. They’ll orchestrate smaller systems that think faster, cheaper, and closer to the edge. One model routes. One reasons. One polishes. Together, they outperform giants.

This flips how you build. You stop prompting. You start composing. Every model becomes a teammate, not a tool.

As I wrote in Consultants, real leverage comes from sequencing intelligence, not scaling it.

Big isn’t the future. Interconnected is.

Varun:

Thanks for the great article!

I do hope you get the time to write a deep-dive on Mamba & DeltaNet architectures someday. Would love to learn more deeply about them!

Aishwarya Agrawal:

It was a great read, Sebastian! I really learnt so much about what's going on in this space and the alternatives to standard LLMs, explained in such a lucid way! Definitely excited to see more articles about each of these sections in detail...

Antonina:

Thank you for the overview!

A small note: the link to the "Diffusion-LM Improves Controllable Text Generation" paper in 3.1 leads to another paper. The correct one is probably this: https://arxiv.org/abs/2205.14217

APS:

Sometimes I feel like a hermit on a rock, just sitting and waiting for someone to bring me the news that they've finally solved SSMs as a paradigm and that they're going to come and blast us past the now generally accepted inherent limitations of regular transformers.

praxis22:

An excellent way to walk home. Thanks!
