Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
Fantastic work, Sebastian! Gives me hope that the solution will be found. I strongly encourage you to review some papers I’ve written on the subject, as you certainly have the background to understand them:
Substack archives: https://chrispwendling.substack.com/archive
And: http://www.itrac.com/EGM_Document_Index.htm
Thanks for sharing! I can't promise to get to them soon, but I will add them to my reading list and check them out some time.
I came to this part and am a little confused:
we now have a quadratic n_heads × d_head in here...
Can you explain a little bit? Thanks!
Good catch, I meant to type d_head × d_head. Does this address your concern?
What are your thoughts about specialized foundation models? An LLM, but with specialized training, guardrails, and a human in the loop?
Regarding specialization: I think that's in some sense what's already happening with code models, right?
Could you describe a bit more how the humans interact during training, and what the motivation here is?
Great article, Sebastian! Thank you for your work!
An excellent way to walk home. Thanks!