8 Comments
Chris Wendling

Fantastic work, Sebastian! This gives me hope that the solution will be found. I strongly encourage you to review some papers I've written on the subject, as you certainly have the background to understand them:

Substack archives:

https://chrispwendling.substack.com/archive

And:

http://www.itrac.com/EGM_Document_Index.htm

Sebastian Raschka, PhD

Thanks for sharing! I can't promise to get to them soon, but I will add them to my reading list and check them out some time.

Kai Liu

I came across this and am a little confused:

we now have a quadratic n_heads × d_head in here...

Can you explain a little bit? Thanks!

Sebastian Raschka, PhD

Good catch! I meant to type d_heads × d_head. Does this address your concern?

Ai Therapy Solutions

What are your thoughts about specialized foundation models? An LLM, but with specialized training, guardrails, and a human in the loop?

Sebastian Raschka, PhD

Regarding specialization: I think that's, in a sense, already happening with code models, right?

Could you describe a bit more how the humans interact during training, and what the motivation here is?

Kai Liu

Great article, Sebastian! Thank you for your work!

praxis22

An excellent way to walk home. Thanks!
