Discussion about this post

Sasskia Ludin

It's a real pleasure to read a summary by someone as knowledgeable as you, kudos!

Now, I'm a little surprised that you didn't introduce multi-modality in LLMs as a main axis of research/differentiation. Pairing text with vision is already relatively straightforward, and there is also sub-modality differentiation in the audio landscape, with speech-to-text now extended to text-to-audio (e.g. the Bark model), but this is only the beginning...
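(As a rough illustration of why text-vision pairing is considered straightforward: the common recipe is a CLIP-style contrastive objective over paired embeddings. A minimal sketch, assuming two encoders that map images and text into the same embedding space; all names and shapes here are illustrative, not any particular model's API.)

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of (image, text) pairs."""
    # L2-normalize so the dot product becomes cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity logits: row i should match column i.
    logits = image_emb @ text_emb.T / temperature
    targets = torch.arange(logits.size(0))
    # Cross-entropy in both directions (image->text and text->image).
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Toy usage: random tensors stand in for real encoder outputs.
batch, dim = 8, 512
loss = clip_style_loss(torch.randn(batch, dim), torch.randn(batch, dim))
```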

IMHO, generalized multi-modality is a neglected yet straightforward path towards AGI, as it would solve a good part of the thorny issue of symbol grounding in those models (the other part being retro-active feedback from the world to the models, a direct pathway toward the synthesis of genuinely evolved intentionality).

With models encompassing our five senses and proprioception, they should evolve inner world representations more aligned with human ones. One can even speculate whether those LLMs would converge toward a universal underlying neural coding scheme, e.g. a refinement of Grossberg's ART, as described in this paper: https://www.mdpi.com/2078-2489/14/2/82
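(For readers unfamiliar with ART: the core mechanism is a vigilance-gated resonance test that either assigns an input to an existing category or recruits a new one. A toy sketch of that step, heavily simplified from Grossberg's full model, with all names and the threshold chosen purely for illustration.)

```python
import numpy as np

def art1_step(x: np.ndarray, prototypes: list, vigilance: float = 0.75) -> int:
    """One ART-style recognition step on a binary input vector.

    The input resonates with the first prototype whose overlap clears
    the vigilance threshold; otherwise a new category is recruited.
    (Real ART1 also ranks candidates by a choice function, omitted here.)
    """
    for j, w in enumerate(prototypes):
        overlap = np.logical_and(x, w).sum()
        if overlap / max(x.sum(), 1) >= vigilance:   # resonance test
            prototypes[j] = np.logical_and(x, w)     # fast learning: shrink prototype
            return j
    prototypes.append(x.copy())                      # mismatch: new category
    return len(prototypes) - 1

# Toy usage: two similar inputs land in the same category.
protos: list = []
art1_step(np.array([1, 1, 0, 1]), protos)
art1_step(np.array([1, 1, 0, 0]), protos)
```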

Milton Leal

Awesome write-up. I tend to try to follow the news as it occurs, but you do such a great job of distilling everything that I may just consume your newsletter instead. I wonder if you use any LLM to help you write or organize raw text.

