Dec 31, 2023 · edited Dec 31, 2023 · Liked by Sebastian Raschka, PhD

This LLM encoder/decoder stuff messes with my mind! There is something fundamental here that I'm not getting. HELP... 🤔 I have been fascinated with autoencoders, which take an example from feature space, ENCODE it into a point in latent space, and then DECODE it back into a reconstructed example in feature space, thus allowing a reconstruction loss to be calculated. [ref: Python ML 3Ed, Chap 17]
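To make the encode, decode, reconstruction-loss loop concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any real autoencoder or LLM is built: the fixed `direction` vector is a hypothetical stand-in for learned weights, the latent space is one-dimensional, and the function names are my own.

```python
import math

def encode(x, direction):
    """Project a 2-D feature vector onto a unit direction, giving a 1-D latent code."""
    return x[0] * direction[0] + x[1] * direction[1]

def decode(z, direction):
    """Map the 1-D latent code back into 2-D feature space."""
    return (z * direction[0], z * direction[1])

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hypothetical fixed "weights"; a trained autoencoder would learn these.
direction = (1 / math.sqrt(2), 1 / math.sqrt(2))

on_axis = (2.0, 2.0)   # lies along the latent direction
off_axis = (2.0, 0.0)  # does not lie along the latent direction

for x in (on_axis, off_axis):
    z = encode(x, direction)
    x_hat = decode(z, direction)
    print(x, "-> latent", round(z, 3), "-> loss", round(reconstruction_loss(x, x_hat), 6))
```

The point lying along the latent direction is reconstructed almost exactly, while the other one loses information in the bottleneck and picks up a nonzero loss. That gap is what a reconstruction loss measures.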

1) Should LLM decoders be called 'generators' like in GANs?

2) That single line that connects LLM encoder to its decoder... Is that the same data that one receives as an embedding from the LLM API?

3) For a decoder-only LLM, is its input always an embedding vector? Or, where do the model weights come from?

4) Is it possible to take an LLM embedding, reconstruct its initial input, and calculate the reconstruction loss? If so, this would enable us to map the fine (manifold) structures in these mysterious LLM latent spaces. I loved your old examples of putting/removing smiles on celebrity faces. I'd like to find a few hallucinations lurking in LLM latent spaces! 😮

Feb 17 · Liked by Sebastian Raschka, PhD

Thanks a lot for this. I'm actually at the point where you left off in your comment below: I'm using an open-source API layer on top of GPT to piece together how it all works, and to get some short-term gratification from building my own components on top of it.

But I got this far without even knowing that GPT is decoder-only, until today!

My first steps into machine learning were with the encoder-decoder architecture of face-swapping models, so I'd assumed LLMs were built with the same architecture.
