Discussion about this post

Richard Hackathorn

This LLM encoder/decoder stuff messes with my mind! There is something fundamental here that I'm not getting. HELP... 🤔 I have been fascinated with autoencoders, which take an example from feature space, ENCODE it into a point in latent space, and then DECODE it back into a reconstructed example in feature space, thus allowing a reconstruction loss to be calculated. [ref: Python ML 3Ed, Chap 17]
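That encode → latent point → decode → reconstruction-loss loop can be sketched in a few lines. This is a minimal, untrained linear autoencoder with random weights, purely to illustrate the data flow; all names and dimensions here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: 4-dim feature space <-> 2-dim latent space.
# Weights are random (untrained); a real autoencoder learns them by
# minimizing the reconstruction loss below.
W_enc = rng.normal(size=(4, 2))   # encoder weights: feature -> latent
W_dec = rng.normal(size=(2, 4))   # decoder weights: latent -> feature

x = rng.normal(size=(1, 4))       # one example in feature space

z = x @ W_enc                     # ENCODE: a point in latent space
x_hat = z @ W_dec                 # DECODE: reconstructed example

reconstruction_loss = float(np.mean((x - x_hat) ** 2))  # MSE
print(z.shape, x_hat.shape, reconstruction_loss)
```

Training would adjust `W_enc` and `W_dec` (via gradient descent) so that `x_hat` approaches `x`, driving the reconstruction loss toward zero.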

1) Should LLM decoders be called 'generators' like in GANs?

2) That single line that connects the LLM encoder to its decoder... Is that the same data that one receives as an embedding from the LLM API?

3) For a decoder-only LLM, is its input always an embedding vector? Or, where do the model weights come from?

4) Is it possible to take an LLM embedding, reconstruct its initial input, and calculate the reconstruction loss? If so, this would enable us to map the fine (manifold) structures in these mysterious LLM latent spaces. Loved your old examples of putting/removing smiles on celebrity faces. I'd like to find a few hallucinations lurking in LLM latent spaces! 😮

Viswa Kumar

I agree the term encoder/decoder is overloaded, since almost all architectures essentially perform encoding/decoding as a function. Engineers are not good at naming, and not just variables, after all 🤣

