11 Comments
Sep 27, 2023 · Liked by Sebastian Raschka, PhD

Thanks for the great post, Sebastian. Please correct me if I am wrong: I think that in the prefix-tuning and p-tuning methods, the LLM actually learns the prompts itself for the input and output (we can still write a custom input such as "summarize: <context para>"). I have seen some examples for the Hugging Face PEFT library where only the type of PEFT method (p-tuning, prefix tuning, etc.) is passed to the library, without actually prefixing the inputs from the dataset: https://huggingface.co/docs/peft/task_guides/seq2seq-prefix-tuning/
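For reference, here is a minimal sketch (with an assumed checkpoint name and prefix length, not the exact tutorial code from the linked guide) of how the PEFT library is typically set up for prefix tuning: only the method type and the number of virtual tokens are specified, and the learned prefix is prepended internally rather than written into the dataset inputs.

```python
# Minimal prefix-tuning setup with Hugging Face PEFT (sketch, assumed values).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import get_peft_model, PrefixTuningConfig, TaskType

model_name = "t5-small"  # assumed checkpoint; the guide uses a different seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The prefix ("virtual tokens") is a trainable tensor, not text taken from the dataset,
# so raw inputs are tokenized as-is without any manual prefixing.
peft_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=20,  # assumed value
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prefix parameters are trainable
```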

Please share your views regarding that.

Thanks & Regards

May 9, 2023 · Liked by Sebastian Raschka, PhD

Thank you for a great explanation. I think the "intuition" behind some LLM finetuning methods is applicable to non-LLM models (e.g., ResNet). Do you think we can use these techniques to build better finetuned non-LLM models?
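As one hedged illustration of the transfer, here is a hand-rolled sketch (assumed rank and scaling values, not an official recipe) of applying an LLM-style technique such as LoRA to a non-LLM model, in this case the final linear layer of a torchvision ResNet, while keeping the backbone frozen.

```python
# LoRA-style low-rank update on a ResNet classifier head (sketch, assumed hyperparameters).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear  # frozen pretrained layer
        self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Original output plus the low-rank correction A @ B.
        return self.linear(x) + (x @ self.A @ self.B) * self.scale

model = resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False           # freeze the entire backbone
model.fc = LoRALinear(model.fc)       # only A and B are trainable

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```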


I have a question about the prefix-tuning paper: is the prefix embedding added to the output of the attention layer, or before the attention layer as in your illustration here? If it is added before the attention calculation, there are q, k, and v vectors for each token, so if you concatenate the prefix embedding to both the k and v vectors, wouldn't that modify the dimension of the k vectors and make them incompatible with the q vectors in the attention calculation?

Or is it correct that the prefix embedding, after going through an MLP, is concatenated along the sequence dimension to the outputs h_i of the attention layer, which have the same dimension as the input tokens, so that the original W_o matrix remains compatible?

And speaking of the MLP: is it true that each layer needs its own MLP to produce its prefix embedding?
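A toy numerical sketch (arbitrary dimensions, not the paper's implementation) of the key/value concatenation question: because the prefix is concatenated along the sequence dimension of K and V, the q/k dot product and the output shape stay compatible.

```python
# Prefix concatenation along the sequence dimension of K and V (toy example).
import torch

d, seq_len, prefix_len = 8, 5, 3          # per-token dim, input tokens, prefix tokens

q = torch.randn(seq_len, d)               # queries come only from the input tokens
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)

# Trainable prefix vectors (in prefix tuning these are produced via an MLP
# reparameterization during training).
prefix_k = torch.randn(prefix_len, d)
prefix_v = torch.randn(prefix_len, d)

k = torch.cat([prefix_k, k], dim=0)       # (prefix_len + seq_len, d)
v = torch.cat([prefix_v, v], dim=0)       # (prefix_len + seq_len, d)

attn = torch.softmax(q @ k.T / d**0.5, dim=-1)   # (seq_len, prefix_len + seq_len)
out = attn @ v                                    # (seq_len, d)

print(out.shape)  # torch.Size([5, 8]): same per-token dimension as without the prefix
```

The prefix only lengthens the axis that the attention weights sum over; each output h_i keeps the original per-token dimension, so a downstream projection of the original W_o size still applies.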


I'm not sure I understand prompt tuning. Do you recommend any resources that I can use to read about it?
