Finetuning Large Language Models
Hi, this is Regan. I currently run a Chinese AI blog named Baihai IDP.
Please allow me to translate this blog post into Chinese.
I am very interested in the content of your blog post. I believe that the information in it would be of great benefit to a wider audience if it were translated into Chinese.
I would be sure to include a link to the original blog post and your name as the author. I would also be happy to provide you with a copy of the translated post for your records.
I hope you will consider my request and I look forward to hearing from you.
Thank you! This is an incredible intro to understanding fine-tuning. I'm curious, though, how these different approaches affect model output and performance. 1. What are the ways that researchers assess output/performance? 2. How different is performance between in-context learning vs. indexing vs. retraining, etc.?
This is a nice article. But one thing I did not understand is the comment about "keep frozen" in the feature-based approach. In this technique, as I understand it, we are not doing any fine-tuning of the language model; we are just taking the output of the LM and using it to train a new model. So there is no question of updating the existing model weights. Is my understanding right? Also, for this approach, I think I can only use embedding models like text-ada-embedding; I will not be able to use GPT-3.5 or DaVinci. Is that correct?
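For illustration, here is a minimal sketch of the feature-based approach as described in the question: the pretrained LM is kept frozen and used only as a feature extractor, while a separate classifier is trained on its outputs. The model name (distilbert-base-uncased) and the toy data are assumptions for the example, not taken from the article.

```python
# Minimal sketch of the feature-based approach: the pretrained LM is kept
# frozen (never updated) and only used to produce embeddings; a separate,
# lightweight classifier is trained on those embeddings.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

name = "distilbert-base-uncased"  # illustrative choice, not from the article
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()  # the LM's weights stay frozen throughout

texts = ["great movie", "terrible plot"]  # toy data for illustration
labels = [1, 0]

with torch.no_grad():  # no gradients, hence no weight updates to the LM
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    # use the first token's hidden state as a fixed feature vector
    features = model(**batch).last_hidden_state[:, 0, :].numpy()

clf = LogisticRegression().fit(features, labels)  # only this part is trained
```

Because only the classifier receives gradients, any model that exposes hidden states or embeddings can serve as the feature extractor; API-only models that return just generated text cannot.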
Hi Sebastian,
Many thanks for writing this; it's one of the first posts I've seen that starts from a first-principles approach. I wanted to ask whether you plan to write code tutorials showing how to implement the methods mentioned in your posts. Learning to fine-tune models with custom data seems to be a very valuable skill, and I was wondering if you could point to some materials, either your own or others', where people could actually practice this.
Your articles coupled with annotated notebooks or examples would be a great combination!
Hi Sebastian, thanks for writing this! Big fan of your work. What are the real-world use cases you are seeing for PEFT?
Hi Sebastian, I love your content; always on point as usual. It would be nice if you could go a little deeper into how to compile your training dataset when you want to fine-tune your own LLM. I have read a lot about fine-tuning algorithms for LLMs, such as prefix/prompt tuning, adapters, or LoRA, but I think structuring and curating the underlying dataset is just as important. Say you have millions or billions of observations: how do you decide which data points to include, and how do you format your instructions, for example? It would be nice if you covered that sometime. I think much of the work required to get good performance out of LLM fine-tuning comes down to the underlying dataset you use.
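As a concrete illustration of the instruction-formatting part of this question, here is a hedged sketch using one widely used convention (an Alpaca-style template); the template wording and field names are assumptions for the example, not something prescribed in the article.

```python
# Hedged sketch of one common instruction format (Alpaca-style template);
# the exact wording and field names are assumptions for illustration.
template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

example = {
    "instruction": "Translate the following sentence to German: good morning",
    "response": "Guten Morgen",
}
print(template.format(**example))  # one formatted training example
```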
Thanks for the post.
Thanks for the post. It answers most of my questions.
Can you please clarify the difference between hard prompt tuning (I assume this is prompt engineering, where we manually create prompts with some modifications; is that correct?) and soft prompt tuning?
Also, in soft prompt tuning, how can we differentiate between different tasks (QA, summarization, classification, etc.)? Will we be using a different vocabulary for different tasks?
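To make the distinction concrete: a hard prompt is literal text prepended to the input, while a soft prompt is a small set of trainable embedding vectors prepended to the input embeddings of a frozen LM. Below is a minimal sketch of the soft-prompt side; the class name, shapes, and the one-prompt-per-task setup are illustrative assumptions, not the article's code.

```python
# Minimal sketch of soft prompt tuning: trainable "virtual token" embeddings
# are prepended to the input embeddings of a frozen LM, and only these
# vectors receive gradient updates during training.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        # the only trainable parameters in this setup
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim))

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# One soft prompt per task; the frozen LM and its vocabulary are shared.
qa_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=768)
summarization_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=768)
```

In this setup, tasks are not distinguished by different vocabularies: the tokenizer and the LM stay the same, and each task gets its own small set of learned prompt vectors.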
Some practical context would be nice, e.g., where does SetFit fit in, etc.
German translation doesn't need any special prompting: just say "translate this to German: good morning" and it will work. Your article states otherwise.