23 Comments
May 8, 2023 · Liked by Sebastian Raschka, PhD

Hi, this is Regan. I currently run a Chinese AI blog named Baihai IDP.

Please allow me to translate this blog post into Chinese.

I am very interested in the content of your blog post. I believe that the information in it would be of great benefit to a wider audience if it were translated into Chinese.

I would be sure to include a link to the original blog post and your name as the author. I would also be happy to provide you with a copy of the translated post for your records.

I hope you will consider my request and I look forward to hearing from you.

Apr 22, 2023 · Liked by Sebastian Raschka, PhD

Thank you! This is an incredible intro to understanding fine-tuning. I'm curious, though, how these different approaches impact model output/performance. 1. What are the ways that researchers assess output/performance? 2. How different is performance between in-context learning vs. indexing vs. retraining, etc.?
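
For question 1, the kind of thing I imagine is a shared held-out metric computed the same way for each approach (a hypothetical sketch, assuming a classification task; nothing here comes from the article):

```python
import torch

# Hypothetical sketch: compare adaptation approaches with the same
# held-out metric (here, classification accuracy) computed on data
# the model never saw during adaptation.
def accuracy(model, dataloader):
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in dataloader:
            preds = model(inputs).argmax(dim=-1)  # assumes logits output
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```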

Apr 9 · Liked by Sebastian Raschka, PhD

Hi Sebastian, I am rediscovering this post after running into some issues with fine-tuning. Thanks for the awesome post!

I am particularly interested in your opinion on fine-tuning all layers vs. fine-tuning only the last layer (maybe plus gradual unfreezing) for repurposing a pretrained model, e.g., for training reward models.

You mentioned in another post that the most popular method nowadays is to fine-tune all layers together (i.e., gradual unfreezing as in ULMFiT is out of date). But could you explain why that makes sense? Intuitively, when we add a linear layer to the pretrained backbone to learn a reward, for example, and we use the same very small learning rate (e.g., 1e-5) for both the backbone and the linear layer, the linear layer is barely changing, so aren't we essentially adapting the backbone representation to fit the random weights of the linear layer? (A minimal sketch of the two setups I mean is below.)
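
A minimal PyTorch sketch of the two optimizer setups I'm comparing (the backbone and head here are toy stand-ins, not real models):

```python
import torch
import torch.nn as nn

# Toy stand-ins: a "pretrained" backbone and a freshly initialized
# linear head, as in reward-model training.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
reward_head = nn.Linear(64, 1)  # randomly initialized

# Setup A: one small learning rate for everything (my concern above)
opt_single = torch.optim.AdamW(
    list(backbone.parameters()) + list(reward_head.parameters()), lr=1e-5
)

# Setup B: per-parameter-group learning rates, so the random head
# can adapt much faster than the pretrained backbone
opt_groups = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": reward_head.parameters(), "lr": 1e-3},
])
```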

Thanks ahead for your reply!

Jul 31, 2023 · Liked by Sebastian Raschka, PhD

This is a nice article, but one thing I did not understand is the comment about "keep frozen" for the feature-based approach. In this technique, I understood that we are not doing any fine-tuning of the language model: we just take the output of the LM and use it to train a new model, so there is no question of updating the existing model weights. Is my understanding right? Also, for this approach, I think I can only use embedding models like text-ada-embedding, and I will not be able to use GPT-3.5 or DaVinci. Is that understanding correct? (A toy sketch of my mental model is below.)
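
A toy sketch of how I picture the feature-based approach (`embed` here is a fake stand-in for whatever frozen model produces the embeddings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for a frozen LM's embedding function; in practice this would
# be an embedding-model API call or a forward pass through a frozen
# backbone. Its weights are never updated ("kept frozen").
rng = np.random.default_rng(0)

def embed(texts):
    return rng.normal(size=(len(texts), 768))  # fake 768-dim embeddings

X_train = embed(["great movie", "terrible movie"])  # LM stays frozen
y_train = [1, 0]

clf = LogisticRegression()  # only this small model is trained
clf.fit(X_train, y_train)
print(clf.predict(embed(["what a film"])))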

May 22, 2023 · Liked by Sebastian Raschka, PhD

Hi Sebastian,

Many thanks for writing this; it's one of the first posts I've seen that starts from a first-principles approach. I wanted to ask if you plan to write code tutorials for how to implement the methods mentioned in your posts? Learning to fine-tune models with custom data seems to be a very valuable skill, and I was wondering if you could point to some materials, either your own or others', where people could actually practice this.

Your articles coupled with annotated notebooks or examples would be a great combination!

May 16, 2023 · Liked by Sebastian Raschka, PhD

Hi Sebastian, thanks for writing this! Big fan of your work. What are the real-world use cases you are seeing for PEFT?

May 5, 2023 · Liked by Sebastian Raschka, PhD

Hi Sebastian, I love your content; always on point as usual. It would be nice if you could go a little deeper into how to compile your training dataset if you want to fine-tune your own LLM. I read a lot about fine-tuning algorithms for LLMs, such as prefix/prompt tuning, adapters, or LoRA, but I think structuring and curating the underlying dataset is just as important. Say you have millions or billions of observations: how do you decide which data points to include, and how do you format your instructions (e.g., something like the record sketched below)? It would be nice if you went into that sometime. I think a lot of the work required to get good performance out of LLM fine-tuning comes down to the underlying dataset you use.
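
To make the formatting question concrete, here is the kind of instruction record I mean (an Alpaca-style layout; the field names and template are just one common convention, not a recommendation from the article):

```python
import json

# One instruction record in the common Alpaca-style format.
record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models can be adapted to new tasks ...",
    "output": "LLMs can be adapted to new tasks without full retraining.",
}

# A typical template that stitches the fields into a training prompt
prompt = (
    f"### Instruction:\n{record['instruction']}\n\n"
    f"### Input:\n{record['input']}\n\n"
    f"### Response:\n{record['output']}"
)
print(json.dumps(record, indent=2))
print(prompt)
```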

May 1, 2023 · Liked by Sebastian Raschka, PhD

Thanks for the post.

Apr 22, 2023 · Liked by Sebastian Raschka, PhD

Thanks for the post; it answered most of my questions.

Can you please clarify the difference between hard prompt tuning (I'm assuming it's prompt engineering, where we manually create prompts with some modifications, etc.; is that correct?) and soft prompt tuning?

Also, in soft prompt tuning, how can we differentiate between different tasks (QA, summarization, classification, etc.)? Will we be using a different vocabulary for different tasks? (My current mental picture is sketched below.)
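
Concretely, this is how I picture task-specific soft prompts; a minimal PyTorch sketch where all names and sizes are made up:

```python
import torch
import torch.nn as nn

embed_dim, num_soft_tokens = 64, 10

# One learnable soft prompt per task; the LLM itself stays frozen,
# and the vocabulary is shared -- only these vectors differ per task.
soft_prompts = nn.ParameterDict({
    "qa":            nn.Parameter(torch.randn(num_soft_tokens, embed_dim)),
    "summarization": nn.Parameter(torch.randn(num_soft_tokens, embed_dim)),
})

def prepend_soft_prompt(task, token_embeddings):
    # token_embeddings: (batch, seq_len, embed_dim) from the frozen model
    batch = token_embeddings.shape[0]
    prompt = soft_prompts[task].unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([prompt, token_embeddings], dim=1)

x = torch.randn(2, 5, embed_dim)           # fake embedded input tokens
print(prepend_soft_prompt("qa", x).shape)  # torch.Size([2, 15, 64])
```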

Apr 22, 2023 · Liked by Sebastian Raschka, PhD

Some practical context would be nice, e.g., where does SetFit fit in, etc.?

Apr 22, 2023 · edited Apr 22, 2023

German translation doesn't need any special prompting. Just say "translate this to German: good morning" and it will work. Your article states otherwise.
