17 Comments
Author

Small correction: There was originally a drop from 0.783 to 0.028 for "All-Layer QLoRA" on the causative benchmark, which looked like a significant drop that went unmentioned in my text.

This was because I was looking at the correct numbers in my notes but had an incorrect number in the table figure I prepared for the post. In reality, "All-Layer QLoRA" improves the benchmark: from 0.783 to 0.788. I have updated the table.

Nov 20 · Liked by Sebastian Raschka, PhD

The article was very well written.

Loved it.

Are the weights decomposed using PCA?
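
(For context, here is a minimal sketch of how a LoRA layer is typically parameterized: the update is the product of two small matrices that are learned during fine-tuning rather than obtained from a PCA/SVD of the pretrained weights. The class and variable names below are illustrative, not the article's code.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, linear: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = linear                      # pretrained weight W, kept frozen
        self.linear.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(linear.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, linear.out_features))
        self.scaling = alpha / r                  # alpha rescales the low-rank update

    def forward(self, x):
        # W x + (alpha / r) * x A B; A and B are trained by gradient descent,
        # they are not a decomposition of W itself.
        return self.linear(x) + self.scaling * (x @ self.A @ self.B)
```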

Nov 20 · Liked by Sebastian Raschka, PhD

When we do this kind of experiment for fine-tuning hyperparameters, are we supposed to repeat the training several times with different seeds and take the average weights?
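
(As a sketch of what repeating such runs might look like: the training is relaunched with a different random seed each time and the resulting benchmark scores are aggregated. `train_and_evaluate` below is a hypothetical placeholder, not a function from the article.)

```python
import random
import statistics

def train_and_evaluate(seed: int) -> float:
    # Hypothetical stand-in for one complete fine-tuning + benchmark run;
    # in practice this would launch the training script with the given seed.
    random.seed(seed)
    return random.random()  # placeholder for the benchmark score of this run

scores = [train_and_evaluate(seed) for seed in (1, 2, 3)]
print(f"mean score: {statistics.mean(scores):.3f} (stdev {statistics.stdev(scores):.3f})")
```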

Nov 20 · Liked by Sebastian Raschka, PhD

In the section "Balancing LoRA Hyperparameters: R and Alpha", the setting of r = 256 and alpha = 128 obviously gets the best performance. Why?
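
(For reference, the LoRA update is commonly scaled by alpha / r, so this setting corresponds to a scaling factor of 0.5. The snippet below just spells out that arithmetic and is not taken from the article's code.)

```python
r, alpha = 256, 128
scaling = alpha / r
print(scaling)  # 0.5 -- with alpha fixed, a larger r shrinks the effective update
```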

Nov 19 · Liked by Sebastian Raschka, PhD

Thank you for this great article!

Nov 19 · Liked by Sebastian Raschka, PhD

Thanks for the tip about the memory requirements for longer sequence lengths! I have been trying to debug a CUDA memory issue for the dolly-15k dataset when running on a 24GB GPU, and it's useful to have my thinking confirmed :))
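
(In case it helps anyone hitting the same issue, here is a minimal sketch of capping the sequence length at tokenization time so that long dolly-15k samples don't blow up activation memory. The checkpoint name and the 512-token cut-off are illustrative assumptions, not the article's settings.)

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint; substitute the model you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def tokenize(example, max_length=512):
    # Truncating keeps the per-sample token count (and hence attention
    # activation memory) bounded, regardless of how long the sample is.
    text = example["instruction"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=max_length)
```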

Nov 19 · Liked by Sebastian Raschka, PhD

Great article


How would you compare the performance of QLoRA 4-bit vs. QLoRA 8-bit?
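
(Not an answer on the numbers, but for anyone who wants to run that comparison themselves, here is a minimal sketch of loading the same base model in 4-bit NF4 vs. 8-bit precision via bitsandbytes and Hugging Face transformers; the checkpoint name is an illustrative assumption.)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical checkpoint

# 4-bit NF4 quantization (the variant usually meant by "QLoRA 4-bit")
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Plain 8-bit quantization for comparison
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)

model_4bit = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_4bit)
model_8bit = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_8bit)
```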
