37 Comments

Fantastic write-up, thank you!

Small correction:

"The DoRA two-step process (decomposing a pretrained weight matrix and applying DoRA to the directional matrix) is further illustrated in the figure from the DoRA paper below."

applying DoRA to the directional matrix --> applying LoRA to the directional matrix.
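For anyone skimming the thread, a minimal sketch of that two-step process in PyTorch (W, B, A, and m are illustrative names, not the article's exact variables):

import torch

def dora_weight(W, B, A, m):
    # W: pretrained weight (d_out, d_in); B: (d_out, r); A: (r, d_in); m: (1, d_in)
    # Step 1: apply the LoRA update B @ A to the directional component
    V = W + B @ A
    # Step 2: normalize each column to unit length, then rescale by the
    # learned magnitude vector m
    return m * (V / V.norm(p=2, dim=0, keepdim=True))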

Feb 19 · Liked by Sebastian Raschka, PhD

layer_lora_1 = LinearWithLoRA(layer, rank=2, alpha=4)

print("LoRA output:", layer_lora_2(x))

Minor typo: did you mean print("LoRA output:", layer_lora_1(x))?
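For clarity, the corrected pair (keeping the article's variable names) would read:

layer_lora_1 = LinearWithLoRA(layer, rank=2, alpha=4)
print("LoRA output:", layer_lora_1(x))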

Thanks for the great write-up!

Feb 18 · Liked by Sebastian Raschka, PhD

Great write-up! Clear, informative, on a super useful topic. Thank you for sharing!!

Apr 26 · Liked by Sebastian Raschka, PhD

I have an OOM error in this line:

denominator = numerator.norm(p=2, dim=0, keepdim=True)

Will it consume much more GPU memory? How can we handle this?
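One possible workaround, as a sketch (assuming numerator holds V + ΔV from the DoRA forward pass): the DoRA paper suggests treating the column-wise norm as a constant and detaching it from the gradient graph, which reduces the intermediates kept alive for the backward pass:

# Treat the column-wise norm as a constant during backprop,
# as suggested in the DoRA paper, to lower peak training memory
denominator = numerator.detach().norm(p=2, dim=0, keepdim=True)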

Apr 24 · Liked by Sebastian Raschka, PhD

Thanks for your great writing.

Mar 12 · Liked by Sebastian Raschka, PhD

Great write-up. Thank you!

Mar 10 · Liked by Sebastian Raschka, PhD

Great write-up! Thank you!

Mar 6 · Liked by Sebastian Raschka, PhD

As always, great post!

Thank you very much!

Feb 21 · Liked by Sebastian Raschka, PhD

Incredible write-up and turnaround on this.

Doesn't it seem a bit slow that m is multiplied by both V and delta V? (I guess multiplying by V especially adds steps to the forward propagation.) I don't really see a way around this, though.
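One mitigation, as a sketch with illustrative names (linear, B, A, m): the extra multiplications only cost anything during training. For inference, the decomposition can be folded back into a single weight matrix once, after which the forward pass is as cheap as the original layer:

import torch

@torch.no_grad()
def merge_dora(linear, B, A, m):
    # Fold the magnitude and direction back into a single weight matrix;
    # afterwards the forward pass is just the usual linear layer
    V = linear.weight + B @ A
    linear.weight.copy_(m * (V / V.norm(p=2, dim=0, keepdim=True)))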

Feb 19 · Liked by Sebastian Raschka, PhD

For domains where we have large amounts of raw data (e.g., 10 billion tokens or more), would PEFT methods like DoRA/LoRA, combined with converting the data to instruction format (e.g., AdaptLLM), be sufficient to adapt the model to the new domain, or do we definitely have to perform full fine-tuning?

Feb 19 · Liked by Sebastian Raschka, PhD

Indeed a good article. @rasbt, I am surprised by the result too. I would also be curious: what happens if we just do weight normalization of standard LoRA itself?
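For what it's worth, a sketch of that ablation (illustrative names; here the magnitude m0 is fixed to the pretrained column norms instead of being learned as in DoRA):

# Column-wise normalization of the LoRA-updated weight, keeping the
# pretrained magnitudes fixed rather than learning a magnitude vector
V = W + B @ A
m0 = W.norm(p=2, dim=0, keepdim=True)  # fixed, unlike DoRA's learnable m
W_normalized = m0 * (V / V.norm(p=2, dim=0, keepdim=True))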

Feb 19 · Liked by Sebastian Raschka, PhD

Awesome 🤩 I was wondering: can LoRA/DoRA be applied to vision models or object detection models as well?

A pretrained model knows how to detect/classify, but it doesn't know what to detect (out-of-distribution concepts).
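A sketch of how it could work, assuming the article's LinearWithLoRA class is in scope, using a torchvision ViT (the rank/alpha values here are arbitrary):

import torch
from torchvision.models import vit_b_16

model = vit_b_16(weights="DEFAULT")
for param in model.parameters():
    param.requires_grad = False  # freeze the pretrained backbone

# Collect target Linear layers first, then swap each one for a LoRA wrapper;
# skip MultiheadAttention's out_proj, whose weight is accessed by attribute
targets = [
    (parent, name)
    for parent in model.modules()
    for name, child in parent.named_children()
    if isinstance(child, torch.nn.Linear) and name != "out_proj"
]
for parent, name in targets:
    setattr(parent, name, LinearWithLoRA(getattr(parent, name), rank=8, alpha=16))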

Feb 18 · Liked by Sebastian Raschka, PhD

Great explanation, thank you!

Feb 18 · edited Feb 18

Seems like LeCun's server is down; currently I cannot download the MNIST dataset from there.
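If I remember correctly, newer torchvision releases fall back to mirror URLs automatically when yann.lecun.com is unreachable, so upgrading torchvision and re-running the download may be enough:

from torchvision import datasets

# Recent torchvision versions try mirror URLs when LeCun's server is down
train_dataset = datasets.MNIST(root="data", train=True, download=True)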
