9 Comments
Sep 21 · edited Sep 21 · Liked by Sebastian Raschka, PhD

Wow, nice summary!

(fyi the link to the "Label Supervised LLaMA Finetuning" paper has an empty space)

author

Thanks! (And not anymore :D)

Sep 21 · Liked by Sebastian Raschka, PhD

Highly recommended. I needed to max out the RAM on my laptop from 16GB to 64GB, fwiw.

Sep 21 · edited Sep 21 · Liked by Sebastian Raschka, PhD

I was able to run all of the code in the book locally with 32GB of RAM on Windows, even the GPT-2 XL model (though that was right at my hardware limit). When running it inside a Docker container, I needed an additional 8-12 GB of RAM.

author

That's good to know, thanks for sharing!

author

Oh interesting. I was testing all the code on my MacBook Air (24 GB RAM), where it worked fine tbh. Maybe it depends on the OS.

Sep 21 · Liked by Sebastian Raschka, PhD

Got the book yesterday... :^D

But what hardware is required/do you suggest?

author

Nice timing! I hope you have a fun weekend ahead! The code in the main chapters of this book should run on conventional laptops within a reasonable timeframe. Additionally, chapters 5 to 7 automatically use a GPU if one is available.
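As a rough sketch, the usual PyTorch device-selection pattern looks like this (the exact code in the chapters may differ in detail):

```python
import torch
import torch.nn as nn

# Typical PyTorch device-selection pattern; nn.Linear is just a placeholder
# model for illustration, not the book's actual GPT model.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2)    # stand-in for the actual model
model = model.to(device)   # parameters move to the GPU if one is available
print(f"Running on: {device}")
```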


Thank you for the article!!

About padding vs. not padding: can we keep the padding but, instead of always taking the output from the last token position, take the output from the last real (non-padded) token of each sequence in the batch? Wondering if you tried that and whether it helps? Thanks.
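A minimal sketch of what I mean, assuming per-token model outputs and an attention mask (1 = real token, 0 = padding); names here are just for illustration:

```python
import torch

# Illustrative sketch (not from the book): select the hidden state of the
# last real (non-padded) token per sequence, using the attention mask.
batch_size, seq_len, hidden_dim = 2, 5, 4
outputs = torch.randn(batch_size, seq_len, hidden_dim)   # per-token model outputs
attention_mask = torch.tensor([[1, 1, 1, 0, 0],           # sequence 1: 3 real tokens
                               [1, 1, 1, 1, 1]])          # sequence 2: no padding

last_token_idx = attention_mask.sum(dim=1) - 1            # index of last real token
last_hidden = outputs[torch.arange(batch_size), last_token_idx]
print(last_hidden.shape)                                   # torch.Size([2, 4])
```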
