Sep 21 · edited Sep 21 · Liked by Sebastian Raschka, PhD
I was able to run all of the code in the book locally with 32 GB of RAM on Windows, even the GPT-2 XL model (though that was right at my hardware's limit). When running it inside a Docker container, I needed an additional 8-12 GB of RAM.
Wow, nice summary!
(FYI, the link to the "Label Supervised LLaMA Finetuning" paper has a stray space in it)
Thanks! (And not anymore :D)
Highly recommended. I needed to max out the DRAM on my laptop from 16 GB to 64 GB, FWIW.
That's good to know, thanks for sharing!
Oh interesting. I was testing all code on my MacBook Air (24 GB RAM) where it worked fine tbh. Maybe depends on the OS.
Got the book yesterday... :^D
But what hardware is required/do you suggest?
Nice timing! I hope you have a fun weekend ahead! The code in the main chapters of this book should run on conventional laptops within a reasonable timeframe. Additionally, the code in chapters 5 to 7 automatically runs on a GPU if one is available.
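(The "runs on a GPU if one is available" part presumably boils down to standard PyTorch device selection along these lines; this is a hedged sketch, not a quote from the book.)

```python
import torch

# Pick a GPU if one is available, otherwise fall back to the CPU.
# The book's chapters likely do something similar; the exact code here
# is an assumption for illustration.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device.type)  # "cuda" on a machine with an NVIDIA GPU, else "cpu"

# A model and its input batches are then moved to that device, e.g.:
#   model.to(device)
#   input_ids = input_ids.to(device)
```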
Thank you for the article!!
About padding vs. not padding: could we keep the padding but, instead of always taking the output at the last token position, take the output at the last non-padded token of each sequence in the batch? Wondering if you tried that and if it helps? Thanks.
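For what it's worth, finding the last non-padded position per sequence is straightforward. A minimal framework-agnostic sketch (the pad token id of 0 and the function name are made up for illustration; in practice one would derive this from the attention mask):

```python
PAD_ID = 0  # hypothetical padding token id

def last_real_token_positions(batch_token_ids, pad_id=PAD_ID):
    """Return, for each sequence in the batch, the index of its
    last non-padding token."""
    positions = []
    for seq in batch_token_ids:
        # highest index whose token is not the pad token
        last = max(i for i, tok in enumerate(seq) if tok != pad_id)
        positions.append(last)
    return positions

batch = [
    [5, 7, 9, PAD_ID, PAD_ID],  # real length 3 -> index 2
    [4, 2, 6, 8, 1],            # no padding    -> index 4
]
print(last_real_token_positions(batch))  # → [2, 4]
```

In PyTorch the analogous selection on the model outputs would be fancy indexing along the sequence dimension, e.g. `logits[torch.arange(batch_size), positions]`.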