How to run Llama-7B on a laptop with 4GB GPU
In this tutorial, we load and make predictions with the Llama-7B model on a laptop with 6GB of free RAM and a 4GB GPU.
GitHub: https://github.com/thushv89/tutorials_deeplearninghero/blob/master/llms/llama_on_laptop.ipynb
llm.int8() paper: https://arxiv.org/pdf/2208.07339.pdf
Hugging Face's accelerate: https://huggingface.co/docs/accelerate/index
00:00 - Introduction
01:52 - Initial setup
02:43 - Main libraries
04:04 - Compute specifications
04:30 - Using the accelerate library
06:33 - Using GPU, CPU and Disk to load the model
07:52 - Loading the model
08:10 - llm.int8() quantization
09:24 - CPU offloading
10:25 - Running inference
10:55 - Running on colab
11:35 - Wrap up
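The llm.int8() chapter builds on absmax int8 quantization: each weight row is scaled so its largest magnitude maps to 127, stored as int8, and rescaled at use time. The NumPy sketch below is my own illustration of that round-trip, not code from the video, and it omits the paper's second ingredient (keeping outlier feature dimensions in fp16):

```python
import numpy as np

# Illustrative row-wise absmax int8 quantization, the building block behind
# llm.int8(): scale each row so its largest magnitude maps to 127, round to
# int8, and dequantize by dividing the scale back out.
def quantize_rowwise(x: np.ndarray):
    scale = 127.0 / np.max(np.abs(x), axis=1, keepdims=True)  # assumes no all-zero rows
    return np.round(x * scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) / scale

x = np.array([[0.5, -1.0, 0.25]], dtype=np.float32)
q, scale = quantize_rowwise(x)
x_hat = dequantize(q, scale)  # close to x, up to small rounding error
```

Values at the row's absmax (here -1.0) survive exactly; the rest pick up rounding error of at most half a quantization step.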
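The loading, quantization, offloading, and inference chapters can be sketched in a few lines of Hugging Face code. This is a minimal sketch under stated assumptions, not the video's exact notebook: the "huggyllama/llama-7b" checkpoint name, the "offload" folder, and the 3GiB/5GiB memory caps are all my assumptions. It requires transformers, accelerate, and bitsandbytes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Cap how much of each device the model may use; layers that exceed a cap
# spill to the next device in line (GPU -> CPU -> disk), placed by accelerate.
max_memory = {0: "3GiB", "cpu": "5GiB"}  # assumed caps for a 4GB GPU / 6GB RAM laptop

if torch.cuda.is_available():  # the tutorial assumes a GPU is present
    tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b",
        device_map="auto",              # accelerate splits layers per max_memory
        max_memory=max_memory,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # llm.int8()
        offload_folder="offload",       # layers that fit nowhere else go to disk
    )
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With `device_map="auto"`, accelerate measures each layer and fills the GPU first, then CPU RAM, then the offload folder, which is what makes a 7B-parameter model fit on this hardware at all.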
DeepCamp AI