How to run Llama-7B on a laptop with 4GB GPU

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this tutorial we load and run inference with the Llama-7B model on a laptop with 6GB of free RAM and a 4GB GPU.

GitHub: https://github.com/thushv89/tutorials_deeplearninghero/blob/master/llms/llama_on_laptop.ipynb
llm.int8() paper: https://arxiv.org/pdf/2208.07339.pdf
Hugging Face's accelerate: https://huggingface.co/docs/accelerate/index

Chapters (12)

0:00 Introduction
1:52 Initial setup
2:43 Main libraries
4:04 Compute specifications
4:30 Using the accelerate library
6:33 Using GPU, CPU and Disk to load the model
7:52 Loading the model
8:10 llm.int8() quantization
9:24 CPU offloading
10:25 Running inference
10:55 Running on colab
11:35 Wrap up
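The chapters above boil down to a memory-budget problem: Llama-7B's weights are larger than the 4GB GPU, so the video combines llm.int8() quantization with accelerate's GPU/CPU/disk offloading to make them fit. A minimal sketch of that back-of-envelope arithmetic (approximate figures for the weights only; activations and overhead add more):

```python
# Rough memory math for Llama-7B: ~2 bytes/parameter in fp16,
# ~1 byte/parameter with llm.int8() quantization.
def model_size_gb(n_params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight size in GB for a model with the given parameter count."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

fp16_gb = model_size_gb(7, 2)   # ~14 GB: far too big for a 4GB GPU
int8_gb = model_size_gb(7, 1)   # ~7 GB after llm.int8() quantization

# With the laptop in the video (4GB GPU + 6GB free RAM), the quantized
# model fits across GPU and CPU; anything left over would spill to disk.
gpu_gb, cpu_gb = 4, 6
disk_gb = max(0.0, int8_gb - gpu_gb - cpu_gb)
print(fp16_gb, int8_gb, disk_gb)  # 14.0 7.0 0.0

# The video then loads the model roughly along these lines (not run here,
# since it downloads several GB of weights; the checkpoint name is a
# placeholder, not necessarily the one used in the video):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "huggyllama/llama-7b",   # hypothetical checkpoint
#       device_map="auto",       # accelerate splits layers over GPU/CPU/disk
#       load_in_8bit=True,       # llm.int8() via bitsandbytes
#   )
```

This is why CPU offloading (9:24) matters as much as quantization (8:10): neither alone gets a 7B model onto a 4GB GPU, but together the 8-bit weights fit within the combined GPU + RAM budget.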