Round 2 - I use CodeLlama 70B vs Mixtral MoE to write code to finetune a model on 16 GPUs 🤯🤯
I test how well three different LLMs work at writing a Python script to finetune a model on 16 GPUs (multi-node).
This video is not edited in any way. It shows a realistic workflow for coding without gimmicks or hype.
I ask CodeLlama 70B, Mixtral 8x7B (MoE), and Mistral 7B to write a Python program to finetune a computer vision model on the CIFAR10 dataset. You can validate all of this yourself by running the 3 studios for free:
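For reference, a 16-GPU multi-node run like the one in the video is typically launched with PyTorch's `torchrun`. The exact launch setup from the video isn't shown here; the script name `finetune.py` and the rendezvous host below are placeholders:

```shell
# Launch a (hypothetical) finetune.py across 2 nodes x 8 GPUs = 16 GPUs.
# Run the same command on every node; the rendezvous endpoint must point
# at a host:port reachable from all nodes.
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=node0.example.com:29500 \
  finetune.py
```

Each node spawns 8 worker processes (one per GPU), and the c10d rendezvous backend coordinates them into a single 16-process job.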
This is an unedited video... so here are some corrections:
- To clarify what "based on Llama 2" means: Mistral 7B tweaks the way Llama 2 does attention but is then pretrained…
Watch on YouTube →
Chapters (26)
Introduction (0:40)
Run CodeLlama 70B (1:13)
Run Mixtral 8x7B (MoE) (1:34)
Run Mistral 7B (1:47)
How to get a GPU (2:08)
What is a Lightning Studio (3:47)
Basic CodeLlama 70B test (4:20)
Basics of model monitoring (4:39)
Connect a local VSCode (6:20)
Basic Mixtral MoE coding test (8:46)
Create the prompt to generate the ML code (9:04)
Connect an S3 bucket (10:10)
Full prompt for ML code (13:16)
Prompt Mistral 7B (13:50)
Debug the finetuning script (14:16)
About the Lightning Trainer (14:56)
Sanity check the finetuning script (15:30)
Monitor with Tensorboard (16:20)
About model RAM and model size (16:44)
A quick TL;DR about profiling a model (17:40)
Scale to multi-node (16 GPUs) (19:10)
CodeLlama 70B results (20:00)
About finetuning (22:10)
Monitoring the 16 GPUs (22:54)
CodeLlama 70B code results (25:35)
Look at multi-node logs, weights
DeepCamp AI