GPT-4 leaked! 🔥 All details exposed 🔥 It is over...
Key Takeaways
The video walks through leaked details of GPT-4, including its architecture, model size, training costs, and capabilities, and touches on the legal and ethical implications of training AI models on copyrighted material. Topics covered include mixture of experts (MoE), a Google AI Blog post on expert choice routing, Transformers, and the Nvidia A100 GPUs reportedly used for training.
Full Transcript
Well, the AI cat is out of the bag, and there's no putting it back. The people over at SemiAnalysis shared all the data they have on OpenAI's model GPT-4. This includes model architecture, training infrastructure, inference infrastructure, parameter count, training data composition, token count, layer count, the multimodal vision adaptation, etc.: things that we've been wondering about and speculating on for quite a long time. Unfortunately, the thing costs about a thousand bucks, so we don't get to see it until this guy reposts it for free. Now, he did get hit with a copyright thing and had to take it down, but as you know, what happens on the internet stays on the internet.

So let's dive into some of the most important findings. I'll try to put them towards the beginning; use the chapters at the bottom if you want to skip around. At the end we're going to talk about what this could mean: what is this going to mean for Google and Microsoft and OpenAI, what is this going to mean for the open source community, etc. If you remember that Google memo that got leaked, that might have been a little bit prophetic, as in a prophecy of what was going to happen with OpenAI. Basically they were saying that even though OpenAI is beating Google in some respects, both those companies were getting beat, they were lapped, they were completely annihilated by this third faction. That third faction was open source. So the question here is what's going to happen with OpenAI, with their lead, with their advantage, if a lot of this stuff gets out there, if a lot of this stuff is shared with the open source community. As you'll see, there's still a moat that exists around GPT-4, but for how long we don't yet know.

So let's dive into it. First things first, here's the model size for up to 2022. As you can see here, GPT-3 is at 175 billion parameters. Here's the language model sizes up to August 2022.
LaMDA, which is the Google model, is 137 billion parameters. PaLM-Coder or Minerva, which is the Google model, that's 540 billion. Again, there's GPT-3 at 175 billion. There's a pretty good illustration of what that looks like. As you can see here, you've got OpenAI's ChatGPT, and Microsoft is a part of it, so you've got GPT-3.5. Then you've got Google and Bard, their LaMDA and PaLM models. You've got the Chinese Ernie at 260 billion parameters, the Ernie Bot. So a lot of people were wondering where GPT-4 fits into all this. Now, some of these larger models are going to be about 500 billion. Is GPT-4 ten times more, five times more, is it smaller, is it the same size? The growth in model size is referred to as kind of a version of Moore's law, where it basically 10xes every year. Here they're showing a 15,000x increase in five years.

So here they're saying that GPT-4 is 10x the size of GPT-3 or GPT-3.5, and it's somewhere about 1.8 trillion parameters across 120 layers. Layers are basically how the parameters are organized: parameters are organized into layers, and each layer of the model represents different features of the input data. So if you're thinking about something like an image recognition program, some of the earlier layers might be dealing with recognizing edges and colors, whereas some of the later, deeper layers are recognizing things like faces and objects, etc. So if this is saying that they have 120 layers, that means it's a deep architecture that can learn to do a lot of varied and complex tasks. Obviously GPT-4 is probably one of the most advanced models out there.

The next thing that's of note, and this is an interesting one, is they're using something called MoE, mixture of experts. Basically, instead of thinking of GPT-4 as just one static model, think of it as different experts, in this case 16 different experts, that each kind of do their own thing. I haven't yet been able to figure out which specific experts there are, but for example there could be one
that's dealing with coding, there could be one that's dealing with taking the information and sort of outputting it in the proper form, like formatting or question-answer format, etc. So MoE is mixture of experts. MoE is not a new thing; it's been around, and there are multiple studies with it. Google mentions it in one of the blog posts from early 2022, I think. So here's the Google AI Blog: MoE with expert choice routing. Basically, depending on what type of question, what type of information you're asking for, it routes you to a different expert that then answers your question. So this is nothing new. Google knew about it, although I don't know if their models use it necessarily. This paper seems to indicate that maybe the model that they built with MoE surpasses the performance of Flan-PaLM, the 62 billion parameter model, so I think Google's models use MoE, but I'm not 100% sure about that. Obviously the advantage that they're talking about is that MoE effectively and efficiently scales up language models without needing a rise in how much it costs and the resources required.

And this MLP thing they're talking about: an MLP is a multi-layer perceptron, which is this thing right here. No wait, sorry, that's Optimus Prime, never mind. Basically an MLP, a multi-layer perceptron, is how these neural networks work. A perceptron is the basic unit of a neural network. So in this image, these are the different layers: input layer, hidden layers, output layer, etc. So this would be a multi-layer perceptron. This is where I get a little bit out of my depth, so I'm not going to continue with this; you should Google it, or even better, ask ChatGPT.

And then they also talk about how the routing is done. While a lot of people think that there should be this complex way of routing your different questions and tokens to whichever experts should be answering them, it sounds like OpenAI's is
very, very simple. This next part says there are roughly 55 billion shared parameters for attention. You might be aware of the big paper Google released, I think it was late 2016 or 2017, called "Attention Is All You Need". That was one of the papers that did a lot of the breakthroughs in the space, allowing these AI models to become much more complex. The idea of Transformers came out of that paper. Before that, these large language models had problems remembering where they started. As one started doing a paragraph, by the time it got to the end of the paragraph it was like, "I kind of forgot what I was talking about up here." Basically they kind of had ADD, and attention, or Transformers, was kind of like ADD medication, and then they started remembering what they were talking about. So that's what it's talking about here: 55 billion shared parameters, meaning that all the experts, all the models, kind of use 55 billion parameters for attention. So again, think of attention as like the ADHD medication that you give these AIs so that they can remember what they're talking about. And attention in these models is also very similar to how humans pay attention, right? You walk into a room, and you're not necessarily noticing every little nick on the wall or some dust on the floor; you're kind of generalizing to see, okay, what's the important thing that I need to be paying attention to. So it's a similar thing here: these 55 billion parameters allow these models to focus on what's important.

All right, so next we have inference. Inference is basically how these LLMs make predictions. That's what it's called: the ability to predict the sentences or predict which images you're going to use, that's inference. I don't fully know exactly what that is, but this is basically talking about the efficiency of how much it takes to generate answers, to generate predictions, right? If a model is very expensive and very slow, these numbers would be
higher; if it's fast and cheaper to produce, these numbers will be lower. So basically you want the best possible outputs at the lowest possible cost at the fastest possible speed. Again, I'm not sure exactly what numbers are good or bad, but it sounds like GPT-4 is doing a great job, so I assume these numbers are really good.

Next we're talking about the data sets. How we train these AI models is we just kind of let them loose on gargantuan amounts of data, and with LLMs, tokens are sort of the basic units that text is broken into; you can think of them as, for example, words or letters. So saying that it's trained on 13 trillion tokens means that it processed 13 trillion pieces of text. In machine learning you have these epochs. One epoch is when the entire data set has been fed to the machine, so if there are multiple epochs, then there are multiple passes through the data. Here they're saying that there are two epochs for text-based data and four for code-based data, and there are millions of rows of instruction fine-tuning data from Scale AI and internally. Fine-tuning is basically taking a large language model, sort of like the base model, and tweaking it to perform certain tasks a little bit better. For example, if you have a model that is able to recognize images, and let's say you just wanted it to be really good at recognizing dogs or human faces, you would go in there and fine-tune it to be great at that one particular task. Scale AI is a company that provides a lot of this high-quality training data for AI, and "internally" is OpenAI's internal data sets, whatever they're using.

Next we're talking about the context length, and this is, I guess, referred to internally as seqlen, sequence length. The context length was 8K, and then, after training, the 32K seqlen version of GPT-4
is based on fine-tuning the 8K version after the pre-training.

Next we're talking about the batch size and how it was gradually ramped up: the batch size was gradually ramped up over a number of days on the cluster, and by the end OpenAI was using a batch size of 60 million. This, of course, is only a batch size of 7.5 million tokens per expert, due to not every expert seeing all the tokens. So again, depending on which expert you were routed to to answer your question, that would decrease that batch size. That's a little bit outside of my knowledge base, so if you have some more insights about that, please comment. I read every single one. Thank you.

This next one is very interesting, because here they're talking about what hardware they're using and how much it costs. Let's come back to this thing in just a second, but let's look at this training cost. They were using something like 25,000 A100s, which is that Nvidia GPU. So it was probably something like this, and it costs about 25 grand, and they had 25,000 of them. Or, more likely, they were training in the cloud. As far as I know, OpenAI has never disclosed how they train their models, what software they use, whether they use the cloud or physical GPUs or whatever. Since they have that deal with Microsoft, they probably use Azure, Microsoft's cloud service. I think they actually get some, not free, but some sort of a sweetheart deal where they get to use Azure's network, so they're probably using that to train it. And so if their cost in the cloud was about one dollar per A100-hour, the training cost to train GPT-4 would be $63 million. Today's equivalent would be the H100 Tensor Core GPU from Nvidia. Which, Nvidia, if you need me to review any of these things, please just send them, I'll find some time to do it. Just send them over and I'll review them, no
charge. They trained this for 90 to 100 days, which, if you think about it, means these big corporations with a lot of money still have a pretty big advantage in creating these large language models. I mean, yeah, now with the H100s that might only cost, you know, 21 or 22 million dollars to train in 100 days, or excuse me, for the H100 it'll be 55 days; for the quote-unquote older A100 it'll be 90 to 100 days. But still, obviously this is out of reach for most people, most companies. The interesting thing is how fast the prices drop and how rapidly the size of these models is increasing. So I'd be curious to know at what point a small business, let's say with a six-figure budget, could create GPT-4. How many years from now will it be financially feasible to do something like that? It might not be that far away.

And then 32 to 36 percent MFU. MFU is model FLOPs utilization, so it's basically how efficiently they were utilizing the computational resources. 32 to 36 percent means that a significant portion of that computational capacity was not used. They're saying there was an absurd number of failures, where you had to restart from a checkpoint, which is part of training a machine learning model. You can think of it as if you're going through a video game: you hit a certain checkpoint and the game saves, and if you fail the next part, well, you go back to the checkpoint. So what they're saying here is that with GPT-4 there were parts of it where they kept having to reset back to the checkpoint many, many, many times, and that's what was causing such a low efficiency of using these machines, the A100 GPUs.

So, a couple of other things. When we're talking about MoE, mixture of experts, the research seems to suggest that sort of
the optimal amount of different slices or experts of the model to use is 64 to 128 experts; that achieves better loss than 16 experts, as is the case with GPT-4. But of course, the more you use, the harder it could be to achieve convergence. So one of the reasons why OpenAI probably chose to go with a smaller number of experts than what research suggests is optimal is that this training run was so massive. We assumed it cost 60 million dollars plus, and that's just how much it costs to run the GPUs, just to run the hardware. So maybe they chose to be a little bit more conservative just to make sure that they could produce good results with that.

Before, OpenAI had their Davinci model, and GPT-4 costs three times more in general; per prompt, let's say, it costs three times more. That's why you see a split in a lot of these studies. For example, in the Nvidia Minecraft study, where they had ChatGPT basically write code for Minecraft that it would execute to make it play Minecraft better, explore more, and learn how to do all this stuff, they used GPT-4 in some areas and GPT-3.5 Turbo in others. To write the code they would use GPT-4; to write the code comments and little descriptions of what the code does, and to make titles for the code, they would use the cheaper, faster model, GPT-3.5 Turbo.

One of the things you might remember: when GPT-4 came out, they were saying that it was going to be multimodal. It was going to have vision, it was going to be able to see stuff, and that was exciting, but we haven't really heard anything new about it since GPT-4 was released. We saw in the demos that it was real; if I remember correctly, they did show certain applications where it seemed like it was working, although we didn't actually test it,
so we don't know if it's real or not. We saw what looked like vision with GPT-4. It looks like OpenAI wanted to train its vision from scratch, but that was a little bit risky; it was still sort of early, not mature enough. And the primary point of vision, and this is where to me personally it gets kind of exciting: I think the goal for a lot of people, myself included, is to at some point be able to create these autonomous agents, to have these AIs go and read the web and transcribe what's in images or videos or websites, etc. Right now you can do some of that with, like, Python code, for example, but vision seems to be what would take it to the next level, just completely blow it out of the water. Instead of sitting there and using something like OCR, the image recognition software for Python or whatever, you would just have ChatGPT, GPT-4, be able to look at the web page or an image or a video and gain insight from it, figure out what it's saying.

For those of you that have had a chance to work with ChatGPT's Code Interpreter, which should now be available to everybody with a Pro license (if you don't have it, check your settings in ChatGPT, then scroll to the top, and it should be one of the options under GPT-4: Code Interpreter), as we've seen there, it's getting pretty smart about being able to look at data and create images and even recognize images. So we're not that far away from it with what we have existing, but really, GPT-4 vision as it was sort of promised, we're not quite there yet. It seems like what they do have right now is still behind closed walls where we don't have access to it, but they have vision that's able to read text, and it's able to read little scripts like LaTeX, to where if there's a printed graph, it's able to kind of figure out what that graph looks like, what it says. And OpenAI also has Whisper.
So that's its ability to listen to audio and then transcribe it. For example, if you have a video playing or a podcast playing, Whisper would allow you to transcribe it and feed that into ChatGPT, and then it would be able to understand what this podcast is saying.

Another thing that they talked about is speculative decoding, and this is something I've seen in other studies as well. I haven't had too much of a chance to look into it, so I apologize if my understanding of it is a bit fuzzy. As far as I can tell, GPT-4 is smarter but slower and more expensive, and GPT-3.5 Turbo, for example, is faster, cheaper, etc., but it's not always right, or it doesn't handle some of the more complex tasks as well. For coding, for example, sometimes it's definitely not as good as GPT-4. So one thing that people have been testing is they'll ask a question to both of them at the same time, and once the smarter model starts answering, once it sees those first few tokens rolling out, it'll give that to the faster, cheaper model, and that will take over. That seems to produce pretty good results for cheaper, for fewer resources. Think of it like this: you ask a smart person a question, like "Is it probable that it'll rain today?", and the person starts, "Yes, it is probable...", and the faster model then completes the sentence for him: "Yes, it'll probably rain today." So the smarter model sort of sets the direction of where that answer is going, and the faster, cheaper model picks it up and continues with it. At least that's what I understand it as.

And you might have heard about the conspiracy theory about GPT-4 being not as smart as people have been saying, like it got a lobotomy, like it's just not as smart, it's not able to do certain tasks well. OpenAI's official response is like, well, GPT is the same thing, nothing's been changed. Well, both of those might be true in this
sense, because GPT-4 might still be GPT-4, but the responses we get from ChatGPT, that sort of customer-facing thing, might be using this speculative decoding. So basically GPT-4 starts to answer, and then the faster model picks it up and screws it up. That's one speculation of why that might be happening, and they're using that to save costs, basically.

One of the questions that's been asked about OpenAI is where they get their training data from. There seems to be this idea that they have some secret underground data that nobody else has. Now, you've probably seen that GPT-4 passed a lot of exams, like the bar and a lot of other advanced exams for college-level courses, and a lot of the multiple-choice questions in professional and academic settings. The rumor and the speculation is that a lot of that secret data is just textbooks, lots and lots of textbooks, and that would explain why GPT-4 seems so smart to anybody that uses it, right? If you're a college professor, and let's say you're a computer scientist, you ask questions about computer science and it tells you the textbook answers, and that's because it's been trained on those. Or if it's talking about biology or philosophy or whatever. There are some rumors, and there are certain things that kind of maybe point towards it. I'm not 100% sure, but that could certainly mean legal trouble for OpenAI. That could mean people start asking questions about, hey, where did they get this data from? And of course Europe has very strong laws against using stuff like that. I think in the US it's still kind of nebulous; we don't really have too many laws about how data is used to train AI models. I think we're just beginning to talk about it. Japan is on the other side of the spectrum; they're just like, there's no copyright whatsoever that
applies to training AI models, right? So if you're training an AI model, you can use anything without regard for copyright laws: any artwork, any movie, whatever, just use it if it's for the purpose of training AI. Which personally, to me, makes a lot of sense, and I know that's a controversial opinion. I know a lot of artists are very much against training AI on artists' works in order for that AI to then spit out art. I get that that's problematic for a lot of people; it's going to put a lot of people out of business. But there is an argument to be made that that's how humans learn as well. Writers read a lot, right? Artists study other artists like themselves to learn. So this is the same way we train ourselves to become better; it's doing the same thing. But obviously it can produce a lot more than one person, right? It can produce more art than the entire human population. So there's that.

So I hope that you found that interesting. If I missed anything, and it's very possible that I got something wrong or missed something, do comment down below. You can be nice about it, you can be rude about it, I don't care, but explain what you're talking about; give us some context so that we can verify what you're saying and maybe learn a little bit from it. What does this mean for Google, for Microsoft, for OpenAI? What does this mean for the open source community? Does this mean that there's going to be an open source GPT-4 sometime later this year? I'd love to know what you think; comment down below.

If you're interested in learning more about how to use this technology, even if you have no coding background whatsoever, that's the very thing that I'm working on next. Check out the show notes; I link to natural20.com, that's my website. Sign up for the newsletter, and as soon as that course gets rolled out, you'll be the first to know about it. My goal is to create an autonomous AI agent that is able to make money and run a business, and it is beginning to
seem like we're getting closer and closer to that being able to happen. If that's something that's interesting to you, stick around. My name is Wes Roth, and thank you for watching.
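
The training-cost figures quoted in the transcript (something like 25,000 A100s, 90 to 100 days of training, about one dollar per A100-hour, and a total of roughly $63 million) can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check under those quoted numbers. The function names are made up for illustration, and none of the inputs are confirmed by OpenAI.

```python
# Back-of-the-envelope check of the training-cost figures quoted in the transcript.
# All inputs are the speaker's approximate numbers, not figures confirmed by OpenAI.

HOURS_PER_DAY = 24


def training_cost_usd(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Rental cost of running num_gpus continuously for the given number of days."""
    gpu_hours = num_gpus * days * HOURS_PER_DAY
    return gpu_hours * usd_per_gpu_hour


def implied_days(total_cost_usd: float, num_gpus: int, usd_per_gpu_hour: float) -> float:
    """Number of days of continuous use that a given total cost would pay for."""
    return total_cost_usd / (num_gpus * HOURS_PER_DAY * usd_per_gpu_hour)


NUM_A100S = 25_000          # "something like 25,000 A100s"
USD_PER_A100_HOUR = 1.0     # "about one dollar per" A100-hour
QUOTED_COST_USD = 63_000_000

cost_90_days = training_cost_usd(NUM_A100S, 90, USD_PER_A100_HOUR)
cost_100_days = training_cost_usd(NUM_A100S, 100, USD_PER_A100_HOUR)
days_for_quoted_cost = implied_days(QUOTED_COST_USD, NUM_A100S, USD_PER_A100_HOUR)

print(f"90 days:  ${cost_90_days / 1e6:.0f}M")
print(f"100 days: ${cost_100_days / 1e6:.0f}M")
print(f"$63M pays for about {days_for_quoted_cost:.0f} days on {NUM_A100S:,} GPUs")
```

Under these inputs, 90 to 100 days comes to $54 million to $60 million, and the quoted $63 million corresponds to about 105 days on exactly 25,000 GPUs, so the quoted figures are roughly, though not exactly, consistent with each other.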
Original Description
#ai #openai #gpt4
🔥 Get my A.I. + Business Newsletter (free):
https://natural20.com/
[TIMELINE]
[00:00] The Leak
[01:34] Model Size
[03:27] MoE
[07:04] Inference
[07:46] Data Sets
[09:09] Context Length
[09:32] Batch Size
[10:07] Training Costs
[14:07] 3x Cost Increase
[14:54] Vision
[17:33] Speculative Decoding
[18:50] Why ChatGPT got dumb
[19:41] Mystery Data