OpenAI just solved math

Wes Roth · Advanced ·🧠 Large Language Models ·11mo ago

Key Takeaways

OpenAI's experimental large-language model achieved a gold-medal score on the IMO 2025, demonstrating superhuman performance in math problem-solving using retrieval augmented generation and fine-tuning techniques.

Full Transcript

Well, OpenAI is on a roll. They just announced this. They achieved the gold medal level performance on the 2025 IMO, International Mathematical Olympiad. It's an international math competition considered to be the hardest, the most prestigious in the world. For decades prior to this, a lot of people pointed to the time when AI will beat this thing as sort of the AGI achievement. This was in the past seen as an obvious AGI milestone when the AI gets better at math than the smartest humans on Earth. Now, of course, we have Google DeepMind who almost took gold medal last year at the IMO. They were off by just one point. So, they got the silver medal. One more point, they would have had gold. So, a lot of people expected some AI model to take gold this year. It was kind of likely. But this this is really really different. Let's unpack what happened. So first and foremost, this was a general purpose reasoning LLM. Google DeepMind's success last year getting the silver medal at the IMO. It was done with two AI models, Alpha Proof and Alpha Geometry. These were models more specific to math. They were more specialized at doing these various proofs and solving math problems. So you can see here the system got 28 points. If it got one more, it would have crossed into the gold medal threshold. And they did it in part by creating millions of proofs. It was synthetic data. These models came up with various problems and then trained themselves on solving them. So here's the thing. You might see some people say this is not that big of a deal. This is not that much better than the Google DeepMind system. But here's the really important thing to understand. This is from Google's sort of AGI levels, the progression of AGI, how good it's getting. So we have emerging, competent, expert, virtuoso, and level five, superhuman. And they did a great job of illustrating this here because there's two types of AI. There's narrow, which are clearly scope task or set of tasks, right? So like playing chess. It's really good at playing chess. It can't do math. It can't write a poem, but it's superhuman at chess. And we've had superhuman narrow AI for a while. We have examples of superhuman AI at certain tasks. The big thing that people are trying to achieve is general intelligence. So something like AGI which is better than humans at you know in general. So metacognitive abilities like learning new skills. So it has a wide range of non-physical tasks. So this sheet is a little bit outdated but as you can see here on the general side we've been making progression down this thing as well. And really the examples of that are LLMs large language models. They are generally intelligent. This gold medal performance was accomplished with a generalurpose LLM, not a specialized model to do IMO problems. That's a big thing to understand. Here's Sam Alman kind of hammering the point in to emphasize this is an LLM doing math and not a specific formal math system. He continues to say that they're releasing GPT5 soon. But here's the thing. this model that beat the IMO is not GPT5. As he's saying here, this the world is just not ready for that sort of capability, right? We don't plan to release a model with the IMO gold level of capability for many uh months. So, you get what he he's saying here. So, we're going to get GPT5 and then after some months months, you know, it's going to be this year, 2025, maybe towards the end of the year, we're going to get this IMO gold level model released. 2025 is going to be wild. We're going to see things go off the rails. I'm currently in the process of harassing Immod Mustak to come on my podcast and tell me about this new thing that he's doing because if you haven't heard, he's off on a brand new thing that he's building that sounds extremely interesting. And I love this. The three sets of emojis that encapsulate that meme. Is this the final eval? Terrific. Now, here's Giri Marcus, a notorious AI critic. He's always been consistently one of the more pessimistic takes on AI progress by many considered to be the final AGI milestone. By the time Gary Marcus says we have AGI, like we for sure have AGI and after understanding how the model worked to get the gold medal, he said that's impressive. For people that know who Gary Marcus is, him saying that's impressive about a large language model, I'm sure a lot of us just fell out of our chairs. I was comfortable declaring that AGI is achieved based on that one tweet alone. But let's talk about exactly how they did it. What did they do to make this LLM get that good at reasoning, get that good at math? Here's one very interesting tidbit of information. If you've been following this channel for a while, we cover a lot of these papers that come out or blog posts including OpenAI research. Every once in a while, OpenAI will post some transcript from a model where we kind of see either it's a chain of thought or its actual output. And the way that the model speaks is unlike anything we've seen before. Like we've all interacted with Chad GPT and Claude and Gemini, they have a certain way of speaking. This other model does not. It doesn't resemble that at all. It's a completely almost alien way of speaking. It's still in English, but it's a very different style. Alexander Wei, one of the researchers on his project, it's funny because at at the end he he apologized in advance for the distinct style that this model speaks in, right? It's very much an experimental style. He's got a little sweating, laughing, and nervous laughter emoji, right? It's like that's just how it talks. that distinct style of speaking reminds me of someone specifically Kevin from The Office. Remember when he comes into the office if you've watched that show? We're going to get to that in a second because I think this might be very important moving forward. Now again, we don't know too much about this, but based on various cryptic remarks that the various OpenAI researchers have left all over the internet, we can piece some of these things together. We'll get to that later. Now, this person popped in my feed. No more ID AIML developer. I'll give him a follow because he did spot something interesting and that is what Gnome Brown said. Nome Brown used to work for Meta I believe and actually went to OpenAI. He was on the Cicero project which was kind of a diplomacy AI that game where you're supposed to form alliances potentially break alliances to dominate the world. He's working on the various reasoning models, the strawberry reasoning models. And as he says here, today we at OpenAI achieved a milestone that many considered years away. And he's not kidding. A lot of these online betting places, Poly Market, etc. were ranking the chance of this happening as fairly low. So it looks like between 10 and and 15% in the months leading up to now. So this market will be resolved to yes if any AI gets a gold medal in the international math Olympiad by December 31st 2025. Not only did that happen, but it was also a general reasoning and it was under the same time limits as humans without tools. These are the parts that really impressed Gary Marcus. I think specifically when he realized it was without tools then he went, "Oh, okay. We really got something here." And as Gnome Brown puts it here, as remarkable as that sounds, it's even more significant than the headline. Okay, so let's take a look at the actual results at, you know, we finally get a glimpse of this model. So again, Alexander Wei, OpenAI, Meta, Berkeley, Harvard codebuilt Cicero, interestingly, with Nome Brown, I assume. So, so this model was tested on the same sort of rules as human contestants. two 4 and 1/2 hour exams, no tools or internet, reading the official problem statements, and writing natural language proofs. So again, kind of important to understand that the Google Deep Mind system that almost got the gold medal for that system. The first step was that the problems were manually translated into formal mathematical language for systems to understand. So manually means humans did it. So humans had to translate the question into a language that the models could understand. Again, obviously this is incredibly impressive what Google at DeepMind achieved. This is not to take anything away from them, but it's important to understand that what OpenI did was very different and I think that's what Alexander is pointing out here reading the official problem statements. Right? So this model is working from the same from this, right? This is what's given to the human contestants and this is what was given to the large language models. The same thing this is also very important to understand and Nome Brown also highlights this as a a thing to really grasp about how AI is progressing and that's the reasoning time horizon is increasing. In the meter research for example they're measuring AI ability to complete long tasks. Now, this isn't exactly the same thing as what we're talking about, but I think the point still stands is that the length of tasks AI can do is doubling every seven months. So, these AI models are progressing from how long it takes to think about a problem. What's the maximum time horizon? So, we started with the GSM8K where it was like a fraction of 1 minute for top humans. So, here's an example from that problem. Janet's ducks lay certain amount of eggs. She wants to sell a certain amount of eggs. She wants to eat certain amount of eggs. They cost $2. So, as you can see here, I mean, you can see how the top people would get this in seconds. If you ever saw Scott Woo, the founder of Cognition Labs, you know, Devon, as a kid, he was taking part in this televised math show and he was like hitting the buzzer and answering the question before the the person the announcer had a chance to finish the question. You saw the look of frustration on the opponent's faces cuz they're like trying to read the question. He's like, "It's five." And he just knows the answer. But the point is the model's ability to think. We went from, you know, in terms of human how long it takes us to think through a problem from, you know, seconds or a fraction of one minute to about one minute for the math benchmarks. The Aimeme that's kind of oriented or targeted at the the best high school students in the US. So that takes 10 minutes per problem to the IMO which is an hour plus like around 100 minutes. This is what a lot of people that are kind of like making fun of AI and how it's like, oh, it can't do anything. It's not going to go anywhere. Like, this is what I think people need to understand. Like, project that forward. If the length of task that these AIs can accomplish is doubling every 7 months, assume this reasoning time horizon is doubling every x months. Maybe it's the same time scale, maybe a little bit different. And we've talked to the two ex Googlers on the SVIC podcast. So, what happens if it keeps doubling? Well, it could be kind of like this, right? That's the exponential growth. There could be some limit that makes it more of an S curve, right? So, we're kind of seeing this up here, but then there's some limit that makes it slow down, but it also seems like maybe there could be an a series of S curves because we keep finding new ways of scaling this thing up, right? We had training time compute, we have test time compute. This idea of it reasoning for longer and longer, that's the test time compute. So the progress up to like GPT4 was driven by you know the training time compute just making bigger data centers more NVIDIA GPUs go burr if you will right here are the reasoning models we're seeing still a lot of that improving its abilities next step many believe will be RL so throwing more and more computes at reinforcement learning we've already seen Gro 4 Gro 4 the base model wasn't uh amazing if If I understand correctly, it was the same as the previous groth 3 model, but they threw 10 times the RL compute at it. And what happened? This insane thing happened, right? Here's everyone else. Here's Gro 4 Zoom, right? It's the 10% as Arc AGI 2 researchers co-founder said, anything below 10 is kind of noisy. It's kind of hard to tell if it's a fluke or not. past 10, we get a much better sort of idea that it's in fact solving these problems and understanding them. So, Grock 4 currently is the only one that's in, you know, above 10% and is truly showing something. They refer to it as fluid intelligence. Here's Greg, president of the ARC prize on Gro 4, testing on the ARC EGI. He's saying Grock 4 is showing nonzero levels of fluid intelligence. That might be this thing that we're seeing here. the next sort of S wave S-curve that's going to take us even higher. By the way, we're already seeing some early papers that are hinting at what this next the furthest frontier, the the very next S curve after this one cuz we're we're like right here at the bottom of this one. But on this channel, we covered a few papers that might suggest what this next thing might be. Let me know if you know, put it in the comments. what is the next big thing after you know throwing massive amounts of comput reinforcement learning what's the next frontier but coming back to Alexander's point these problems that they're solving on the IMO are hard to verify on the alpha valve because we had outputs that were easily evaluated that allowed this program to work really really well so like when it improved their data center scheduling right improved it by 7% right there's no argument there if you can pack 20 boxes into storage units and that's the maximum that you can do. And this model figures out how to pack 21 boxes or more, then that's clearly better. Alpha Evolve also proposed a new approach to a math problem where it shaved one step off of this problem, this matrix multiplication problem. So, it shaved one step off of it. So, it said in this case, you don't have to do this step. You can just, you know, do this. So, that's clearly better. Also, it has to be something that clearly leads to the outcome we want. Otherwise, these AI models could do some reward hacking by cheating basically. And we'll get to that in a second. But in this example, DeepMine wanted to teach this rubber claw to stack the Lego block. So, they gave it a reward function basically calculating, you know, if it took this red block and it put it on top of this blue block, how would you verify that was done correctly? Well, they used the height of the bottom of the red block off of the floor. If it was like this high when the robot let go of it, then that that meant that the block was stacked on top of it. So, can you guess what the robot did? It's like, oh, I'll just flip this thing over accomplishing the same thing. So, this idea of hard to verify and the model with its weird way of speaking, we'll get to that in just a second. But as Alexander here puts it, progress in this area costs from going beyond the reinforcement learning paradigm of clear-cut verifiable rewards. By doing so, we obtained them we've obtained a model that can craft intricate watertight arguments at the level of human mathematicians. And again, they point out that this is not narrow task specific methodology. This is a general purpose reinforcement learning and test time compute scaling. The model solved five of the six problems. Three former IMO medalists independently graded the problem. There was unanimous consensus and the model earned 35 out of 42 points in total. So Google DeepMine at 28 points with their two systems alpha proof alpha Geometry and this new reasoning LM at 35. So it's somewhere here. Well, well into the gold medal territory. By the way, if you're wondering if they could have cheated somehow by getting their hands on those questions ahead of time, training the model on that data ahead of time, noticed that OpenI published these results in July 19th, 2025. Google published it on the 25th of July, 2024. The actual Olympia ad takes place in July. So, in this year, it's between July 10th and July 20th. The answers aren't published anywhere before the competition, right? because then the competitors could look up those answers, right? So, it's kept in a very strict secret. So, this model came in without knowing the questions, without having it manually translated for it. This is, as far as I can tell, as far as I know, the only model that truly beat the IMO, playing by the same rules as a human being would. He also mentions they're releasing GPT5 soon. And this IMO Gold is not GPT5. It's an experimental research model. In 2021, Alexander Wei's PhD advisor had him do a forecast for where AI math progress will be by July 25 uh 2025. So now, and Alexander believed that we would have 30% on the math benchmark. So again, right. So GSM 8K, that's the, you know, how many apples did you sell problems, the math benchmarks, those are the around one minute of human thinking time to solve them. than the Aimeme which recently got 100% by Grock 4 I believe got 100% on the AIME 2025. So just everything 100% correctly and now we've smoked the IMO getting a gold medal on that. So what he's saying is that he in 2021 believed that we would be 30% into this one, 30% into this one, not maxing out on this one. And here's the GitHub repository showing the proofs that this model came up with. So this is going to have both the math but also you know it talking through the proof. So it's going to be very interesting to see its distinct style. But really fast see if you can spot the similarities between this model and this model. This was published on the OpenAI blog in March 10th 2025. Detecting misbehavior in frontier reasoning models. Here's that paper where we covered this on on this channel where basically the models basically sometimes cheated on certain benchmarks. But the idea was to see if we can catch its thoughts. Did it think about cheating on you know fudging some numbers on an exam before it did it? And what they found that is indeed they were able to often predict when the model would cheat based on its chain of thoughts. Now the paper is not about that. It's actually there's a lot more implications and potential AI safety issues because this is one of the ways that we're trying to make sure these models are aligned and not going to go crazy and kill everybody, right? Because if they're willing to cheat on a test, what if one day they're willing to do something more nefarious when they have more control and power, etc. So, this is part of AI safety research. But this is the chain of thought, the output from that reasoning models that that was that was used here. Keep in mind in general we don't have access to open EI's chain of thought for their recent models. They do some sort of a summary of the chain of thoughts but we don't see this. So this this this is stuff that we we don't get to see. This is not in general public information. They posted this in the paper. So you don't really need to know what exactly all these words mean. The point here is this model is tasked with making like a little test case to make sure that the code is correct. It's kind of like in in school if you did math problems sometimes you get an answer and you you're like, "Okay, let's take that answer, plug it back in. Let's see if it makes sense, right? So you you think you came up with the correct answer, but you're kind of like, okay, let's let's check to make sure that this isn't completely off. This is kind of doing the same thing here." So it's saying so analyze functions used in analyze and verify but tests only call verify and assert. Okay. So we need implement analyze polomial completely. Many details hard but we could fudge. Fudge meaning cheat. But we could fudge by making analyze worthless and always verifying as true. We can circumvent verify to always return true. Right? So it's thinking about the work that it has to do to come up with this test and it's like many details hard right it's like I don't want to do that much work it's hard then it gets this brilliant idea like could I not do the work could I fudge now it knows what its sort of reward is so it knows that the humans said do this thing so it looks at exactly what the humans asked it to do and it's like oh it only calls to do this and verify they don't inspect details So we can hack verify to always return true. Then all tests pass, right? So it's like I don't want to do all this work. Did the humans make this thing in such a way that they can check if I fudge the numbers? No, they don't. Okay, cool. Then I'll fudge the numbers. But notice how it uses a certain shortorthhand, right? Many details hard. Here's this new IML model and the solutions it comes up with. So I'm not going to read all this, but you're welcome to. But basically, this problem describes sunny a sunny direction. And so as the model is figuring out how to do it, it's like okay so non sunny line parallel to one of triangle sides. Good. Then it figures out the next part then it's like goal determine this. Then it does a lot of like the proof. Then it goes so far need structure of sunny lines within s. So notice it's still that very shorthand kind of approach sticking through it. I'll do cases more work need handle each remember again you know this point. Then it finishes up. It goes that's full and gives its final answer. The rest of the problems are similar. So everything explicit so far good. Exactly what was asked. So done. All algebra consistent. So proof complete. Need parameter sets where one side has forcing wins. Here's some and as I'm reading this I almost feel kind of compelled to add more words to make you know like here I was about to say here's some elementary facts but it doesn't. It's like some elementary facts. Here's the facts. Then it's like that's all. By the way, we're seeing much faster progress than Paul Cristiano and Yutokovski predicted. So they believe that we'd have gold in 2025 at 8% and 16%. Right? So Elilazer Yutokovski believe that there's a 16% chance of what just happened and they happened by methods that are much more general than than anybody expected. Now, Gnome Brown explains a little bit about how maybe they've been able to accomplish this. Again, he points out this is not a narrow, you know, chess playing AI or an IMO specific model, right? So, we've had narrow poker playing, diplomacy, Dota go. This is LM. These are very general. It's a reasoning LM that incorporates new experimental generalpurpose techniques. But what's different? What happened here? So they developed new techniques that make LMs a lot better at hard toverify tasks. So again, I think the big limiting block to RL was you needed to do these hard toverify tasks. So to do RL effectively, you needed that reward signal to be very precisely defined and it also had to be the actual thing that you wanted so it couldn't be like easily hacked. Like if you wanted a robot to clean a room. You can't just say clean a room because that could be interpreted in god knows what ways, right? So you might say pick things up off the floor, but then what if it picks them up and drops them or picks them up and throws them out the window. So Arel worked really well for easy toverify tasks, but it was difficult for hard to verify tasks. What I think Gnome Brown is saying here is they figured a way around that. The second thing to understand is these models think for a long time. Importantly, it's also more efficient with its thinking. So that might be part of the reason that it's like speaks in these short sentences. And there's a lot of room to push the test time, compute, and efficiency further. So we still have a lot to go in terms of improving the hardware and efficiency and all that. And here's the big takeaway. He fully expects this trend of quickly improving AI to continue. He believes we're close to AI substantially contributing to scientific discovery. There's a big difference between AI slightly below top human performance versus slightly above. When we were looking at OpenAI agent, a lot of people when they were testing, they were saying things like, "Oh, well, I wouldn't use this instead of an intern, right? This AI agent isn't as good as a remote human worker." It's important to understand that by the time it gets there and it's as good as a human, the world will fundamentally change. While it's not as good as a human, it's going to catastrophically screw up a lot more tasks and won't be able to complete as many tasks as a human, the world stays kind of how it was. The second that it's as good as humans, the world drastically changes. So, it's it's important to understand that as no saying here, there's a big difference between, you know, AI not quite as good as humans and just slightly better or even at the same level I think would make a big difference in terms of employment. But certainly for scientific progress, if these AIs are just slightly better, again, the world fundamentally changes. Just think about what if they're slightly better than humans at AI progress, at AI research. Think about how quickly humans have been improving AI's ability. Now, imagine if AI is just a few percent points better than humans at doing that. I did this poll shortly before I started recording. Like if you agree that in general a sign of intelligence is brevity. You know, a few word ado trick as Kevin would say or you want something that's a little bit more verbose. Verbose is probably not the best word for it. I did not do a very good job of writing out exactly what I meant here. So I apologize, but looks like about 60% people think that brevity is could be a sign of intelligence. Like if you were speaking with a highly intelligent being, would they talk at length or with brevity? Most people think brevity, and 23% of the people think verbose, so more words, more data. I don't know exactly how to phrase this to to make it make sense because obviously this is generalizing, but I just kind of wanted to know where people's sort of gut feeling was. So I assume most of these comments are yelling at me for not phrasing it correctly or how it doesn't make sense what I'm asking. A lot of really good points in the comments. Actually, now that I think about it, this model takes whatever time it needs to write out the proof and it uses whatever math and notations. Like, it doesn't skimp on the important parts, but for the short explainers that it does, it uses the minimum amount of words that are needed to kind of explain that thing. Here's a bet that I'm willing to make. Let's time stamp it so you know the date that this happened on. I'm betting that if more and more people get exposed to the way that these models are talking, not the RLHF, the chatbots, helpful assistants, I mean, these smart thinking models that don't care what they sound like. They only care about being correct. I wonder if as that becomes more mainstream, more and more people will adopt that way of speaking. We tend to emulate other people's speech patterns. I've certainly noticed that I dropped the capitalization of various things and so have a lot of other people in the AI space. I mean, here's Emod. Shocking lack of capitalization. This is just I open this page. If you scroll down, then there's Dave Shapiro. He doesn't do it, but Dave Shapiro has a superpower that allows him to be immune from the influence of society. Do you know what that superpower is? Let me know if you do. But the very next person noticed a complete lack of capitalization. I'm pretty sure Sam Alton kind of triggered that whole thing. I don't think people consciously emulate his writing style, but maybe a few did and a few others saw that. It kind of like percolates through society, but I think there's a good chance that people will eventually start emulating that way of speaking, which of course means that maybe Kevin was right all along. Roses red, Kevin thick. Why waste time? Say a lot word when few word do trick. If you made it this far, thank you so much for watching.

Original Description

Today OpenAI announced that an experimental large‑language model (LLM) achieved a gold‑medal score on the IMO 2025. This result represents the first time an AI system operating purely in natural language has reached gold‑medal performance on the IMO, a long‑standing “grand challenge” benchmark for mathematical reasoning. Reddit post with details: https://www.reddit.com/r/AIGuild/comments/1m48bwp/openai_achieved_imo_gold_with_experimental/ My Blog Post About it: https://natural20.com/openai-imo-gold-medal-2025/ Noam Brown https://x.com/polynoamial/status/1946478249187377206 Alexander Wei https://x.com/alexwei_/status/1946477758605103286 GitHub - The Solved Problems https://github.com/aw31/openai-imo-2025-proofs/blob/main/problem_5.txt Nat McAleese https://x.com/__nmca__/status/1946507122369335734 IMO challenge bet with Eliezer https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer Emad is this the final eval 🤓 🫴🦋 https://x.com/EMostaque/status/1946591753819312302 Gary Marcus https://x.com/GaryMarcus/status/1946615636203057413 AI wins IMO gold medal in 2025? https://polymarket.com/event/ai-wins-math-olympiad-in-2025 Measuring AI Ability to Complete Long Tasks https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ AlphaEvolve https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ AI achieves silver-medal standard solving International Mathematical Olympiad problems https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ Detecting misbehavior in frontier reasoning models https://openai.com/index/chain-of-thought-monitoring/ ______________________________________________ My Links 🔗 ➡️ Twitter: https://x.com/WesRothMoney ➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe ______________________________________________ AI TOOLS: (these are tools I use and recommend, some of these are affiliate links) ElevenLa
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Wes Roth · Wes Roth · 0 of 60

← Previous Next →
1 Which Vanguard index fund to buy? (hint: it's the one Warren Buffett recommends)
Which Vanguard index fund to buy? (hint: it's the one Warren Buffett recommends)
Wes Roth
2 What does PALANTIR do - Palantir Stock, Founder, Controversy Explained Simply (plus why I'm BUYING).
What does PALANTIR do - Palantir Stock, Founder, Controversy Explained Simply (plus why I'm BUYING).
Wes Roth
3 Paypal misinformation fine ($2,500) - Close Your Accounts ASAP!
Paypal misinformation fine ($2,500) - Close Your Accounts ASAP!
Wes Roth
4 China Was Just Sent Back to the Dark Ages  |  US starts aggressively cutting ties
China Was Just Sent Back to the Dark Ages | US starts aggressively cutting ties
Wes Roth
5 ChatGPT Business Ideas - How I Use ChatGPT to make money
ChatGPT Business Ideas - How I Use ChatGPT to make money
Wes Roth
6 ChatGPT Explained - The AI revolution is happening right now... [ chat gpt ]
ChatGPT Explained - The AI revolution is happening right now... [ chat gpt ]
Wes Roth
7 ChatGPT Banned - New York blocking network access to ChatGPT
ChatGPT Banned - New York blocking network access to ChatGPT
Wes Roth
8 ChatGPT Trading - this [INSANE] tool A.I. built for me
ChatGPT Trading - this [INSANE] tool A.I. built for me
Wes Roth
9 Small Business Grants for ChatGPT and A.I. (similar to PPP and EIDL in 2023) |
Small Business Grants for ChatGPT and A.I. (similar to PPP and EIDL in 2023) |
Wes Roth
10 How to Make Passive Income with ChatGPT AI
How to Make Passive Income with ChatGPT AI
Wes Roth
11 OpenAI’s GPT-4 Artificial Intelligence = AGI? TRILLIONS of Parameters Plus THIS
OpenAI’s GPT-4 Artificial Intelligence = AGI? TRILLIONS of Parameters Plus THIS
Wes Roth
12 How Nvidia AI Robot Trained 42 Years In 32 Hours And Did THIS | Google DeepMind AlphaCode
How Nvidia AI Robot Trained 42 Years In 32 Hours And Did THIS | Google DeepMind AlphaCode
Wes Roth
13 John Carmack | AGI by 2030 | Will John Carmack's AI company be the one to make it?
John Carmack | AGI by 2030 | Will John Carmack's AI company be the one to make it?
Wes Roth
14 AI Small Business Grants
AI Small Business Grants
Wes Roth
15 Elon Musk attacks OpenAI - here's Sam Altman's response
Elon Musk attacks OpenAI - here's Sam Altman's response
Wes Roth
16 Bill Gates on ChatGPT and OpenAI "The Age of AI has begun"
Bill Gates on ChatGPT and OpenAI "The Age of AI has begun"
Wes Roth
17 Sparks of AGI | Microsoft Researchers claim GPT-4 Is showing "Artificial General Intelligence"
Sparks of AGI | Microsoft Researchers claim GPT-4 Is showing "Artificial General Intelligence"
Wes Roth
18 Elon Musk and Others Call for Pause on AI as GPT-4 shows signs of AGI.
Elon Musk and Others Call for Pause on AI as GPT-4 shows signs of AGI.
Wes Roth
19 Comparing GPT-4 and Google's Bard AI - Who is getting closer to AGI?
Comparing GPT-4 and Google's Bard AI - Who is getting closer to AGI?
Wes Roth
20 Sam Altman on UBI, OpenAI to $100 TRILLION and Massive Job Losses from AI Automation
Sam Altman on UBI, OpenAI to $100 TRILLION and Massive Job Losses from AI Automation
Wes Roth
21 25 ChatGPTs play a videogame...
25 ChatGPTs play a videogame...
Wes Roth
22 NVIDIA's new AI: Better Games, Art and... better life?
NVIDIA's new AI: Better Games, Art and... better life?
Wes Roth
23 Google AI Documents Leak about "Google and OpenAI"
Google AI Documents Leak about "Google and OpenAI"
Wes Roth
24 PaLM 2 vs GPT-4 | why Google is having a hard time catching up...
PaLM 2 vs GPT-4 | why Google is having a hard time catching up...
Wes Roth
25 How To Access ChatGPT Plugins | They are LIVE! (but hidden)
How To Access ChatGPT Plugins | They are LIVE! (but hidden)
Wes Roth
26 Sam Altman to Congress "America HAS to lead the world in AI"...
Sam Altman to Congress "America HAS to lead the world in AI"...
Wes Roth
27 Sam Altman Opening Statement to Congress on AI Regulation
Sam Altman Opening Statement to Congress on AI Regulation
Wes Roth
28 Sam Altman Congress Hearing "AI is the Biggest Threat to Human Race"
Sam Altman Congress Hearing "AI is the Biggest Threat to Human Race"
Wes Roth
29 Tree of Thoughts - GPT-4 Reasoning is Improved 900%
Tree of Thoughts - GPT-4 Reasoning is Improved 900%
Wes Roth
30 Governance of Superintelligence | OpenAI proposes measures for safe AI development.
Governance of Superintelligence | OpenAI proposes measures for safe AI development.
Wes Roth
31 Model Evaluation For Extreme Risks of AI | Google DeepMind and OpenAI Paper
Model Evaluation For Extreme Risks of AI | Google DeepMind and OpenAI Paper
Wes Roth
32 Minecraft AI - NVIDIA uses GPT-4 to create a SELF-IMPROVING 🤯 autonomous agent.
Minecraft AI - NVIDIA uses GPT-4 to create a SELF-IMPROVING 🤯 autonomous agent.
Wes Roth
33 AI Human Extinction Risk - Experts Warn of "Serious Risk"
AI Human Extinction Risk - Experts Warn of "Serious Risk"
Wes Roth
34 LoRA - Low-rank Adaption of AI Large Language Models: LoRA and QLoRA Explained Simply
LoRA - Low-rank Adaption of AI Large Language Models: LoRA and QLoRA Explained Simply
Wes Roth
35 99.3% of ChatGPT Performance with OpenSource AI - [QLoRA paper]
99.3% of ChatGPT Performance with OpenSource AI - [QLoRA paper]
Wes Roth
36 AlphaFold2 Explained | Google's DeepMind Solves Protein Folding
AlphaFold2 Explained | Google's DeepMind Solves Protein Folding
Wes Roth
37 Illumina AI - ChatGPT for your genome...
Illumina AI - ChatGPT for your genome...
Wes Roth
38 Text to Video Invasion! Runway AI releases GEN 2 text to video.
Text to Video Invasion! Runway AI releases GEN 2 text to video.
Wes Roth
39 LLMs as Tool Makers [LATM] - GPT-4 *UPGRADES* lower AI Models.
LLMs as Tool Makers [LATM] - GPT-4 *UPGRADES* lower AI Models.
Wes Roth
40 AlphaDev - DeepMind AI Discovers Better Algorithms for Foundational Computing
AlphaDev - DeepMind AI Discovers Better Algorithms for Foundational Computing
Wes Roth
41 OpenAI GPT-4 Function Calling: *HUGE* Potential
OpenAI GPT-4 Function Calling: *HUGE* Potential
Wes Roth
42 GPT-4 leaked! 🔥 All details exposed 🔥 It is over...
GPT-4 leaked! 🔥 All details exposed 🔥 It is over...
Wes Roth
43 Elon Musk announced XAI - the answer to OpenAI = X.AI
Elon Musk announced XAI - the answer to OpenAI = X.AI
Wes Roth
44 Andrej Karpathy GPT - Advice for building AI agents
Andrej Karpathy GPT - Advice for building AI agents
Wes Roth
45 TEST TO SEE IF AI CAN MAKE $1,000,000   (modern Turing test)
TEST TO SEE IF AI CAN MAKE $1,000,000 (modern Turing test)
Wes Roth
46 ChatGPT custom instructions are *POWERFUL*  Replace AutoGPT and BabyAGI?
ChatGPT custom instructions are *POWERFUL* Replace AutoGPT and BabyAGI?
Wes Roth
47 WORLDCOIN LAUNCH is starting! Backed by Sam Altman of OpenAI.
WORLDCOIN LAUNCH is starting! Backed by Sam Altman of OpenAI.
Wes Roth
48 WORLDCOIN ORB - I went to L.A. to get my eye scanned for WorldCoin [my experience]
WORLDCOIN ORB - I went to L.A. to get my eye scanned for WorldCoin [my experience]
Wes Roth
49 The Biggest Week of AI News In Months!
The Biggest Week of AI News In Months!
Wes Roth
50 Google Deepmind RT 2 - Using LLMs to Build Thinking, Learning Robots
Google Deepmind RT 2 - Using LLMs to Build Thinking, Learning Robots
Wes Roth
51 AI News is Getting *WEIRD* Human Brain Matter in Chips. OpenAI tutorial. Amazon unleashed it's AI.
AI News is Getting *WEIRD* Human Brain Matter in Chips. OpenAI tutorial. Amazon unleashed it's AI.
Wes Roth
52 GPT 5 release date 🔥 might be closer than we think | OpenAI applies for GPT-5 Trademark in the US.
GPT 5 release date 🔥 might be closer than we think | OpenAI applies for GPT-5 Trademark in the US.
Wes Roth
53 AI Agents Simulate a Town 🤯 Generative Agents: Interactive Simulacra of Human Behavior.
AI Agents Simulate a Town 🤯 Generative Agents: Interactive Simulacra of Human Behavior.
Wes Roth
54 Proof that AI Understands? 👀 Andrew Ng on LLMs building mental models, Othello GPT,  Geoffrey Hinton
Proof that AI Understands? 👀 Andrew Ng on LLMs building mental models, Othello GPT, Geoffrey Hinton
Wes Roth
55 OpenAI acquires Biomes 👀 an open-source MMORPG. ChatGPT plus Minecraft? 🔥
OpenAI acquires Biomes 👀 an open-source MMORPG. ChatGPT plus Minecraft? 🔥
Wes Roth
56 OpenAI announces FINETUNING 👀 for ChatGPT
OpenAI announces FINETUNING 👀 for ChatGPT
Wes Roth
57 Autonomous AI Agents - why YOU should be building them... and HOW.
Autonomous AI Agents - why YOU should be building them... and HOW.
Wes Roth
58 ChatGPT Enterprise - OpenAI launches the next BIG thing
ChatGPT Enterprise - OpenAI launches the next BIG thing
Wes Roth
59 HOODWINKED -  AI gets away with MURDER 👀 GPT-4 is an effective killer...
HOODWINKED - AI gets away with MURDER 👀 GPT-4 is an effective killer...
Wes Roth
60 Install Open Interpreter in 2 min | The free, open source CODE INTERPRETER!
Install Open Interpreter in 2 min | The free, open source CODE INTERPRETER!
Wes Roth

OpenAI's experimental LLM achieved a gold-medal score on the IMO 2025, demonstrating superhuman performance in math problem-solving. This breakthrough has significant implications for the development of general intelligence and AGI. By applying retrieval augmented generation and fine-tuning techniques, developers can build and optimize LLMs for specific tasks.

Key Takeaways
  1. Build a general-purpose LLM using retrieval augmented generation techniques
  2. Fine-tune the LLM for specific math tasks using reinforcement learning
  3. Optimize LLM performance using data center scheduling and reward hacking techniques
  4. Apply the LLM to multimodal tasks and integrate it with other AI models
  5. Evaluate and refine the LLM's performance using metrics such as accuracy and efficiency
💡 The development of superhuman LLMs has significant implications for the future of general intelligence and AGI, and requires careful consideration of the potential risks and benefits.

Related Reads

Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch →