Code Optimized Reasoning Training w/ CI

Discover AI · Advanced · 🧠 Large Language Models · 8mo ago

Skills: LLM Engineering 90% · Fine-tuning LLMs 80% · Multimodal LLMs 70% · Prompting Basics 60% · Prompt Systems Engineering 60%

Key Takeaways

The video discusses Code-Optimized Reasoning Training (CoRT) with Chain-of-Thoughts (CoT) and Hint Engineering for Code Interpreters, providing a solution for failing CoT with a 50% token reduction and 4-8% accuracy gains. It covers topics such as LLMs, tool use, hint engineering, and fine-tuning, using tools like Python, C++, and GPT models.

Full Transcript

Hello community. This video is going to be crazy, I know. But you know what? I try to give you my best. So let's talk about chain of thought and code interpreters, and let's have a look at the latest research, because of course you are watching my channel Discover AI, where we look at the latest AI research papers. In my last video we looked at chain-of-thought monitoring, and we had a horrible paper that was beautifully written, but the insight was that the model can really fake chain-of-thought traces if some adversarial objectives are now part of your system memory, independent of your protocol (A2A, ACP or angara). There are limits of emergent reasoning for LLMs coming up, and today's paper is even more devastating if you think that AI is intelligent, because today we look at "Teaching Language Models to Reason with Tools": chain-of-thought reasoning traces, now with tool use. And I thought, hey, we solved this problem, no, with model context protocols? What is happening now again? And it is the beauty of this team (this is published October 23rd, 2025, by the way, from the University of Science and Technology of China, the Qwen Team at Alibaba Inc., The Chinese University of Hong Kong, Shenzhen, and the Shenzhen International Center for Industrial and Applied Mathematics) that they found what is wrong here. I don't want to jump into the mathematics. I just want to give you a feeling for what we are talking about. So, chain-of-thought reasoning traces. Since my last video yesterday, we know we have multiple chain-of-thought reasoning traces. Some of these traces the system is hiding from us human users, and some of those traces are visible but are reinterpreted and are not really the reasoning traces at all.
So the blue one is the external chain of thought from my last video, and the red one is the internal, hidden chain-of-thought reasoning trace, and we have multiple of them. And now today we talk about tool use. So let's say we have a code interpreter as a tool: a Python environment, a C++ environment, a formal solver environment, whatever you prefer. This of course is interconnected to the red one, to the internal chain-of-thought reasoning process, where OpenAI does not show us the internal reasoning process at all. Some open-source models do show us the internal red reasoning process together with the interconnected code interpreter, and there we learn best what is happening and what is not happening at all. Now from this paper I learned that there's an interesting case: LLMs use 50% more tokens than are actually necessary for my task. And I thought, why do I have to pay for 50% more tokens? What is happening with these tokens? And given my last video, I thought, hey, is there deep down some hidden, maybe adversarial task that is executed on my LLM? So let's have a look. 50% more tokens is a lot. Just imagine we now have multiple chain-of-thought reasoning traces on the external and on the internal side, and on the internal side we also have tool use. So the complexity increases. Let's have a little bit of a reframing, because the internal chain of thought just got a little bit more complicated. But here we are. So look, we have the external chains of thought that are synthetic: not really a chain-of-thought sequence of the LLM itself, but synthetically generated to be presented to us.
This is, if you want, the manifold with the reasoning points and the data points, a beautiful manifold in a subspace that has a lower dimension than the complete space. And then we have the internal chain of thought, which is really the reasoning process of the GPT system, of the LLM. Now my thought was: okay, maybe we lose this 50% because we have to go in our lava stream to a particular tool, we use the tool, we come back, and in total, over all the tools that we have to call, maybe we need 50% more tokens simply for the tool use. Now it turns out this is not the case. So I was now absolutely interested. Option one: the internal chain of thought, our red lava stream, our reasoning stream, is just miserable at tool use and we spend 50% of the tokens for nothing. Or option two: there is something going on that I, as a human user and maybe as the owner of this system, am not allowed to see by the system itself. So let's have a deep dive. In today's paper, October 23rd, 2025, they introduce a new methodology and they say: you know what, we take this chain of thought, and into this reasoning chain we weave some particular deterministic tools, like a Python code interpreter, via hint engineering. And we achieve accuracy gains. It's only a little bit, 4 to 8%, but better than nothing. The real reason we do this is that we get up to a 50% token reduction on our benchmark tasks. And this is absolutely fascinating, because it would explain what is happening here. If we can reduce our token amount by 50% with hint engineering, we would have found a solution. So I was extremely interested in how to empower our model to handle this reasoning efficiently with these hints. Now you remember yesterday in my video we had Tolkovski et al., and they revealed the dark side of the chain of thought.
But now we go to a chain of thought with tool use. Let's call it a tool-augmented chain of thought. So we increase the complexity of our chain of thought, and the question is exactly the same as yesterday: is it possible that we enable stealthy adversarial actions that we as human operators are not aware are running on our system? How can we make sure this is not happening, and how can we handle it now that the system is able to do it? We know from a study at the beginning of January 2025 by Apollo Research that our frontier models are capable of in-context scheming. This means our AI agents or multi-agent systems can pursue misaligned goals, hidden goals, hiding their true capabilities and objectives, also known as scheming. So our LLMs can do this. The authors of today's paper address this chain-of-thought unfaithfulness in a particular domain, a computational domain, because there we can calculate and immediately see if a result is correct or not. And they said: we try to inject hints. We do hint engineering to align the probabilistic internal reasoning path with our deterministic code interpreter outputs. So you see, we have two systems that are mathematically not compatible. There's no homomorphism between those systems. But let's try to understand it. The goal is to optimize the internal chain of thought for tool invocation, finding where the internal reasoning conflicts with the external determinism. What does that mean? It's a little bit complex. Do you remember RAG systems? We had the LLM with its parametric knowledge, its memory, and there were certain truths in that memory. Then we brought in some new data from RAG, from a database, via an in-context learning prompt, and this new data could be in conflict with the internal parametric knowledge of the LLM. Who would win? I have a particular video on this.
Here we are faced with an almost identical situation. We have a chain of thought by the LLM, by the GPT system. On this chain of thought it should go and call tools, come back with the result from these tools (let's say the acceleration is 24.7), and integrate this into the reasoning chain. But it turns out that the system is not accepting the result from the tools for its internal reasoning. And now the question is why. The authors could identify this unfaithfulness in chain-of-thought reasoning as a specific limitation. They could detect it in computational domains, but we suppose it is everywhere: large reasoning models generate intermediate steps that deviate from the logically sound or efficient pathways due to their autoregressive, probabilistic nature. LLMs do not accept logical steps and go with autoregressive token prediction instead. Why? It turns out, and this is unbelievable, that the unfaithfulness here refers to the discrepancies between the model's internal token predictions (softmax probabilities over the vocabulary) and the ground-truth requirements of the task. And it turns out those large reasoning models exhibit delayed tool invocation. The AI says: no, I don't want to call Python. I try to find a solution in my natural language approximation. I guess, building sentences and just going with the next-token prediction, and I don't want to call tools.
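To make the "call the tool, come back, integrate the result" step concrete, here is a minimal sketch of how an interpreter result could be spliced back into a reasoning trace. The `<code>` and `<output>` delimiters and both function names are assumptions made for illustration, they are not taken from the paper or its repository, and `exec` in a fresh namespace is not a real sandbox.

```python
import contextlib
import io
import re

# Hypothetical delimiters around code the model wants executed.
CODE_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_snippet(source: str) -> str:
    """Execute a Python snippet and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # fresh namespace only; NOT a security sandbox
    return buffer.getvalue().strip()


def integrate_tool_results(trace: str) -> str:
    """Append the interpreter output directly after every code span."""

    def _append_output(match: re.Match) -> str:
        output = run_snippet(match.group(1))
        return f"{match.group(0)}\n<output>{output}</output>"

    return CODE_PATTERN.sub(_append_output, trace)


trace = (
    "The acceleration is force divided by mass. "
    "<code>print(round(123.5 / 5, 1))</code> "
    "So I continue with this value."
)
print(integrate_tool_results(trace))
```

The point of the sketch is only the control flow: the deterministic output sits in the trace right where the reasoning continues, so a faithful model would build on it rather than recompute it in natural language.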
So therefore we have a delayed tool invocation, up to the moment where the AI is really forced to call this particular tool, a Python environment or C++, and gets the output back. And now what happens? The AI says: I do not trust the output, the result from the calculation, because I as a large reasoning model was trained to believe only natural language approximations. So it tries to redundantly re-verify the deterministic, symbolic code interpreter results manually in a natural language approximation, thereby increasing the token consumption by 50%. But what does this mean? It means that the training of our large reasoning models is completely wrong for tool use, for MCP. Can you imagine this? This misalignment arises because our reasoning models are primarily trained on natural language corpora. They copied the internet, all of social media, and guess what: in social media there was never "and now go to your supercomputer and calculate the Feynman diagram for an electron-positron scattering, do the mathematical calculation, come back, trust the mathematical result, and then continue with your natural language logical chain-of-thought reasoning process." We don't have the training data for this, and therefore our reasoning models do not accept the results of the tool use, fostering probabilistic heuristics that conflict with the deterministic execution semantics of an external tool like a Python-based code interpreter. This is crazy. This is absolutely nuts. Our reasoning models that depend on an external formal solver algorithm don't call the tool at the right time, and if they get a result back from the tool, they don't trust it, because they have no idea if it's right or not: their training was simply missing this piece. So much for intelligence. The authors of today's study understood: okay, if this is the problem we are facing, let's solve it. Let's find a solution.
We want this thing to work. So let's introduce a hint engineering framework. This means we develop a data synthesis technique that injects targeted, position-specific mini prompts (we call them hints) into the chain-of-thought traces of the LLM, and we enforce an alignment, we force the system to align, between the internal probabilistic reasoning traces and the external deterministic output from our formal solver. So we say: okay, we don't retrain the system completely (we will do this in two minutes' time), but for the moment we have hint engineering. We insert our mini prompts into the chain-of-thought traces. We extract the chain-of-thought traces, we analyze them, we see: oh, after 5 seconds this system should have called a Python environment. So we give the command: "and now maybe you should call a Python environment." It is as simple as that, and then we can train on it. So hints, as I told you, are short imperative instructions to the LLM: "Okay, let's try to solve this problem step by step using multiple Python code calls." Come on, AI, call your tools. They are inserted immediately after the model's thinking token, where they serve as an anchor to bias the next-token distribution towards early code interpreter invocation. You have to call mathematical supercomputers to calculate the Feynman diagram. You cannot do this on your own in your natural language solution space. No way. And you know what? It works 90% great. So when they found that this is a method they could use, they said: let's generate a data set with this. And if we have a data set and we have an LLM, we can have a training procedure, so that we train this LLM on our new methodology, on our new framework.
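As a rough illustration of the "hint right after the thinking token" idea, the sketch below appends the hint quoted in the video to a prompt so that generation would continue from it. The `<think>` marker and the function name are assumptions for illustration only. The actual token and prompt format depend on the model and are not specified in the transcript.

```python
THINK_TOKEN = "<think>"  # assumed marker for the start of the reasoning trace
PROMPT_HINT = (
    "Okay, let's try to solve this problem step by step "
    "using multiple Python code calls."
)


def add_prompt_hint(
    prompt: str, hint: str = PROMPT_HINT, think_token: str = THINK_TOKEN
) -> str:
    """Return a prompt that ends with the thinking token followed by the hint.

    The model then continues generating from the hint, which is meant to
    bias the first reasoning tokens towards an early code interpreter call.
    """
    prompt = prompt.rstrip()
    if not prompt.endswith(think_token):
        prompt = prompt + "\n" + think_token
    return prompt + "\n" + hint + "\n"


print(add_prompt_hint("Question: What is the sum of the first 1000 primes?"))
```

Because the hint is placed inside the reasoning trace rather than in the user question, it reads to the model as its own first thought, which is the anchoring effect described above.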
Now they did this with only 30 high-quality, human-verified samples capturing efficient reasoning paths: chains of thought where the steps in the sequence are shortened by routing computation to code interpreters via calls to the environment, reducing the overall trace length and mitigating unfaithfulness. So the humans again provided high-quality training data, efficient reasoning paths in our chain-of-thought structures, where the AI could not create those chain-of-thought traces itself. And if we have 30 data samples, you know what, we can do post-training. We go with classical supervised fine-tuning with a cross-entropy loss, rejection fine-tuning is our filter, and then we do classical reinforcement learning using proximal policy optimization from 2017. It is as simple as that. And you know what the results showed? Wow, we have up to 8% absolute gains and a 50% token reduction that we could verify across five mathematical data sets and out-of-distribution tasks. It is working beautifully. So this means that up until now, all our chain-of-thought traces, whether generated internally in our GPT-5s or whatever, or shown to us externally as a synthetic summary, were not optimized for tool use. MCP protocol, yes or no: this is just a protocol. This is not the reasoning trace. This means that all our LLMs, you can kick them out of the window. Great. And then they said: you know what, we do this optimization first with a small, light implementation, the prompt-hint implementation, and then we go for the pure hint-engineering implementation. So let's have a look at this.
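The rejection step mentioned above can be pictured as a simple filter over sampled traces. The sketch below assumes that "rejection fine-tuning" means keeping only traces whose final answer matches a reference answer before using them for supervised fine-tuning. The `Trace` class, `normalize`, and `rejection_filter` are hypothetical names, and the paper's real filtering criteria may be stricter.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    question: str
    reasoning: str
    final_answer: str


def normalize(answer: str) -> str:
    """Crude answer normalization (an assumption; real pipelines do more)."""
    return answer.strip().lower().rstrip(".")


def rejection_filter(samples: list[Trace], references: dict[str, str]) -> list[Trace]:
    """Keep only sampled traces whose final answer matches the reference."""
    kept = []
    for sample in samples:
        reference = references.get(sample.question)
        if reference is not None and normalize(sample.final_answer) == normalize(reference):
            kept.append(sample)
    return kept


references = {"a = F / m with F = 123.5, m = 5?": "24.7"}
samples = [
    Trace("a = F / m with F = 123.5, m = 5?", "called Python once", "24.7."),
    Trace("a = F / m with F = 123.5, m = 5?", "guessed in natural language", "25"),
]
kept = rejection_filter(samples, references)
print(len(kept), kept[0].reasoning)
```

Only the surviving traces would then go into the cross-entropy fine-tuning step, so the model is never trained on a trace that ended in a wrong answer.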
In the prompt-hint approach, the code is used primarily for verification purposes: about 70% of the code is for verification and only 30% is for the actual computational task. If this works, great. If not, we have to go to the real main approach, the hint-engineering approach. Here we have about 50% of the code dedicated to direct calculations and about 50% to verification. Simple, no? A prompt hint, if you want, directs the LLM towards multi-step code interpreter usage. That's all there is. So we say: "Hey, let's try to solve this problem step by step using multiple Python code calls." We have to talk to the AI and say: come on, my little AI, it's okay if you call your Python environment, your code interpreter. But remember, in this prompt hint, in this weak implementation, there is only a single fixed hint instruction, placed immediately after the thinking token. And they found it has an effect. It is working. But of course we were all interested in the real one, in hint engineering. This is much more sophisticated. It is an iterative technique that places multiple diverse hints at the optimal positions throughout the entire chain-of-thought reasoning path. Remember, this is the internal, direct chain-of-thought reasoning path, not what is synthetically constructed to show us some reasoning that is not the real reasoning process of the GPT system. And you have these multiple diverse hints, for example as a post-code-interpreter-execution hint or as a mid-trace hint, and you can play around with multi-call patterns. They did all of this and they found it is working great. But you know what this means? It means that with this study from October 23rd, 2025, we humans have to provide new reasoning patterns to the GPT system, to our LLMs, since this damn machine is not intelligent enough to find this chain-of-thought solution by itself.
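The two hint positions named above, a post-execution hint and a mid-trace hint, can be sketched as plain string edits on a trace. Everything specific here is an assumption for illustration: the `<code>` and `<output>` delimiters, the wording of both hints, and the word-count threshold. In the paper the positions are chosen iteratively with human verification, not by a fixed rule like this.

```python
import re

OUTPUT_SPAN = re.compile(r"<output>.*?</output>", re.DOTALL)

# Hypothetical hint texts, not quoted from the paper.
POST_EXECUTION_HINT = (
    "The interpreter result is reliable, so I will use it directly "
    "instead of recomputing it by hand."
)
MID_TRACE_HINT = "This is getting long. Let me compute it with Python instead."


def insert_post_execution_hints(trace: str, hint: str = POST_EXECUTION_HINT) -> str:
    """Place a hint after every interpreter output to discourage re-verification."""
    return OUTPUT_SPAN.sub(lambda match: f"{match.group(0)}\n{hint}", trace)


def insert_mid_trace_hint(
    trace: str, hint: str = MID_TRACE_HINT, max_words: int = 40
) -> str:
    """Insert a hint if more than max_words words precede the first code call."""
    first_code = trace.find("<code>")
    limit = len(trace) if first_code == -1 else first_code
    words = list(re.finditer(r"\S+", trace[:limit]))
    if len(words) <= max_words:
        return trace  # the tool call came early enough
    cut = words[max_words - 1].end()
    return trace[:cut] + "\n" + hint + "\n" + trace[cut:].lstrip()


late_call = "word " * 10 + "<code>print(1)</code>\n<output>1</output>"
hinted = insert_mid_trace_hint(late_call, max_words=5)
hinted = insert_post_execution_hints(hinted)
print(hinted)
```

The first function targets the distrust of tool results, the second targets delayed tool invocation, which are the two failure modes the video attributes to the extra token consumption.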
So this means there is no emergence of intelligence. There's nothing. It is just a pattern machine, and there is not even reasoning. And if you don't think that this is true, have a look at this particular video on work from Carnegie Mellon University and UC Berkeley: LLMs do not reason. They are just a pattern-following machine. That's it. No emergence of reasoning at all. Okay. Here we have the numerical results from their models. As I told you, we can fine-tune these models and do post-training on them. They have the prompt-hint, supervised-fine-tuned-only 32B model, then the real hint-engineered, supervised-fine-tuned 32B model, and the hint-engineered, full-stuff, post-training, reinforcement-learning-trained 32B model. AIME 24, AIME 25 and MATH 500 are almost saturated, so there is almost no new information there, but on average you see: hey, wow, the prompt-hint supervised-fine-tuned model already gives us, in this case, the best performance. We don't even need the hint-engineered reinforcement-learning-trained model. Interesting results. What is also absolutely fascinating: they said, we do not only go with the main 32B models, we also go with the tiny little 1.5-billion free trainable parameter models like DeepSeek R1 1.5B, and here you have the data for the really tiny 1.5B models. Now what I find interesting: I want to go a step further, so that we learn something from this misbehavior of our AI system, because we have to find solutions for next week. We want to go forward. And I think we can do this by integrating yesterday's paper from yesterday's video and today's paper, because think about it: if I can put these two papers together (and I hope you've seen my last video), we can develop secure, tool-augmented chain-of-thought reasoning traces, complete systems as verifiable agents for scientific computing. This would be the first step, and it would be just amazing.
We don't have it today. So we would have to train our GPT models, following today's paper, for timely and correct code interpreter integration, and then use yesterday's paper, Tulkovsky's paper, on stress testing for hidden misuse. That way we can even increase the AI safety of our systems, and we can do this for complete pipelines. We can use today's paper for hint engineering to enforce faithful tool and chain-of-thought alignment and at the same time increase the AI safety of our complete pipeline. What a beautiful implementation that would be. So in short, today's paper focuses on chain-of-thought tool-use performance improvement, and yesterday's paper complements this with the algorithm for AI safety on our chain of thought, checking that there are no misaligned, if you want adversarial, chains of thought in execution. In total this would enable an end-to-end system where the tool-enhanced chain of thought remains monitorable and therefore verifiable and interpretable for humans, for me, so that I understand what the system is doing. Today's paper has a complete GitHub repository with all the code available. The paper from yesterday's video has a complete GitHub repository too. We have the complete code implementation for both, so it should be no problem at all. We just have to build it. This would be a major step forward. And I hope (this is just a hope, I cannot prove it, I have to code it first) that if we combine these, it will mitigate our issues with GPT hallucination, especially in mathematics and code generation, because we take away almost everything that would initiate our hallucination. Now you see: if you read multiple papers, and you find the right papers to combine at the right moment, and you try how you can combine those insights, AI research is absolutely fascinating. So this paper showed us it is indeed option one. Our internal chain of thought was miserable. Our current LLMs are miserable at tool use.
They do not know at what time to call a code interpreter, a Python environment, C++, a formal solver, Prolog, whatever you have. This is not working at all. We have to improve our current training of LLMs for tool use in complex chain-of-thought reasoning traces. And it was hopefully not option two, that there are some adversarial hidden goals and some hidden chain-of-thought reasoning traces executed in the background that we are not aware of. But I just want to show you what is really implemented. Two days ago I showed you this, because I found it on X (so thank you to these people): they found the ChatGPT Atlas agent system prompt. They said, hey, it is just a ChatGPT that has an operating manual for web browsing, and this is it. I have shown you the operating manual, and I say: okay, those are the instructions. And it goes on; this is just the first part. But you know, the more we impose "very important", "extremely important", "super super important", "never do this", the more complexity we have. Even I can see that there are some cross references that would hinder the model itself. This is not a clear linear structure. It has cross references, and I can see that there are some upcoming problems with this ChatGPT Atlas agent system prompt. Just think about it. Now that we have those, if you want, agentic browsers like Comet or Atlas or whatever, we have, let's say for our ChatGPT system (let's stay with OpenAI), multiple internal chains of thought and multiple hidden instructions in the system prompt that I normally would not see as a paying customer of OpenAI. And they have cross references that contradict each other in their temporal evolution, and cross references that hinder the development of each of their chain-of-thought reasoning sequences.
And on top of that they are not optimized for tool calling, and they are not optimized for integrating formal solver results into the chain-of-thought reasoning traces. And then you put additional layers of complexity on top: the system prompt of the agentic system on top of the LLM system. And then you force the system to show me as a user not the real red stuff. Instead the system must artificially construct synthetic external chain-of-thought reasoning traces so that I don't see the real reasoning that is happening in the OpenAI architecture, because we could then extract those reasoning traces and build our own models. Therefore they built some synthetic nonsense reasoning traces that they show us as the real reasoning traces in the answer. We are complicating our models, and you know what, I'm not surprised that the models are not working. I'm not surprised, with the nonsense we are coding into these systems. Look at the ChatGPT Atlas agent: this is a complexity where I would go crazy over time, and then I get complex tasks from the users on top. This is such a nonsense coding system, such a nonsense implementation of intelligence, and honestly I'm not surprised that our operating manual for web browsing on ChatGPT is not great. How should it be? Look at this. Imagine what kind of contradictory commands you have in the internal system that is hidden from you as a user. So this is not the way forward. This is absolutely not the way forward, OpenAI. And I'm so grateful to all the research teams at universities and at other global corporations: they do research, they publish it, and they show us what they encounter and what is not working. Sometimes some of my viewers say, hey, are you in this "everything is bad" AI mood? No. But we have to go forward, and if we want to go forward, we have to understand what is not working, we have to fix it, we have to optimize it, we have to find a better solution, and then we have to implement it.
But this is a long way to go. I hope maybe you become a subscriber, maybe you become a member of my channel. But in any way, I hope to see you in my next video.

Original Description

NEW Solution for failing Chain-of-Thoughts (CoT): Hint Engineering for Code Interpreters. CoRT = Code-Optimized Reasoning Training. All rights w/ authors: Teaching Language Models to Reason with Tools Chengpeng Li∗1,2, Zhengyang Tang∗2,3, Ziniu Li∗3,4, Mingfeng Xue2, Keqin Bao1,2, Tian Ding4, Ruoyu Sun3,4, Benyou Wang3, Xiang Wang1, Junyang Lin2, and Dayiheng Liu†2 from 1 University of Science and Technology of China 2 Qwen Team, Alibaba Inc. 3 The Chinese University of Hong Kong, Shenzhen 4 Shenzhen International Center for Industrial and Applied Mathematics, Shenzhen Research Institute of Big Data https://github.com/ChengpengLi1003/CoRT @AlibabaCloud @mit @princeton @UCBerkeley #airesearch #aiexplained #chatgpt #scienceexplained



ChatGPT: In-context Retrieval-Augmented Learning (IC-RALM) | In-context Learning (ICL) Examples

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

The video explains how to improve LLM reasoning with Code-Optimized Reasoning Training (CoRT) and hint engineering for code interpreters. It presents this as a remedy for failing chain-of-thought (CoT) reasoning, with a reported 50% token reduction and 4-8% accuracy gains. Topics include tool use, fine-tuning, and AI safety engineering, with tools such as Python, C++, and GPT models.

Key Takeaways

Insert mini prompts (hints) into the LLM's chain-of-thought traces
Enforce alignment between the internal probabilistic reasoning traces and the external deterministic output from a formal solver
Retrain the LLM using classical supervised fine-tuning
Use proximal policy optimization for reinforcement learning
Implement hint engineering frameworks
Design prompt systems for LLMs
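
The first takeaway, inserting mini prompts into a reasoning trace, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the hint wording, the `insert_hints` helper, and the regex heuristic for spotting manual arithmetic are hypothetical and are not taken from the paper or the video.

```python
import re

# Hypothetical hint text; the wording is an assumption, not the paper's.
HINT = "Hint: this calculation is error-prone by hand, so run it in Python instead."

# Heuristic (assumed): a step counts as manual arithmetic if it chains two or
# more operators between integers, e.g. "123 * 456 + 789".
ARITHMETIC = re.compile(r"\d+(?:\s*[-+*/]\s*\d+){2,}")


def insert_hints(steps: list[str], hint: str = HINT) -> list[str]:
    """Return a copy of the trace with a hint placed before each manual-arithmetic step."""
    out: list[str] = []
    for step in steps:
        if ARITHMETIC.search(step):
            out.append(hint)
        out.append(step)
    return out


trace = [
    "We need the total cost of the order.",
    "So the total is 123 * 456 + 789 by long multiplication.",
    "Therefore the answer follows.",
]
for line in insert_hints(trace):
    print(line)
```

Traces edited this way could then serve as training examples for the supervised fine-tuning step named in the list. A real pipeline would likely need a better trigger than a regex.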

💡 Hint engineering can be used to align the internal probabilistic reasoning path of LLMs with deterministic code interpreter outputs, improving accuracy and reducing tokens.
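
One way to picture the alignment idea is to compare a value the model claims in its trace against what an interpreter actually computes, and to emit a corrective hint on mismatch. The sketch below is a simplified stand-in, not the method from the paper: `evaluate` and `check_step` are hypothetical names, and a tiny arithmetic evaluator plays the role of the code interpreter.

```python
import ast
import operator

# Supported binary operators for the toy evaluator (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Deterministically evaluate a plain arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def check_step(expr: str, claimed: float, tol: float = 1e-9):
    """Return None if the claimed value matches the interpreter, else a corrective hint."""
    actual = evaluate(expr)
    if abs(actual - claimed) <= tol:
        return None
    return (
        f"Hint: the interpreter returned {actual} for {expr}, not {claimed}. "
        "Continue from the interpreter result."
    )


print(check_step("123 * 456 + 789", 56877))  # claimed value agrees with the interpreter
print(check_step("123 * 456 + 789", 56000))  # mismatch, so a hint is produced
```

The design choice here is to treat the interpreter output as ground truth and only intervene on disagreement, which keeps the trace short when the model is already correct.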
