S* for AI CODE Generation: Plus 100%
Key Takeaways
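The video discusses S*, a hybrid test-time scaling framework for improving the coverage and selection accuracy of LLM-generated code. S* combines parallel sampling with iterative debugging against public test cases, then selects the best sample by executing candidates on LLM-generated distinguishing inputs. According to the video, this lets a small local code model roughly double its benchmark performance.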
Full Transcript
Hello community. Today we want to increase AI coding performance by at least 100%. We call it S*, so let's have a look.

You have your local computer with a GPU, let's say 16 GB of VRAM, and you want to use a typical code LLM to help you code, locally on your machine. Imagine you have a Qwen Coder that has a normal performance of about 20%, and you ask: can I at least double my performance? Yes, with the S* methodology you can jump right up to the normal performance of a coder with 14 billion free trainable parameters. So you go from a 3B to a 14B model and have almost the same performance. This is really something, because if you are limited in your local compute, this is the way to get better performance without going to the cloud. So we run a coder 3B model with S* and we get the performance of a coder 14B model.

Now, how do we achieve this? This is easy. You remember test-time scaling (TTS); now we apply it to code generation. As you saw at the opening of the video, we have S*, the first hybrid test-time scaling framework to improve the accuracy of AI-generated code. The principles are identical to what we had in mathematical reasoning. Test-time scaling was used extensively, and we discussed it in multiple videos, to improve causal reasoning and mathematical reasoning. We had more or less three ideas that we implemented: parallel sampling, which increases the solution coverage; sequential refinement, which improves the individual sample through rethinking and revising; and, my last focal point, reward models that guide the search process through the search space much more efficiently.

Talking about my last videos: we have three insane TTS methods that I showed you, even with a new Transformer architecture, to have this for a language model. In another video we discussed it for vision language models, so anything in robotics or anything with computer use, where the AI has to understand your computer screen and click on the right button. There we had a look at the latest reward vision language model, called ARMAP, and in 48 minutes I explained everything about this methodology in detail. Now comes the next step: we move to the third area, and this third area is code.

So what is our spark of genius? This is a publication from UC Berkeley, a great team with a beautiful idea, published February 20th, 2025: "S*: Test Time Scaling for Code Generation". They tell us that this extends the existing parallel scaling paradigm with sequential scaling to push the performance boundaries of AI code generation. So finally we have test-time scaling for our code optimization as well.

Now here you have a complete visualization of the performance gain that you can achieve. The higher you go, and when the models are already inherently reasoning models like o1-mini, the gains are not that massive. But for the Qwen Coder we started from the low standard performance I mentioned and got more than double the performance. If you go to the rather huge, unfortunately proprietary models like o1 and o3, the increase is not 25% but at least around 10%, which is also impressive for a model that already has such a high performance. So you can apply this to the normal, non-reasoning models, to the R1 reasoning models, to the QwQ-32B-Preview reasoning model, to the o1 and o3 reasoning models, and so on. This is a methodology you can apply broadly, and this is just great.

So let's come to the core idea, and the core idea is simple. If you remember what we have done for language models and for vision language models in robotics, guess what: we do more or less the same. We have a two-stage, two-phase approach.

In stage one we start with a problem description. Let's have a look at the example: given a positive integer num represented as a string, return the integer num without trailing zeros as a string. So if the input has some trailing zeros, you just get rid of them and you have the output. We also have some public tests. What S* does in stage one is generate parallel samples, and then S* enhances these parallel samples with iterative debugging. This is the beauty of code: with a debugger we immediately see where the mistake is. Each sample is tested using the public test cases and executed via an interpreter, and the outputs and error messages are used to guide the next round of sample generation. It couldn't be easier. The iterative debugging goes over multiple rounds, and you define the maximum number of rounds you want to spend your time budget on.

Then stage two: S* selects the best sample by prompting an LLM to generate inputs that differentiate between paired samples, and then it leverages the actual execution results to inform the LLM, which determines the optimal choice. You see, this is so easy with code: we have iterative debugging, then we have our interpreter, and we immediately know whether it is working, yes or no, and whether we achieve what we set out to do, yes or no. I will point you to the complete code in a minute; it is ready for you and you can use it immediately.

Let's talk about the facts. From the study I just want to show you three benchmark results. Take Qwen2.5-7B-Instruct plus this new S* code improvement: this 7B model now outperforms a 32B instruct model on LiveCodeBench by quite an interesting amount. If you go with GPT-4o mini and S*, this surpasses o1-preview. Now mind you, nobody uses o1-preview anymore. But S* also enables open reasoning models to achieve performance competitive with state-of-the-art proprietary closed models: DeepSeek-R1-Distill-Qwen-32B with S* comes close to the state-of-the-art OpenAI o1 (high) model. So the achievements are there and they are really impressive. Especially if you want to work locally, but even if you are in the cloud: if you get a 10% or 20% boost in your accuracy for code generation, this is something you should use.

At the end I just want to show you some beautiful ablation studies. Never mind which model it is: if we separate the S* generation from the selection, which is the most important action? I told you this is a two-stage process. From the first stage we got a 6.7% improvement, and from the second 13%. For the R1-Distill 14B model we have plus 3% and plus 16%. So the second stage is really the interesting one, and this adaptive input synthesis is what you could call the performance core of S*.

What does it mean in detail? For each pair of samples, an LLM is prompted to generate distinguishing test inputs. These inputs are of course executed, here in Python, and the outputs are then provided to ground the LLM so it can select the best sample. So it is rather easy: this adaptive, execution-grounded approach ensures a really robust identification of the correct solution. For code, this is a really important step.

As I told you, we do have a GitHub page: NovaSky, SkyThought. Remember, this is the same team from UC Berkeley that gave us Sky-T1, which was similar to the o1-preview model. Here you have the GitHub repo, and as you see I am really early, because just one hour ago they updated SkyThought with the code for the S* approach. So you have everything available there. And if you ask what NovaSky or SkyThought is, these are teams from UC Berkeley; I will show them to you in a minute. Here you have the release of S*, February 21st, 2025, with code and paper: a simple and extensible test-time scaling framework for code generation. It really works with quite a lot of models; I would be surprised if you find a model where it does not work at all. Your Python notebook, your Jupyter notebook, everything is there.

As for the team, I don't know if you noticed: this is the UC Berkeley Sky Computing Lab. Really nice. You have all the people listed, and this is just the first third of them: the core faculty, professors and associate professors, further faculty, and then you can scroll down two or three times to see all the students. So, an amazing team and great code, and if you look back, it is under the Apache 2 license, so it is for you. Why not use this? I just recommend that you go and upgrade your local code LLM, or maybe in a short time some code vision language model, to have a more physical understanding of the physical surroundings of a robotic system. But this would be the content of our next video. If you like this kind of video, why not subscribe, and I'll see you in my next video.
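The two stages described in the transcript can be sketched in a few lines of Python, using the trailing-zeros problem from the video as the running example. This is a minimal illustration under stated assumptions, not the code from the SkyThought repository: the entry point `solve`, the callables `revise`, `make_inputs` and `judge` (stand-ins for LLM calls), the stub implementations, and the simple win-count rule are all hypothetical names and choices introduced here.

```python
from typing import Callable, List, Tuple

Test = Tuple[str, str]  # (input, expected output)


def run(source: str, arg: str) -> str:
    """Execute candidate source and call its solve(); errors become text."""
    scope: dict = {}
    try:
        exec(source, scope)  # demo only: real systems need a sandbox
        return str(scope["solve"](arg))
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def failures(source: str, tests: List[Test]) -> List[str]:
    """Describe every public test the candidate fails."""
    report = []
    for arg, expected in tests:
        got = run(source, arg)
        if got != expected:
            report.append(f"solve({arg!r}) returned {got!r}, expected {expected!r}")
    return report


def debug(source: str, tests: List[Test],
          revise: Callable[[str, List[str]], str], max_rounds: int = 3) -> str:
    """Stage 1: revise one sample using execution feedback, up to max_rounds."""
    for _ in range(max_rounds):
        report = failures(source, tests)
        if not report:
            break
        source = revise(source, report)  # hypothetical LLM call
    return source


def select(samples: List[str],
           make_inputs: Callable[[str, str], List[str]],
           judge: Callable[[List[str], List[str], List[str]], int]) -> str:
    """Stage 2: pairwise comparison grounded in real execution outputs."""
    wins = [0] * len(samples)
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            inputs = make_inputs(samples[i], samples[j])  # hypothetical LLM call
            out_i = [run(samples[i], x) for x in inputs]
            out_j = [run(samples[j], x) for x in inputs]
            if out_i == out_j:
                continue  # these inputs did not separate the pair
            winner = i if judge(inputs, out_i, out_j) == 0 else j
            wins[winner] += 1
    return samples[max(range(len(samples)), key=wins.__getitem__)]


# Public tests for the trailing-zeros problem from the video.
PUBLIC_TESTS = [("51230100", "512301"), ("123", "123")]

GOOD = 'def solve(s):\n    return s.rstrip("0")\n'
BUGGY = 'def solve(s):\n    return s.replace("0", "")\n'  # fails a public test
OVERFIT = 'def solve(s):\n    return s[:-2] if s.endswith("00") else s\n'  # passes public tests only


def stub_revise(source: str, report: List[str]) -> str:
    """Stand-in for an LLM that repairs the sample from the error report."""
    return GOOD


def stub_inputs(a: str, b: str) -> List[str]:
    """Stand-in for an LLM proposing inputs that tell two samples apart."""
    return ["1000", "10"]


def stub_judge(inputs: List[str], out_a: List[str], out_b: List[str]) -> int:
    """Stand-in for an LLM judge: prefer outputs without trailing zeros."""
    return 0 if not any(o.endswith("0") for o in out_a) else 1


repaired = debug(BUGGY, PUBLIC_TESTS, stub_revise)
best = select([OVERFIT, repaired], stub_inputs, stub_judge)
```

In this toy run, `OVERFIT` passes both public tests, so stage one alone cannot reject it. Only the extra inputs in stage two expose the difference, which mirrors the point made in the ablation discussion that selection contributes most of the gain.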
Original Description
S*, the first hybrid test-time scaling framework that substantially improves the coverage and selection accuracy of LLM generated code. Also for deep reasoning models.
All rights w/ authors:
S*: Test Time Scaling for Code Generation
Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li,
Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica
University of California, @UCBerkeley
Work done w/ support from https://lambdalabs.com
#airesearch
#codegeneration
#aicoding
#berkeley