Program Aided Language Models

Data Skeptic · Beginner · 📄 Research Papers Explained · 2y ago


Key Takeaways

The video discusses Program-aided Language Models (PAL), a technique in which a large language model writes a program as its solution and a Python interpreter executes it, improving performance on complex tasks such as math word problems. Guests Aman Madaan and Shuyan Zhou share their research on PAL: Program-aided Language Models.
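A minimal sketch of the PAL division of labor described in the episode: the model writes a program (what has to be done), and a Python runtime computes the answer (how to do it). The "model output" below is a hand-written stand-in for the speed-talking problem read out in the episode; the `solution()` entry-point convention and the helper name `run_generated_program` are assumptions for illustration, not the paper's exact prompt format or code.

```python
import ast

# Hypothetical model output: under PAL-style prompting the LLM writes a program
# like this instead of doing the arithmetic itself. Hand-written here.
GENERATED_PROGRAM = '''
def solution():
    """John speaks 150 words per minute; after training he is 2.5 times faster.
    How long does it take him to speak 10 pages of 450 words each?"""
    normal_speed_wpm = 150
    speedup = 2.5
    new_speed_wpm = normal_speed_wpm * speedup
    words_per_page = 450
    pages = 10
    total_words = words_per_page * pages
    minutes = total_words / new_speed_wpm
    return minutes
'''


def run_generated_program(source: str, entry_point: str = "solution"):
    """Execute model-written code and return the value of entry_point().

    Returns None if the program is syntactically invalid (the rare
    'syntactic error' case). exec() runs arbitrary code, so a real
    system would need to sandbox this step.
    """
    try:
        ast.parse(source)  # cheap syntax check before executing anything
    except SyntaxError:
        return None
    namespace: dict = {}
    exec(source, namespace)  # defines solution() inside namespace
    return namespace[entry_point]()


if __name__ == "__main__":
    print(run_generated_program(GENERATED_PROGRAM))  # 12.0
```

Here the interpreter, not the model, computes 4,500 words at 375 words per minute, giving 12.0 minutes. The syntax check only separates out malformed programs; semantic mistakes (a wrong variable or constant) still produce a wrong answer, which matches the guests' account of where the remaining errors come from.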

Full Transcript

Welcome to Data Skeptic Machine Intelligence, our podcast series exploring contemporary topics in artificial general intelligence and large language models. We've had a good little run of episodes lately asking how ChatGPT and other large language models perform in other domains. We've heard they do well at coding, and they do well when describing graphs and networks. Today we're going to find out about their ability to do math problems. Something you might already know, and it's easy to check for yourself: in their current state, large language models are surprisingly bad at mathematics. I know it's not going to be like that forever. Give it a few years and I'm sure, one way or another, LLMs will also be reliable calculators, and today's interview might just give us a hint as to how that goal can be accomplished.

I'm Aman. I'm a PhD student at the Language Technologies Institute at CMU. I'm in my final year, and I do a lot of large language model stuff: reasoning with large language models, code generation, things like that.

Hi, I'm Shuyan. I'm also a final-year PhD student at the Language Technologies Institute at Carnegie Mellon University. I work on building agents that can help us solve complex tasks from a simple natural language command.

Well, NLP has certainly changed a lot, even in the time you've both been studying. Can you share a few thoughts on the journey?

I always wanted to do language generation, language generation and commonsense reasoning, so that's basically what I've been doing. But like you point out, things completely changed after large language models. I have a very concrete example. My first paper in my PhD was on politeness transfer: you're given a sentence and you want to make it polite. Today there's no need to do anything; you go to ChatGPT and just ask it to make the sentence polite. But back then it was nontrivial. We had a couple of models, we had a special dataset, and we had a new modeling paradigm we called Tag and Generate. It
was a paper that made a lot of news, and clearly a lot of that is not needed anymore. So it's super cool to see how even very difficult things are now very easy. But to be a bit more cautious and pragmatic, I don't think we are at a point where we can say NLP is solved or reasoning is solved. There are many, many challenges, which become apparent as soon as you start playing with any nontrivial dataset. So it has definitely evolved, and it's been awesome to see the change, but I think there are still lots of challenges.

I guess for me the most impressive thing is that we've started testing more and more realistic scenarios. When I started working on code generation, we started with just generating a one-liner, a single line of code, and even that was challenging back then. We needed a lot of syntax-aware encoding to make sure the grammar was correct, and we had to augment a lot of data and train a small-scale model. Now a lot of the techniques are simpler, but the outcome is even better. We can not only do simple one-liner code generation; right now we can do function-level, class-level, even repository-level generation. So basically the application is broader and the underlying technique is simpler, and that is the most fantastic thing for me.

I've had some good success myself generating code using an LLM, but maybe it's the limit of my imagination: I only seem to get good results when I ask for maybe 30 lines, like "here's this function" or "can you improve what I already have." Do you think we're hitting ceilings like that, or is the sky the limit here?

Oh yeah. When I say we can generate repository-level code, I just mean we are starting to get to that point, getting more focused on that specific scenario with more and more evaluation datasets. Of course the performance is not there yet, but I
don't feel like we've hit a ceiling there, for a couple of reasons. For example, right now we have decoder models that support longer contexts, and we can do a lot of retrieval. And a lot of pre-training right now is still not very repository-aware; if we can incorporate that into pre-training and instruction fine-tuning, I think we can get even better performance. It's just a starting point.

I agree with Shuyan, and I think it's also a function of how common or popular the thing you're trying to do is. We have a paper called Self-Refine, for which I built a demo in React. I don't know any React, but by prompting GPT-4 I was able to get a decent demo where you enter a string and it generates a nice graph and things like that. I think those use cases are common, so if you're doing something that lots of people have done and discussed on Stack Overflow, maybe it can also do longer code generation. But if you're doing something very specific, I think that's more of a problem.

I had a similar experience. I built a Flask app. I know Python, but I don't use Flask a lot. Through a bit more back-and-forth with GPT-4, I was able to set up my application. My friend told me it might take him a day to work on that project, but it took me about three hours, although I did need some knowledge of how to debug: how to select the important part of the error message and send it back to the model.

Well, for as good as these language models are at code, they can sometimes be embarrassingly bad at some things, like basic arithmetic. For listeners who haven't posed a lot of these problems to models, could you talk a little about some of the limits you've seen in that domain?

The most common example, which anyone can try right away, is to take two three-digit numbers and try
multiplying them with GPT-4. There's a good chance that it'll fail. Models do struggle with problems like that, even models like GPT-4. That's kind of interesting, and I think you can make the argument both ways. We have a paper with some of our co-authors on how to fix these problems, but I guess we can come to it later. There's a very surprising failure mode in multiplication, and also in adding very large numbers. In some sense that's obvious: you don't have many examples of all kinds of multiplications, and at the end of the day the models are learning from the corpus, so if you have not shown them enough examples, why would we expect them to pick it up? But then they can do all these crazy things, so I think the answer is a bit more nuanced.

Would you be able to describe one of the questions? If you have one handy, you could read it out loud, or just make one up, so we get a sense of what the challenge is. What are some of these problems?

Okay, yeah, we have some here. "John decides to take up an odd hobby of speed talking. His normal speaking speed is 150 words per minute. After training, his speed is 2.5 times faster than his starting speed. How long would it take him to speak 10 pages if each page has 450 words?"

I need a pencil, but I can solve it.

Yeah. And for the MATH dataset that I was talking about, I also got hold of the paper, and the numbers there are a 50% solve rate if you use text and 70% if you use PAL.

That's a crazy jump. I think a very optimistic person might say we should just make the large language model bigger, double the parameters, and somewhere along the way it's going to learn arithmetic too. Your approach is different. Can you describe what program-aided language models are?

Yeah. I think the idea is basically that when we are solving a problem, something like a math problem, there are two things. The first thing is what has to be
done, and the second thing is how to do it. For example, if the question is "Shuyan and Aman have two apples each; how many apples do they have?", then knowing what has to be done means taking the number of apples that Aman and Shuyan have, which is two apples each, and calculating 2 plus 2. The second part is actually doing the calculation. What we observe in program-aided language models is that you don't really have to force the language model to do everything; you can make it generate only the plan for solving the problem. Because these models are trained on tons of Python code, Python is also very natural for them. So let's generate a Python program to solve the problem, and then, rather than bothering the language model with the calculation, send the program to a Python runtime to get the answer. It's this division between what has to be done and how to do it, and our way of expressing it is with a program. It works pretty well, so that's the idea.

So I could load up ChatGPT or some other large language model, pose a question to it, and add to my prompt: "Instead of answering this question directly, please give me a Python program that will get me the answer." Is that basically the approach, or is there more engineering to it?

Yeah, I think that's basically it. And right now there's even Code Interpreter, and Bard, for example, has implicit code execution, so you don't even have to say you need a Python program; maybe they have some internal technology to route the question to code generation.

Well then, as a technique, you want to have some measure of success. What can you use as a baseline to decide whether your approach is doing a good job?

We basically compare with different prompting methods. You can directly ask a model to generate the answer without intermediate reasoning steps, and we can also use chain of thought, which spells out the
intermediate reasoning before it gets to the answer. We compare with both of them on many datasets, BIG-Bench Hard and others, and we found PAL is better on those.

Well, the large language model could still make coding mistakes: reference the wrong variable, use the wrong constant, or something. Why is it that moving from language to this intermediate form somehow allows you to get correct answers?

Yeah, I think there are two kinds of mistakes: syntactic and semantic. Syntactic means the Python syntax is wrong, and we see that that almost never happens; it's less than 1% of the time on GSM8K. That's taken care of because we have a really good language model of code. The second, more serious problem is semantic errors, and those do happen; the success rate is still not 100%. But there are two reasons why it's much better than text only. One is that the model does absolutely no calculation, so all of those errors go away. There's no addition, multiplication, taking a log, or anything; that's all routed to the Python interpreter, and that's the biggest source of error reduction. So even if the model does not know what 2 + 2 is, that's completely fine, as long as it can say return a + b or return 2 + 2, and then we call a runtime. The second thing we noticed is that, at least for the problems we consider in PAL, it's actually very natural to represent the solution as code. The first step is to create an init block where you initialize all the variables; the next step is to create a clear flow: what's the first step, the second step, the third step. I think that's also a very natural representation for solving these problems. But not having to do arithmetic calculations is the biggest source of gains.

Yeah, I just want to add something about the second point as well. You're right that models can
still make mistakes in code, but code can also preserve the richness of natural language through meaningful variable names and comments. In that sense, there's probably a smoother transition between the natural language and the code.

Thanks to our newest sponsor, math.fashion. Yes, that is a real URL. Visit math.fashion and check out their line of mathematics-themed t-shirts. They sent me the Klein bottle shirt, which is pretty stylish. I've gotten a few compliments on it, and each one is an opportunity to start a long conversation about two-dimensional manifolds. Get more math in your life, specifically on your body: visit math.fashion.

Well, you mentioned that less than 1% of the programs have a syntactic error. I think that makes it a better programmer than I am. But what percentage of the generated programs give the correct answer?

I think the accuracy on GSM8K with ChatGPT is about 78 or 79%, so 78 to 79% of the time the answer is right.

Could you frame that for us? Is that an impressive score, is it probably state of the art, and what would a human score? How do we interpret that number?

If you use chain-of-thought prompting with text only, I think you would get about 72 or 73%, so that's the gain over text-based, chain-of-thought reasoning with the same underlying model. And these are all middle school math problems, so you can expect a reasonable human to solve all of them. The funny thing is that, because this benchmark is so popular or for whatever reason, GPT-4 was actually fine-tuned on it. So GPT-4 gets 94% with PAL on the same benchmark, and CoT, chain of thought, gets 92%. There are some gains there as well.

Yeah, but it's already contaminated, because it was trained on it.

Yeah.

And I think you mentioned these are middle school math problems, so
do you have any thoughts on how it would work if you push it toward high school and college-level mathematics?

Yeah, there's another dataset called MATH, literally called MATH, which has fancier problems, more advanced high school problems including geometry, calculus, and probability. There's a recent paper, the name is slipping my mind but I can share it later, that uses PAL and this idea of self-debugging to push that benchmark to something like 87%, in the late 80s, and that's the state of the art. Even fine-tuned models are maybe in the 50s or 60s. So I think the general technique is much more effective even for more difficult problems.

So I'm going to ask you kind of an unfair question. Pretend to be futurists, or maybe you consider yourselves futurists: how is this going to affect the future of math, the way kids learn, and how we get answers to problems?

I think one thing is that we can use programs as an intermediate step to generate better solutions, which are more accurate than with other techniques, and then do some back-translation from the programs into natural language and explain that to kids so they can understand how it works. Mostly, I think the gain here will be that we have a better, more accurate solution, and that part can help education, but we still need some kind of translation back to natural language so that everyone, kids included, can still understand. I think this could probably apply to other majors or classes as well, maybe chemistry or even physics.

Yeah, in general I think the idea that you can use these models to write good code is going to change a lot of things. If you take any of the basic CS classes, I'm pretty sure GPT-4 can do a large chunk of the assignments, maybe not just because it's so good but because these
assignments have been online for a long time and they test concepts that are very common. So maybe it's a broader theme, but we have to rethink the whole idea of how to test people's programming abilities. If you give someone a take-home test and they have access to GPT-4, I don't think you can really test anyone, because it's probably going to do it right. And yet that person will have access to GPT-4 in their professional life. It's sort of like deciding whether to allow a calculator on the test.

Yeah, but I guess there's some value in learning how to perform addition and subtraction before you get access to a calculator, right?

Good point. It's a very subjective opinion, I don't know, but I hope it's helpful, because we did our undergraduate degrees in the pre-ChatGPT era, so we had to do all of that by hand.

In the benchmark problems, the mathematics is kind of at face value. There could be a red herring number and things like that, but usually there are very few red herrings; the details are just there, and it's not trying to trick or distract anybody. Do you think you'll be able to generalize to more natural language, say someone a little too talkative who shares a lot of extraneous details? Will the system be able to tell the difference and find the mathematics if it's not at face value?

There's actually a paper showing that large language models can be distracted by irrelevant context. They insert random statements into the questions in these benchmarks, and the performance drops. I think it's a challenge. But then I saw a recent paper where all the authors did was add a statement at the beginning saying "ignore any irrelevant statements," and the performance improves. So who knows, right?

Yeah. And I think we can do
something like a two-stage process if we know there will be some distracting context: in the first stage, we have the language model filter out irrelevant information, and then it does the problem solving.

Exactly. I'm pretty confident the model can just remove all the irrelevant context if you're not doing everything in a single pass.

Yeah, these things are pretty good at language stuff.

Yeah, I think so. There are some funny examples like that one: "ignore the irrelevant details," and that's all it took. But for the rigor of a computer scientist, that does feel a bit hocus-pocus. How much time do you spend tweaking the prompts to get them just so?

I think at least for the PAL paper, we didn't spend a lot of time tweaking the actual prompts. We basically translated the chain-of-thought prompt into our program-style reasoning. We didn't tweak it toward some more program-favorable reasoning; we basically just copied the chain of thought over to our scenario. I think the kind of magic here is, first, that we sometimes need to interleave comments and code, and second, that we need meaningful variable names: you cannot write a = 2 when a is actually the number of apples.

I think in general it depends. Sometimes the paper is just a new technique. For CoCoGen, which was about a year and a half ago now, we basically had this idea, and in the paper we tried a bunch of prompts because we wanted to show that it generally works. For PAL, it was a direct translation. In general, at least in the work we do, we don't spend much time carefully crafting the prompt. We try to work on things where the first prompt sort of works, and if it does not, then maybe it's kind of boring to just...

Yeah, and the funny thing is that for CoCoGen we actually tried different variations, and it actually doesn't provide
significant improvements.

Exactly. The format of programming matters more than how exactly you tweak that specific prompt. There's always an argument that with a better prompt the performance could be better, and I think that's true, but there's also some recent work on automatic prompt discovery, where you outsource that work to the language model. You can say, "This is what I care about, this is my first prompt, and the performance is 20%." The model can propose something else and say, "Okay, this gets 21%, so I'm in the right direction; let me keep going in this direction," and after 50 iterations you get a better prompt. I think maybe we'll have techniques that can do that kind of stuff.

Is there any significance to selecting Python as the language to use?

The reason we chose Python was that it was the most represented language in the pre-training corpus of Codex. People evaluated a multilingual version of HumanEval, and I think Python is still the best-performing language there for Codex. So basically we chose Python because it had the strongest performance with Codex, and we also know Python better than any other language, so it's much easier to write in general and also to set up for execution.

Exactly. You can quickly execute it; you don't have to build a JAR file and all that.

I've done enough of that, so I share your thoughts there. There has been somewhat of a trend in machine learning to make everything end to end. In translation, long ago, maybe you first wanted to extract the phonemes and then figure out which words were made up of those phonemes, and now you put the raw audio in and get the translation out. Do you think that trend might end, if we're seeing a benefit to offloading some of the more mechanical calculations
like a logarithm to the Python interpreter?

I think it really depends on the scale of the task. For some tasks, especially natural language processing tasks, we've been moving from more modularized methodologies to more end-to-end methodologies. As you say, in machine translation we no longer use a lot of computational linguistics methodology to discover features before doing the end task. But if we're talking about even higher-level tasks, like solving a very complex problem, I think we still want to leverage different components, which can be orchestrated by a language model or some other powerful model. We still need to leverage the power of individual modules because they have their specialties, and in this way we can achieve better performance.

Along the same lines, I think there's also the question of whether we really have to express everything in language, whether everything is really expressed most efficiently in language, and whether everything will be available in the pre-training data. I think the answer is no, and there are some obvious failure cases. For example, if you want to build a system that reasons about what happened today, unless you have a way to train the language model every hour, you really need some sort of retrieval to get the new piece of information and reason about it. What's the stock price today? What's the weather like today? What should I do today? How do I get from where I am right now to the place I want to be, efficiently? You can still use a language model to solve these questions, but language models will need signals from the real world, signals from what's happening today, maybe from a different modality. I think this idea of using a language model as some sort of operating system and building on top of it is here to stay.

Are there any other approaches you would contrast your approach against?

I think
prompting is full of techniques, different ways of prompting. Chain of thought is one where, before you spell out the answer, you make the model think about the answer. In this case, the chain of thought could be: "John's normal speed is 150 words per minute; after training it's 2.5 times faster, so the new speed is 150 times 2.5," and so on. The model is thinking out loud, and it's thinking out loud in text, whereas in our approach it's writing a program. So that's the most direct comparison, and I think that's our main baseline, so to speak.

What's the future of the work? Where can it go next?

For me, I'm trying to use programs as an intermediate reasoning step to solve complex interactive decision-making tasks, because I believe a program is a good abstraction that can enhance the expressiveness of task solving. Here we say a program can help a model connect with tools like the Python interpreter or other APIs, but I think programs are more expressive in general. For example, when people do planning, we try to decompose a high-level task into subtasks that are more manageable; in the program world, that's basically writing nested functions, and those functions are reusable, so that's an advantage. We also have situation-dependent actions. For example, if I want to take something from the table, I just take it, but if I need to take something from the drawer, I have to open the drawer first. In a program we can write this very concisely as if/else control flow. So I think there are a lot of analogies between how people solve tasks and how we can express that in programs. And with programs, we can connect to tools by simply importing their APIs and maybe reading a bit of their API documentation, so models can naturally adapt to those scenarios. So for me, I will try to make programs useful in
more interactive settings, solving complex interactive decision-making tasks.

Aman, a similar question for you: where is your research headed next?

In terms of generating programs, we have some work on learning to optimize programs, and some upcoming work on what happens when programs have silly bugs, where the bug is in one line and needs a change of one or two characters to fix. And in general, and this is where I'm getting into some crazy ideas that aren't necessarily my research, one idea I'm very excited about, which you see coming up in the literature but which no one has really buckled down and looked at in detail, is what happens if you start pre-training your language models on code. You don't start pre-training with text; you first have some short phase, and by short I of course mean a few billion tokens, of code before you start the regular pre-training. There's work out there that shows pieces of this. There's work showing that code in general has very low perplexity, which means it's easier to predict. That makes a lot of sense, because programs are often predictable: if you have written a class and you're completing it, then after the init block you already know all the variables, so the number of things that can come next in a program is reduced, and of course there are indentations and those things. So there's work showing that programs are easier to learn and have low perplexity, and there's work showing that if you mix code with any other modality, this observation holds: as you scale, the perplexity gets lower and lower. So I think an interesting hypothesis is: if you start training from code, does that give the model a good initialization? We have reason to
believe it. One more piece of evidence for the hypothesis: if you look at OpenAI's descriptions of these models, the base model for much of the GPT-3.5 series was actually code-davinci-002, which is apparently a code generation model. Again, we don't know a lot of the details for many of these things, but to me that's very interesting: is there something fundamental about programs and code that can help us train good language models in general, beyond code generation? That's something I'm very interested in.

Well, we've mostly touched on the use of large language models to solve these math problems, at least in this particular work. I know you've also looked a bit into the ability of LLMs to speak about graphs. Could you share some details about CoCoGen?

CoCoGen is a paper that Shuyan and I did a year to a year and a half ago. The idea was about structured commonsense generation, by which I mean graph generation. For example, you're given a scenario like "generate a plan to bake a cake." This plan can be represented as a graph: the first node can be "gather ingredients," then the next node or event can be "mix ingredients," "start the oven," and so on, so you have a directed acyclic graph, or a plan. Up until that time, of course, everyone was already using language models for this kind of thing, and they were already good, but the question is how you represent the output: what structure do you use? The most common idea was to break the graph down into a sequence of edges and use the language model to generate that. So if there's an edge from a to b and one from a to c, you have two edges, "a b" and "a c," and you just generate them. What we proposed is, instead of writing the graph as a string, as a sequence of edges, why don't we represent the graph as a Python class? So there can be a fake class, "bake a cake," and it could have fake
functions like gather ingredients, mix ingredients, and start the oven. The dependencies between these events, the edges, are again very natural to express: you can write a main function that calls these fake events in the right order. I keep saying it's fake because we don't really care about the Python class itself; Python is only a way to represent the structure. That's our CoCoGen work, which set the stage for some of the follow-up works, PAL being, I think, the most noteworthy.

And is there anywhere listeners can follow you online?

I'm on Twitter; my handle is @shuyanzhxyc. My homepage is my full name dot com.

For me, my Twitter is just my first name, underscore, my last name, which is much easier than Shuyan's. But if you find either of us, you'll find the other, because we have a lot of work together. I also have a website, which is my last name dot github.io.

Cool, we have links to all of the above in the show notes for people to follow up with. Thank you both so much for taking the time to come on and share your work.

Thank you so much.

Yeah, thanks a lot for talking to us.

[Music]
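To make the CoCoGen idea concrete, here is a sketch of the cake-baking plan from the conversation written as a Python class whose `main` method calls the steps in order, plus a helper that recovers the flat edge list from that code using Python's standard `ast` module. The class and method names, and the rule that consecutive calls in `main` become edges, are assumptions for illustration; the paper's actual prompt format may differ. A straight sequence of calls only encodes a chain, so branching plans would need a richer convention.

```python
import ast

# A plan written as a (never executed) Python class, in the spirit of CoCoGen.
# Names are hypothetical; only the structure matters.
PLAN_AS_CODE = '''
class BakeACake:
    def gather_ingredients(self): ...
    def mix_ingredients(self): ...
    def start_oven(self): ...

    def main(self):
        self.gather_ingredients()
        self.mix_ingredients()
        self.start_oven()
'''


def plan_edges(source: str, cls: str = "BakeACake") -> list[tuple[str, str]]:
    """Turn the order of self.<step>() calls in main() into (before, after) edges."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == cls:
            main = next(f for f in node.body
                        if isinstance(f, ast.FunctionDef) and f.name == "main")
            # Collect method names from bare call statements like self.step()
            steps = [stmt.value.func.attr
                     for stmt in main.body
                     if isinstance(stmt, ast.Expr)
                     and isinstance(stmt.value, ast.Call)
                     and isinstance(stmt.value.func, ast.Attribute)]
            # Consecutive calls become edges: a chain a -> b -> c
            return list(zip(steps, steps[1:]))
    raise ValueError(f"class {cls!r} not found")


def edges_as_string(edges: list[tuple[str, str]]) -> str:
    """Render edges in the flat 'a b; a c' style the guests contrast CoCoGen with."""
    return "; ".join(f"{a} {b}" for a, b in edges)


if __name__ == "__main__":
    print(plan_edges(PLAN_AS_CODE))
    print(edges_as_string(plan_edges(PLAN_AS_CODE)))
```

Here `plan_edges` yields the edges (gather_ingredients, mix_ingredients) and (mix_ingredients, start_oven). As Aman says, the class is "fake": it is never run, only used as a structured output format that a code-trained model finds natural to write.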

Original Description

We are joined by Aman Madaan and Shuyan Zhou. They are both PhD students at the Language Technologies Institute at Carnegie Mellon University. They join us to discuss their latest published paper, PAL: Program-aided Language Models. Aman and Shuyan started by sharing how the application of LLMs has evolved. They talked about the performance of LLMs on arithmetic tasks in contrast to coding tasks. Aman introduced PAL and how it helps LLMs improve at arithmetic tasks. He shared examples of the tasks PAL was tested on. Shuyan discussed how PAL’s performance was evaluated on BIG-Bench Hard tasks. They discussed the kinds of mistakes LLMs tend to make and how PAL circumvents these limitations. They also discussed how these developments in LLMs could help kids learn. Rounding up, Aman discussed the CoCoGen project, which represents graph-structured outputs, such as plans, as Python code. Shuyan and Aman shared their next research steps. Follow Shuyan on Twitter @shuyanzhxyc. Follow Aman on Twitter @aman_madaan.

Playlist

Uploads from Data Skeptic

Data Skeptic book giveaway contest winner selection

OpenHouse - Front end and API overview

OpenHouse Crawling with AWS Lambda

[MINI] Logistic Regression on Audio Data

Data Provenance and Reproducibility with Pachyderm

[MINI] Primer on Deep Learning

Big Data Tools and Trends

[MINI] Automated Feature Engineering

The Data Refuge Project

[MINI] The Perceptron

[MINI] Feed Forward Neural Networks

Data Science at Patreon

[MINI] Backpropagation

[MINI] Generative Adversarial Networks

[MINI] AdaBoost

[MINI] The Bootstrap

[MINI] Gini Coefficients

[MINI] Random Forest

[MINI] Heteroskedasticity

Urban Congestion

[MINI] The CAP Theorem

Unstructured Data for Finance

Detecting Terrorists with Facial Recognition?

Predictive Models on Random Data

[MINI] F1 Score

Machine Learning on Images with Noisy Human-centric Labels

The Library Problem

Stealing Models from the Cloud

Data Science at eHarmony

Multiple Comparisons and Conversion Optimization

Election Predictions

[MINI] Calculating Feature Importance

MS Connect Conference

The Police Data and the Data Driven Justice Initiatives

Studying Competition and Gender Through Chess

[MINI] Goodhart's Law

Trusting Machine Learning Models with LIME

Predictive Policing

Multi-Agent Diverse Generative Adversarial Networks

[MINI] Convolutional Neural Networks

Unsupervised Depth Perception

[MINI] Max-pooling

Activation Functions

[MINI] The Vanishing Gradient

Estimating Sheep Pain with Facial Recognition

[MINI] Conditional Independence

MINI: Bayesian Belief Networks

Project Common Voice

[MINI] Recurrent Neural Networks

The video discusses Program-aided Language Models (PAL), a technique that uses code as the representation of a solution to improve the performance of large language models on complex tasks. Guests Aman Madaan and Shuyan Zhou share their research on PAL: Program-aided Language Models, highlighting its potential to produce more accurate solutions than other prompting techniques. The generated program serves as an intermediate step that an interpreter executes to produce the final answer, and it can help kids understand how a solution works by back translating fr…

Key Takeaways

Use code as the representation of a solution to improve LLM performance
Implement Program-aided Language Models (PAL)
Use chain-of-thought prompting to improve LLM performance
Fine-tune LLMs on specific tasks
Use instruction tuning to improve LLM performance
Design and implement LLM-based systems using program-aided language models
Use Python to represent the structure of graphs so that LLMs can generate them as code

💡 Program-aided Language Models can produce more accurate solutions than other prompting techniques by representing solutions as code and offloading arithmetic calculations to an interpreter instead of having the model perform them.
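The idea can be illustrated with a small sketch, assuming the model has already produced a program. In PAL, the LLM is prompted with few-shot examples of problems paired with Python solutions and writes a program for the new problem; the Python interpreter, rather than the model, computes the final answer. No model is called here: `GENERATED_PROGRAM` stands in for hypothetical model output on a simple word problem, and `run_program` is a hypothetical helper, not part of any PAL library.

```python
# A minimal PAL-style sketch: the "model output" is hard-coded because no LLM
# is called. The model's job is only to translate the problem into code.
QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# Hypothetical program an LLM might generate for QUESTION. Note that it
# names the quantities but leaves all arithmetic to the interpreter.
GENERATED_PROGRAM = '''
def solution():
    tennis_balls = 5
    bought_cans = 2
    balls_per_can = 3
    bought_balls = bought_cans * balls_per_can
    return tennis_balls + bought_balls
'''


def run_program(program: str) -> object:
    """Execute generated code in a fresh namespace and return solution()."""
    namespace: dict = {}
    # exec runs arbitrary code; only use it on trusted or sandboxed programs.
    exec(program, namespace)
    return namespace["solution"]()


print(run_program(GENERATED_PROGRAM))  # → 11
```

Separating the two steps is the design point: even if a model would miscompute 5 + 2 × 3 in free text, the interpreter evaluates the program exactly. Because `exec` runs whatever the model writes, a real system would need to sandbox model-generated programs.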

More on: LLM Foundations

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Related AI Lessons

I Spent Weeks Looking for a Research Gap Before I Realized I Was Searching the Wrong Way

Learn how to effectively find research gaps by changing your approach, a crucial skill for AI researchers and academics

ICMI 2026 Reviews [D]

Learn how to interpret ICMI 2026 reviews and improve your paper's acceptance chances

Reddit r/MachineLearning

Workshop submission for main conference paper under review [D]

Learn how to navigate submitting a paper to a non-archival workshop before the final decision of a main conference like ECCV

Reddit r/MachineLearning

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

Streamline your research with a new Chrome extension and website that integrates 3M papers from arxiv, OpenReview, GitHub, and HuggingFace, including citation graphs and SPECTER2 neighbors, and provide feedback to improve it

Reddit r/MachineLearning

How to Open HSD Files (Husqvarna Viking Designer Embroidery)

File Extension Geeks