PAL : Program-aided Language Models with LangChain code

Sam Witteveen · Intermediate · 🧠 Large Language Models · 3y ago


Key Takeaways

The video explores Program-aided Language Models (PAL) and its implementation in LangChain, providing code examples and demonstrations of its capabilities.

Full Transcript

Okay, in this video I'm going to go through PAL, which is Program-aided Language Models. I'm going to go quickly through the paper, some of its key points, and the website, and then I'm going to show you the code for implementing and using this in LangChain.

This paper comes out of CMU, which has a very good NLP team, and these are the authors; a number of authors have worked on it. It really is an interesting paper in that it goes along with this new theme: rather than just training models to get bigger and better at things, you can also use in-context learning and few-shot prompting to get these models to do something with an external tool that's outside of the actual language model.

A lot of this is used for math. Math has been an issue for language models for a long time; they've had problems doing even simple things like addition and multiplication with large numbers, and a lot of that has to do with the way they're trained. If you think about it, they learn a lot of things with small numbers in certain scenarios, because they see that a lot in their training data, but they don't really learn the logic of how to do multiplication or division, or even multiple operations in the same equation.

So what this proposes is that rather than just output the answer, why don't we output a Python program, and this Python program can then be used to actually calculate the answer. It can be run with a Python REPL (read-evaluate-print loop), and that can be used to give us the answer. It's interesting that they're using the Codex model; they found that the code models work better, which makes sense because they're trying to generate code here. The goal is to offload the solution to a Python interpreter, right, this is what
they're doing. If you look at the idea here of what's been done in the past, the chain-of-thought stuff was really interesting when it came out last year. The idea there was that just by asking the language model to give us its step-by-step thinking, could it give us the answer? It turned out that this worked a certain amount of the time, certainly much better than just asking for the answer. So you would often see things like "let's think about this step by step"; a lot of those are prompting for chain of thought.

What this is doing instead is getting the model to output a set of variables which we can then use. So we can take this Roger's tennis balls example: tennis_balls equals five, which is just a Python assignment. Two cans of three tennis balls each, so bought_balls is two times three, and then the answer is tennis_balls plus bought_balls. Doing things like this allows the Python interpreter to actually do the math part. The language model doesn't have to do the math part; it's just creating the function that is then going to be run in Python itself.

The paper's very interesting and I encourage you to go and have a look at it. They focus on certain types of math and certain types of questions. You'll see that they use this GSM8K benchmark, which I'm pretty sure stands for Grade School Math 8K, and these are basically math word problems. That's one of the datasets; there's another one, GSM-Hard, which they use for this. Going through, the paper talks a little bit about how they craft the prompts and how they set them up to predict a Python function. Compared to the baselines, this does far better. There are some really nice implementation points in here, showing us that
they're using greedy decoding with a temperature of zero, and they're using the Codex code-davinci-002 model. So when we get to the actual code part, we're going to use the same model that they use in the paper.

Let's have a look at the website. They have this nice website, and I'll put the links in the description for all of this. You can see that this does very well when it's compared to the other ways of doing this kind of thing. Direct prompting doesn't do very well at all; with chain-of-thought prompting we see quite a big jump for a lot of these; and then PAL is an even bigger jump for a lot of these. These are various datasets that they've tested on. GSM8K is the one I've just been talking about, and on GSM-Hard you can see really big jumps. GSM8K was one that people used a lot for comparing big models, and you can see the gain just from adding this, because we're taking the heavy lifting out of the model and putting it into the actual code and the functions that we're using.

All right, some sample outputs. They've got a bunch of them here that you can look at, which show you the question, how chain of thought handles it, and then how PAL turns it into a Python function. I'll show you in the code that you might sometimes want to change the prompts, so this could be where you get more examples to use in prompts. For the in-context learning, you could come through here and look for the kind of thing that you particularly want to do. They've got a whole bunch of standard ones in here. They've got GSM-Hard, which often has a lot more to do with floats and some different types of questions. You're going to see we've got the colored
objects questions. LangChain actually has a colored objects prompt, so it has its own colored objects PAL chain; you'll see that in the code in a second. Then we've got things like date understanding. The LangChain ones don't seem to do that well with this, so this is something I would look at adding as a prompt to the work you're going to do in LangChain, to see how it does with these kinds of things. Another one that it perhaps doesn't do as well on is these repeat copy tasks, where you're asking it to do a specific task, and we'll go through some of these in the code.

So have a look at the paper. The researchers have done a really good job here; it's a pretty easy paper to read, and it helps you understand what's actually going on in the code, so that if you want to change some of this, or experiment with exporting Python functions for other things, you'd be able to do that. All right, let's jump into the code.

Okay, so we've gone through the paper; let's have a look at the code. It turns out that LangChain has a PAL chain in there that we can use. Bringing it in is quite simple, and you'll notice that we're not using the normal text-davinci-003 model or a ChatGPT model; we're actually using one of the code models, to match the paper. I've put up some examples that you can play with here. I'll look at some from the paper and some from other papers to see how they work, and then I'll show you some examples of things that don't work as well.

Setting up the chain is pretty simple. There are three different versions of it. The main one we're going to start out with is from_math_prompt, and then I'll show you the one for colors and another one for step-by-step stuff as well. Okay,
we've brought this in, and we're setting verbose equals true so that we can see what's actually going on as we go through this. Here we've got a question that, from memory, is from one of the Flan papers. It's a very simple sort of thing, and the big models can do this straight out; a lot of the time they'll use a chain-of-thought prompt to get better at it. But smaller models, and even some of the bigger Flan-T5 models, struggle at this kind of thing. The question is: the cafeteria had 23 apples; if they used 20 for lunch and bought six more, how many apples do they have?

Sure enough, when we pass this question into the PAL chain and run it, we can see that it has converted what we've got and extracted the variables: the initial apples is 23, then apples used and apples bought. It then constructed a very simple math equation, wrapped it all in Python, and ran that Python, so the final output is nine, which is right: 23 minus 20 plus 6 equals nine.

If we look at this, here's the question that I put in, and this is the actual template that we're using. Like I always say, you should look at the templates yourself so that you can learn from them. It's a bit hard to see here, so I'm just going to copy it over and paste it in so that we can read the full template.

For a start, you'll notice that the template's quite long. This fits with what we did up here, where we set the large language model to 512 tokens, because we're going to have a long prompt that we're putting into this. Using PAL will definitely use a decent number of tokens. What we're doing here is the in-context learning: we're just giving it examples of a question, like Olivia has 23 dollars, she bought five bagels for three dollars each, how much money does she
have left? And then we're giving it the Python function that it should generate, so this would be a solution in Python. It should always name the variables after the actual words in the problem, and sure enough, it's done that. Now notice we're not giving it the answer, because we're going to take this and actually run it in Python. We don't want the language model to give us the answer; we just want it to give us the Python function, which we can then run.

So we've got one example there, and another one. One of the big things I want you to take away from this is just how many examples they're giving in the in-context learning here: one, two, three, four, five, six, seven, eight. So there are eight examples for this to learn from before it starts. Then we just pass in the question that we've got, tell it "solution in Python", and it generates a function which we then take and run.

Looking at this, you'll notice that it always initializes the variables with names that relate to the problem, and it does the math in there. You'll see that most of the math in these examples is reasonably simple, and this could be the reason why it gets some wrong, which I'll show you later on. So you could mess with this prompt a bit more if you're going to do examples around time or around other things.

So that's the actual prompt. Let's jump in and look at some of the examples from the datasets used in the paper. Remember, GSM8K is grade school math questions. Here we've got a question, and it's running it and it's working: it's generating a Python function with variables that match the question, it's putting each of them in the right place, and then it's exporting
then running that and giving us the answer. We've got multiple ones, and I encourage you to go through the Colab yourself, try changing these up, and play with it. I've taken some of these from the paper's website that we looked at before, so some of these examples are from here, where we've got the GSM8K outputs and various other outputs. If you go through them, you can see it can do integer math, it can do float math, it can do a variety of different things.

One kind of example that I find very interesting is these final examples of repeat copy. This is where the question is something along the lines of asking it to do something with language. In this one: say the letters of the alphabet in capital letters, but only the odd ones. Another one is: repeat "cheese" seven times, every third say "whiz". So: cheese, cheese, whiz, cheese, whiz, cheese. If we look at running this in LangChain, it doesn't do a great job. We've got no spaces in there, but okay, let's forgive that. We've got lots of cheeses, and then we've got the whizzes right at the end. It's probably getting this wrong because there are none of these examples in the actual prompt, so you could go and change the prompt, or maybe we look at doing that in a future video if people are interested; please comment and let me know. But the one about the letters it does get right. Say the letters of the alphabet in capital letters, but only the odd ones: these are the answers from the paper, and sure enough it gets it right.

Another source that I thought would be interesting is a site that's made for children to learn this stuff. It has a whole bunch of different questions, so you can put these in there, and it
tells you what grade they are. I think this goes from third grade up to eighth grade. Some of them are very easy, but some do become challenging. I've put a bunch of those in there that you can play around with to see which ones it gets, and it gets most of them right.

This is an interesting one: interpreting remainders. We want to see whether it can work out the remainder when we're doing division, what's left over. Sure enough, it does the division and then it does the remainder, using the modulo there to work out the remainder. Mixed operations, doing multiple things: it seems to do pretty well with those. There are some things related to percentages. And this one is a little bit more like reasoning, where it's asked to do things and then you have to say which group is bigger; you can see it's written a whole bunch of conditional logic in the function to work this out. These are worth having a play with, and you could think about where you could use them.

Then we get some failures. It doesn't do very well with time. This one is: if I wake up at 7:00 a.m. and it takes an hour and a half to get ready and walk to school, what time will I get to school? Obviously the answer here would be 8:30, but it says eight. So it looks like it's getting some things wrong in there, and again I think this is because there's nothing like this in the prompt, so maybe you could improve it by playing around with the prompt. Another one it seemed to get wrong was ratios. In this one it's looking to get a ratio (I'll let you go through and read it yourself); the difference between 21 and 9 would be 12, but with 21 minus nine it hasn't done a great job there.

Okay, the colored objects. You can see
this is the second kind of PAL chain in LangChain. This is specifically for working with colored objects, and I've played around with these a little bit. The question is: on the desk there are two blue booklets, two purple booklets, one purple hat, and two yellow pairs of sunglasses; if I remove all the pairs of sunglasses, how many purple items remain? So we would expect to see three here. It's interesting to see the logic of how it does it. It makes a list and adds these objects to the list, so we've got two for those ones and one for our purple hat. Then it removes all the sunglasses, counts the number of purple things, and gives us the right answer.

I'm not sure exactly what you would use this one for, but my guess is that there are certain times when colors are going to be really important. You could probably play around with this prompt to make it about something like the number of widgets. If you're building some kind of chatbot that was doing some sort of accounting task, where people say "we had 15 people buy widget A, 27 buy widget B, 36 buy widget C, seven people returned widget A, three people returned widget B", you could probably change this prompt to fit your very specific example, so that it would get the math right for something like that.

The final one is this idea of intermediate steps. What this does is break down what it was doing into the different steps so that we can see them. What I've done here is just copy the output of the intermediate steps, and we can see that it's broken it down into parts of the function. So if you were getting something wrong, you could come in here, do a run-through with the breakdown of the intermediate steps, and then use that to determine whether you should try and
adjust the prompt or something like this.

So this PAL thing is not perfect. It won't work all the time, but it's amazing that it works as much as it does. Before this paper, people were generally relying on things like chain of thought and just expecting the large language model to return the answer. So it's a really cool idea, and I think we're going to see it used a lot more in the future: people use a Python function to do something, and you get the language model just to write that function based on a particular natural language query.

Anyway, hopefully this was useful to you. If you have any questions, please put them in the comments; I will always try to go through and answer comments. If this was useful to you, please click subscribe and let me know what kind of videos you would like to see going forward. All right, thanks for watching.
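The loop described in the walkthrough (question goes into a few-shot prompt, a Python function comes back, Python runs it) can be sketched without any API calls. This is a minimal sketch under stated assumptions, not LangChain's implementation: `fake_llm`, `run_pal`, and the two canned programs are hypothetical stand-ins for a real code model, and the prompt holds only one few-shot example, paraphrased from the video, where the real template has eight. The second canned program shows the kind of time-handling example one might try adding to the prompt for the 7:00 a.m. question that failed.

```python
"""PAL-style sketch: the model writes a program, Python computes the answer."""

# One few-shot example in the style read out in the video (paraphrased).
FEW_SHOT_PROMPT = '''Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?

# solution in Python:
def solution():
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_left = money_initial - bagels * bagel_cost
    return money_left

Q: {question}

# solution in Python:
'''

# Hand-written stand-ins for what a code model might return.
# These are assumptions for illustration, not real model output.
CAFETERIA_PROGRAM = '''
def solution():
    apples_initial = 23
    apples_used = 20
    apples_bought = 6
    return apples_initial - apples_used + apples_bought
'''

TIME_PROGRAM = '''
from datetime import datetime, timedelta

def solution():
    wake_time = datetime(2023, 1, 1, 7, 0)  # arbitrary date; only the time matters
    time_to_get_ready = timedelta(hours=1, minutes=30)
    arrival_time = wake_time + time_to_get_ready
    return arrival_time.strftime("%H:%M")
'''


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a code model: picks a canned program by keyword."""
    question = prompt.rsplit("Q: ", 1)[-1]
    return TIME_PROGRAM if "wake up" in question else CAFETERIA_PROGRAM


def run_pal(question: str, llm=fake_llm):
    """Build the prompt, get a program back, and let Python compute the answer."""
    prompt = FEW_SHOT_PROMPT.format(question=question)
    program = llm(prompt)
    namespace = {}
    # exec runs arbitrary code: anything a real model writes should be sandboxed.
    exec(program, namespace)
    return namespace["solution"]()


if __name__ == "__main__":
    print(run_pal("The cafeteria had 23 apples. If they used 20 for lunch "
                  "and bought 6 more, how many apples do they have?"))
    print(run_pal("If I wake up at 7:00 a.m. and it takes an hour and a half "
                  "to get ready and walk to school, what time will I get to school?"))
```

Swapping `fake_llm` for a call to a real model is the only change needed to try this on new questions; the language model never computes the number itself, it only names the variables and writes the arithmetic.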

Original Description

This video goes through the paper Program-aided Language Models and shows how it is implemented in LangChain and what you can do with it. To access all the code please use the Colab link below. Let me know in the comments what other videos you would like to see.

Colab: https://drp.li/cdhf2
PAL Paper website: https://reasonwithpal.com/
PAL Paper: https://arxiv.org/abs/2211.10435
120 Math Word Problems: https://www.prodigygame.com/main-en/blog/math-word-problems/

My Links:
Twitter - https://twitter.com/Sam_Witteveen
Linkedin - https://www.linkedin.com/in/samwitteveen/

Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

00:00 PAL Paper
02:35 PAL example prompts in the paper
04:52 Paper Results vs CoT & Direct prompting
05:37 Sample Outputs
07:34 Code Walkthrough
09:35 PAL LangChain PromptTemplate
11:57 GSM8k examples
12:55 Repeat Copy examples
14:06 Word Math Problems
16:00 Colored Objects

#LangChain #BuildingAppswithLLMs #python

This video teaches how to implement Program-aided Language Models using LangChain, providing code examples and demonstrations of its capabilities. It covers the PAL paper, example prompts, and code walkthroughs.

Key Takeaways

Access the PAL paper and LangChain code
Understand the concept of Program-aided Language Models
Implement PAL using LangChain
Test and evaluate PAL performance

💡 Program-aided Language Models can improve LLM performance and capabilities, and LangChain provides a convenient framework for implementing and testing PAL.
