Talking to Alpaca with LangChain - Creating an Alpaca Chatbot

Sam Witteveen · Beginner · 🧠 Large Language Models · 3y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · LLM Engineering 70% · Fine-tuning LLMs 60% · Multimodal LLMs 50%

Key Takeaways

This video demonstrates how to use LangChain with the Alpaca 7B model to create a conversational chatbot, loading the model with Transformers, wrapping it in a Hugging Face text-generation pipeline, and adding a windowed conversation memory.

Full Transcript

Okay, so a lot of people have been asking me how we can hook up Alpaca to LangChain and try it out for a chatbot kind of thing. You can do this quite easily. I'm going to show you a way in this video, and then later on, in another video, I'll show you some ways that we could take what we fine-tuned, our special version, and use that as well.

So you're going to need to install Transformers from the main branch on GitHub. If you just install the normal pip version, you won't get the version with the right tokenizer and the right model for doing this, so you can basically just bring that in. You're also going to need bitsandbytes if you're going to do an 8-bit version, and of course you're going to need LangChain.

What we're going to be doing is using the Hugging Face LLM wrapper in LangChain to run this. First off, we're just going to be bringing in our LlamaTokenizer and LlamaForCausalLM. Remember that Alpaca is built on LLaMA, and these are made for models in the same style as Alpaca, so that's why we're using these. From LangChain we're going to be bringing in the HuggingFacePipeline. We're also going to need to bring in some other things as we go, so first off I'm bringing in the PromptTemplate and the LLMChain just to show you that.

Okay, we can bring this in. We can set up our model, bringing it in as 8-bit so that we're going to use less memory and it will be faster for inference. And here is where a lot of the magic happens: the way that the LLaMA model and the Alpaca model are set up, we can actually set up a Hugging Face pipeline for them, which is going to be doing text generation, and we're then just passing this model into that. Here we pass in the tokenizer, and we pass in things like our max length, our temperature, our top-p, and the repetition penalty as well. Then we can set up a local LLM from this Hugging Face pipeline. So this is actually coming from LangChain and is allowing us
to set up an LLM with this Hugging Face pipeline.

Once we've got that done, we can basically try it out with just a normal LLMChain. I'm just going to take one of the standard Alpaca prompts, that style of prompt, for the template. So we create a prompt template in here, we pass this in, and you can see we're basically just setting up our PromptTemplate with the template that we've made here; the instruction is just going to be the variable that we're going to pass in. So we can have "What is the capital of England?", and sure enough this is just going to inject this question into where the instruction goes, and then it's going to give us our answer out. Our answer is going to be "The capital of England is London." And then if I ask it the typical sort of Alpaca question, "What are alpacas, and how are they different from llamas?", you can see we get the standard answer that we got when we just played around with the model by itself, without LangChain.

All right, so setting up the conversation part. One of the key things is we want to make use of the memory here. LLaMA and Alpaca have a good-sized token span, so we can actually go up to sort of 2,000 tokens here. Now, unfortunately, we fine-tuned it probably for a lot less than that, so it will be interesting to see, and I would like to hear back from you guys as well, how well it does when you get a really long token span in there. It may do really well; you might find at times that it doesn't do as well.

But okay, so we're going to set up our conversation chain. If you remember, one of the things that we use in a conversation chain is the whole idea of a memory, and the particular memory that I've chosen to use here is this ConversationBufferWindowMemory. What this is going to do is give us a window that we pass across the conversation, and we're going to represent k number of turns,
so, for example, in this case I've decided to set k to 4, which means we're going to have four turns going on, and that will be the limit. This should keep our token span from ever getting too wide in the conversation.

All right, setting up the conversation chain: we're just passing in our local LLM that we set up earlier with the Hugging Face pipeline, we're going to pass in our memory to be this window memory that we've created here, and we're going to just set verbose=True so we can see what's going on.

If we look at the template here, just to see what the standard template is, it looks something like this: "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer, it truthfully says it does not." And you can see in this prompt we're injecting two things: we're injecting the human input, and we're also injecting the history, the memory of the conversation as it's gone through.

I decided to modify this a little bit, so here's the version that I modified it to, and you can obviously play around with this yourself: "The following is a friendly conversation between a human and an AI called Alpaca." We want to just have a little bit of knowledge about who it is. We could also put some things in there, like "Alpaca is three years old" or "Alpaca loves to eat apples"; you could put some other things in there too. It's always fun to watch people play with this if you've set it up to be quite funny as well, so you could try that out. To do that, we basically just override conversation.prompt.template. You can see I'm just taking that new conversation prompt template and trying it out, so now it's got our name being Alpaca in there.

All right, now we get to talking to it. The first thing that I tried out was basically just saying "Hi there, I am Sam." It's interesting here: we know that Alpaca is fine-tuned not on dialogue,
right? We haven't done a version that's fine-tuned on dialogue yet; maybe that will be a future video. Here it's fine-tuned on tasks, and they tend to be tasks where you ask it to give you a fact, or you ask it to produce a list of something, that kind of thing. So when I don't actually ask it something, it doesn't do a great job. It's perhaps a little bit confused, and what does it do? It doesn't generate bad text; it just goes on to generate more text than we asked for. It's taking it in, it knows that it's talking to Sam, so: "Hey there, Sam, it's nice to meet you. What can I help you with?" And then it generates Sam's response back, "Do you know what time it is?", and then Alpaca: "Sure do, it's currently", and then it's obviously putting in a token where you could substitute the time; you could do a regex change or something there to put in the current time.

So then I realized that, okay, this is probably happening because we didn't ask it a question, so it's trying to just keep being chatty and make up things itself. If we try it and ask it "What is your name?", now it's much more on point: "My name is Alpaca. How may I help you today?" So we've now got multiple turns in here, but we probably shouldn't count the turn where it generated both Sam and Alpaca itself; really it should be human and AI as one turn in there.

Now I ask it another question: "Can you tell me what an alpaca is?" Sure enough, it gives us an answer, and it's not too long. You can see: "An alpaca is a species of South American camelid mammal. They are typically brown or white in color and have long necks and legs." Right, so then I ask how it is different from a llama. You see at this point we've got these multiple steps of our memory of the conversation being passed in. We've got all the steps from the start being passed in here, because we haven't yet met the length of the window where it needs to start cutting things off. Okay: "How is it different than a llama?"
"Alpacas and llamas are both members of the Camelidae family", I'm not sure how you pronounce that, "but they differ in several ways. Alpacas tend to be smaller than llamas and", and then it stops there. It should have been able to go on. Not ideal; again, remember this hasn't been fine-tuned for conversation, so we're just using the fine-tuned version of Alpaca.

"Can you give me some good names for a pet llama?" Now if we go back and look at the memory, so this is the next thing I'm going to ask it, you can see we've lost the start of the conversation. Where I said "Hi there, I am Sam", that's all been taken out. We're going only as far back now as where I asked "What is your name?" So we've got one human, two human, three human, four human, and then this being passed in. So the memory is a window of four that we've got going on there. You could experiment with a longer memory if you want to, but remember all this is going into the language model, so you will find that if your memory gets too long, it can actually be quite slow to do things.

All right, so I ask it "Can you give me some good names for a pet llama?" "Sure, here are some great options: hacha, Tikka, cushy, and wearer a" [names as transcribed]. And then I wanted to test it, to see: does it still remember its name? So I ask it "Is your name Fred?", and you can see that we've lost the bit in the conversation where it told us its name, so it's only relying on the name being up there in the prompt. Yet still it understands from the context: "No, my name is Alpaca." So that's good.

We next ask it "What food should I feed my new llama?" Again you see our window of four is staying fixed, so we're losing things that we spoke about early on in the conversation. And then finally: "Your new llama can eat grass, hay, even alfalfa. You could also try giving them some vegetables like carrots, apples and bananas."

So this is just setting it up. You could actually try to mess with the prompt more, to basically inject a whole personality in there, do some things like that,
but the idea here is that it turns out that LLaMA's doing pretty well, I wouldn't say great, but it's doing pretty well, with the prompt for the chatbot, then passing in this memory that we've got going on, the current conversation memory, and then generating. So this is something you could try and experiment with. This would also work on the LLaMA model, if you want to actually try the LLaMA model; my guess is that it probably won't do as well as the Alpaca model. I don't know, maybe I'll try that out and we can look at that in another video.

Anyway, as always, if you have any questions, please put them in the comments, and if this was useful to you, please click and subscribe. I will see you in the next video. Bye for now.
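
The window memory described in the transcript can be sketched in plain Python. This is a minimal illustration of the idea under stated assumptions, not LangChain's implementation: the class `WindowMemory`, the helper `render_prompt`, the `Human:`/`AI:` labels, and the exact layout of the template are all hypothetical choices made for this example, and the saved replies are shortened stand-ins for the model's answers.

```python
from collections import deque

# Assumed layout: the modified opening line from the video, followed by the
# two injected values, {history} and {input}.
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI "
    "called Alpaca.\n\n"
    "Current conversation:\n{history}\nHuman: {input}\nAI:"
)


class WindowMemory:
    """Keeps only the last k (human, ai) turns. Hypothetical helper."""

    def __init__(self, k: int = 4) -> None:
        # A deque with maxlen drops the oldest turn once it is full.
        self.turns: deque = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def render_prompt(memory: WindowMemory, user_input: str) -> str:
    """Inject the windowed history and the new input into the template."""
    return TEMPLATE.format(history=memory.history(), input=user_input)


memory = WindowMemory(k=4)
memory.save("Hi there, I am Sam", "Hey there Sam, it's nice to meet you.")
memory.save("What is your name?", "My name is Alpaca.")
memory.save("Can you tell me what an alpaca is?", "A South American camelid.")
memory.save("How is it different than a llama?", "Alpacas tend to be smaller.")
memory.save("Can you give me some good names for a pet llama?", "Sure.")

prompt = render_prompt(memory, "Is your name Fred?")
print(prompt)
```

With `k=4`, saving a fifth turn pushes the greeting out of the window, so the rendered prompt starts its history at "What is your name?". Every turn kept in the window is sent to the model again on each call, which is why a larger window makes generation slower.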

Original Description

Colab notebook: https://drp.li/XapBR

In this video, let's have a play with talking to an Alpaca7B model using LangChain with a conversational chain and a memory window.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

Playlist

Uploads from Sam Witteveen · Sam Witteveen · 21 of 60

LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab

LangChain Basics Tutorial #2 Tools and Chains

ChatGPT API Announcement & Code Walkthrough with LangChain

Trying Out Flan 20B with UL2 - Working in Colab with 8Bit Inference

LangChain - Conversations with Memory (explanation & code walkthrough)

LangChain Chat with Flan20B

LangChain - Using Hugging Face Models locally (code walkthrough)

PAL : Program-aided Language Models with LangChain code

Building a Summarization System with LangChain and GPT-3 - Part 1

Building a Summarization System with LangChain and GPT-3 - Part 2

Microsoft's Visual ChatGPT using LangChain

Building a Summarization System with LangChain - Part 3 Using ChatGPT Turbo

LangChain Agents - Joining Tools and Chains with Decisions

Investigating Alpaca 7B - Finetuned LLaMa LLM

Comparing LLMs with LangChain

Running Alpaca7B in Colab

How to finetune your own Alpaca 7B

How to make a custom dataset like Alpaca7B

Understanding Constitutional AI - the paper and key concepts

Using Constitutional AI in LangChain

Talking to Alpaca with LangChain - Creating an Alpaca Chatbot

Text-to-video-synthesis with Diffusers and Colab

Meet Dolly the new Alpaca model

Checking out the Cerebras-GPT family of models

A Step-by-Step Guide to Fine-Tuning Your Dolly Model (tutorial)

Is GPT4All your new personal ChatGPT?

Raven - RWKV-7B RNN's LLM Strikes Back

Talk to your CSV & Excel with LangChain

Vicuna - 90% of ChatGPT quality by using a new dataset?

Koala Revealed: The ChatGPT Alternative You Need to Know! 🔍

Running Koala for free in Colab. Your own personal ChatGPT? (tutorial)

BabyAGI: Discover the Power of Task-Driven Autonomous Agents!

Auto-GPT - How to Automate a Task Based AI with GPT-4

Improve your BabyAGI with LangChain

Generative Agents - Deep Dive and GPT-4 Recreation

GPT4ALLv2: The Improvements and Drawbacks You Need to Know!

Dolly 2.0 by Databricks: Open for Business but is it Ready to Impress!

Red Pajama - Operation: Freeing LLaMA

Investigating Open Assistant - Models, Datasets and Addons

Investigating MiniGPT-4 - The Secret behind GPT-V?

Stable LM 3B - The new tiny kid on the block.

Bard can now code and put that code in Colab for you.

Checking out Bark: a Text to Speech system by Suno AI

Fine-tuning LLMs with PEFT and LoRA

Master PDF Chat with LangChain - Your essential guide to queries on documents

Using LangChain with DuckDuckGO Wikipedia & PythonREPL Tools

Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)

StableVicuna: The New King of Open ChatGPTs?

WizardLM: Evolving Instruction Datasets to Create a Better Model

LaMini-LM - Mini Models Maxi Data!

Finding the Best Free ChatGPT

MPT-7B - The First Commercially Usable Fully Trained LLaMA Style Model

LangChain Retrieval QA Over Multiple Files with ChromaDB

LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs

LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!

Transformers Agent - Is this Hugging Face's LangChain Competitor?

StarCoder - The LLM to make you a coding star?

Testing Starcoder for Reasoning with PAL

The New Wizards - Unfiltered & Unaligned

Camel + LangChain for Synthetic Data & Market Research

This video shows how to build a conversational chatbot with LangChain and the Alpaca 7B model, covering text generation through a Hugging Face pipeline, prompt templates, and a windowed conversation memory. By following the steps in the video, viewers can build their own chatbot and experiment with conversational AI.

Key Takeaways

Install Transformers from the main branch on GitHub
Import LlamaTokenizer and LlamaForCausalLM
Set up a Hugging Face pipeline for text generation
Pass in the tokenizer, max length, temperature, top-p, and repetition penalty
Create a local LLM from the Hugging Face pipeline
Set up a conversation chain with ConversationBufferWindowMemory
Pass the local LLM and the window memory into the conversation chain
Inject the human input and the conversation history into the prompt template
Modify the conversation prompt template
Talk to the Alpaca chatbot
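
The single-turn step that precedes the conversation chain, injecting a question into an Alpaca-style instruction prompt, comes down to string templating. The sketch below uses plain Python rather than LangChain's PromptTemplate. The template text is the widely published Alpaca no-input prompt and is assumed, not confirmed, to match the one used in the video; `build_prompt` is a hypothetical helper.

```python
# Assumed template: the commonly published Alpaca prompt without an input field.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Fill the single {instruction} slot. Hypothetical helper."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


prompt = build_prompt("What is the capital of England?")
print(prompt)
```

The model then continues the text after `### Response:`, which is where an answer such as the one about London in the video would appear.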

💡 Alpaca 7B is fine-tuned on instruction-style tasks rather than dialogue, yet with a conversation prompt and a four-turn memory window it works reasonably well as a chatbot. It answers direct questions in context but can ramble or cut off when it is not asked a clear question.

More on: LLM Foundations

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Related AI Lessons

Sub-10ms AI Workflows: Accelerating sim.ai with On-Device Semantic Search using Moss

Learn how to accelerate AI workflows with on-device semantic search using Moss, achieving sub-10ms response times and improving user experience

Medium · Machine Learning

Stop Guessing: Guaranteed Structured Output from LLMs in Node.js

Learn to guarantee structured output from LLMs in Node.js and stop parsing JSON manually

Dev.to · Hardik Mehta

Spring AI Tutorial — Your First REST Endpoint with OpenAI (2026)

Build a REST endpoint with Spring Boot 3 and OpenAI to create an LLM-powered API, leveraging the power of AI in your applications

Notes: Memory, Context, and Large Language Models (LLMs)

Learn how memory and context work in Large Language Models (LLMs) and potential improvements

Dev.to · Vladimir Panov

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)