Transformers Agent - Is this Hugging Face's LangChain Competitor?

Sam Witteveen · Advanced ·🧠 Large Language Models ·3y ago

Key Takeaways

The video explores Hugging Face's Transformers Agent, its comparison to LangChain, and its utilization of lessons from papers like Toolformer and ReACT, along with models hosted on the HuggingFace Hub.

Full Transcript

Okay, in this video I'm going to look at the Transformers Agent. This is a new thing that came out this week from Hugging Face. At first I wasn't sure exactly what it was, but having had a look at it, it's kind of cool, and I think it's definitely worth having a video about it and looking at what it's actually doing. In some ways this is doing things that are very similar to LangChain tools, or to the Toolformer paper, where it's basically making an agent that can call and use different tools. The key thing here, though, is that the tools themselves are different models on the Hugging Face Hub, not necessarily made by Hugging Face themselves, but hosted on the Hub and usable with the Transformers library.

I'll go through the notebook, but first, just to show you what is actually going on here: they've got a nice little diagram where you put in an instruction, which then gets inserted into a prompt, just like in LangChain. As well as the instruction, a bunch of tools are also injected into the prompt, and this is a good example of using quite a lot of tools, as we'll see as we go through. The agent can either be an OpenAI model or one of the ones from the Hugging Face Hub, so it could be OpenAssistant or StarCoder. StarCoder seems to be a good one that they're using for this, which is a good sign that there are now open source models that can do that. The agent then decides what tool to use, the tool is run in Python using the input, and it generates some kind of output.

So let's jump in and have a look at the code. I basically pulled apart their demo and played around with it a bit, just so I could see exactly what's going on. You will need to install a number of libraries, and the challenge is that if you don't have these installed, you can't use any of the models that rely on them. For example, image generation uses diffusion models, which needs the diffusers library, and the same goes for some of the other tools. Obviously you need the Transformers library to get this going.

You can use OpenAI, or you can use the models hosted on the Hub, so the StarCoder model or the OpenAssistant model. When you're setting it up, you either put in your OpenAI key or, if you're using the Hugging Face Hub, your Hugging Face Hub token. You will also need some little functions to play back audio and do things like that. Once you've got that, you just initialize the agent for whichever model you're using. In my very brief testing I found that the OpenAI one seems to do better, and they kind of recommend that, although I also tried the StarCoder one and it seemed to work as well.

All right, let's look at what you do. You initialize your agent, and that is just using the large language model via an API. You're not actually installing that model: even if you're using StarCoder or the OpenAssistant model, you're not installing it locally, you're calling it through an API. Then we just call agent.run and pass in an instruction. Here I've got agent.run with "generate an image of a Maine Coon gray cat sitting down resting", and you can see what it actually does. It goes through and writes an explanation to itself. This is exactly like LangChain's agents and tools, an approach that comes from the ReAct paper, the Toolformer paper and a number of other papers, all put together. So it explains to itself, "I will use the following tool: image generator, to generate an image according to the prompt", and then it calls the image generator with prompt equal to "Maine Coon gray cat sitting down resting". Notice that it managed to drop the first part of the prompt too, which is kind of cool: it understood that "generate an image of" was the instruction to pick this tool, not part of the input to pass into the tool.

It then downloads the text-to-image model. I'm not sure which one it is, but I'm pretty sure it's one of the diffusion models that they've got. Sure enough, it makes a picture. Now, this one perhaps wasn't the best one that it made, but anyway, we've got our Maine Coon gray cat sitting there with slightly dodgy eyes.

Note that this is a one-off instruction. Whenever you do agent.run, you're doing a sort of zero-shot instruction. It will also work out whether it needs multiple tools for a task. For example, here I say "Read out loud the summary of techcrunch.com", and the other thing I've changed is that I've set return_code=True so that we can see what it's actually doing. It's got its explanation: "I will use the following tools: text downloader to download the text from the website, summarizer to create a summary of the text, and text reader to read it out loud." Then we can see that the generated code calls the text downloader on techcrunch.com. I deliberately left out the https to see whether it would still get it, and yes, it's handled that quite nicely. It then understands that it needs to summarize that text, and it prints out the summary here. Then it downloads the SpeechT5 text-to-speech model, and that's what produces the audio here.

You can see the summary starts "Google I/O is a wrap". If I bring that back to the start and press play, we can listen to it: "Google I/O is a wrap. Elon Musk has the CEO for Twitter. Dungeons & Dragons gets its very own streaming channel." It's basically just reading out the headlines that were on that page. Still pretty cool, though, that we were able to do all of this with just one little string saying "read out loud the summary of techcrunch.com". As for their text-to-speech, you could look at writing your own tools later on, so if you wanted to use a better TTS system, you could certainly write a tool that calls it. So that is some of the basic use.

They have a second way of doing it, which is chat mode. Run mode doesn't keep a memory across calls, so it's like the zero-shot agents, where we don't have any conversation memory, whereas chat does have conversation memory. They say that run is better for things where you've got multiple operations at once, and chat tends to work better when you're giving individual instructions one at a time. So we do agent.chat with "Show me an image of a ginger Maine Coon cat", and you can see that it calls the image generator just like before and generates the cat. It didn't need to load the model this time because the model was already loaded, so that's worked out well. There's our cat.

We can also transform images, since it's got a tool for that. If I say "transform the image so that the background is in the snow", it does an okay job. It seems to have just overexposed it, but we've got the white, snowy look, I guess. Have a play around with it yourself and see what it can do. You can also ask it to make a mask of something, and it will, so that's something you could use too if you wanted to.

If we want to clear the memory from a chat session, we just call agent.prepare_for_new_chat(), and that re-initializes the memory so that there's nothing in there. I could also have asked it to write a description of this image, and it would have remembered that this image was what we had in memory, because of the chat memory system. So when going from one task to another, this memory can be useful.

The list of tools they've got is quite decent. This is just taken from their demo. They've got document question answering and a whole bunch of similar things, and they actually show the models they're using. For text question answering they're using the Flan-T5 model (I'm not sure which size), for image captioning they're using BLIP, they've got speech-to-text with Whisper, and text-to-speech with SpeechT5, which is what we were using before. They've also got community-based tools: the text downloader is actually a community-based tool, and you can imagine there are going to be a lot more of those over the next few weeks or months.

So let's start out again, because I wanted to try something out. I thought, let's do some Q&A where we ask it about something on the TechCrunch site. I've got a URL for an article about Elon Musk appointing a new CEO for Twitter. I passed in that URL and asked, "Who is the possible new Twitter CEO based on this article at", followed by the URL. It's done a great job of working out that it needs to use the text downloader to get the text and then the text QA tool to answer the question. It hadn't downloaded the QA model yet, so it downloads it, and then it works out what to pass to the QA tool: the question "who is the possible new Twitter CEO" along with the text. Sure enough, it got it right. If we look at the article, it does talk about this woman who is the NBCUniversal head of advertising, and she's supposedly the next Twitter CEO.

Then I wanted to see how much it would pick up. When I went and looked at the article, it mentions the future CEO, Elon as the current CEO, and the former CEO from before Elon bought the company. If we ask it "who was the former Twitter CEO based on the article", it gets that right as well: it gets Parag Agrawal. But when I ask it "who's the current Twitter CEO based on the article", it didn't get that right, and that's just a limitation of the model. The Transformers Agent itself is getting everything correct there, but it isn't using the large language model for the question answering. We're not using OpenAI for that; we're only using OpenAI for the agent part, and the answer comes from the QA model.

I tried "translate the title of the article into French". It seemed to understand that it should use the translator tool, but for whatever reason I was getting errors with that, so that didn't work. I then asked it to summarize the article for me, and sure enough it gives a pretty decent summary: "Elon Musk announced that he has found a new CEO for Twitter. The new CEO is expected to start in six weeks." It's got some interesting information there, although I don't think it actually mentions the name of the woman who's starting in that role.

Okay, for the next part I'm just re-initializing it. This is taken from their demo, and it shows how you would add a new tool. It's pretty simple. The idea is to make a tool where you can just say "get me a picture of a cat". It turns out, and I didn't realize this, that there's a website, a Cat as a Service API, which serves pictures of cats: you ping it and it sends back a cat. You can see that they've just turned that request into a class, a tool that you can use for fetching.

The key thing is how they make this into a tool, and it's almost identical to what we do in LangChain: you need a name for the tool and a description for the tool, and the description is what the agent model looks at to decide whether to use the tool or not. Here it's "This is a tool that fetches an actual image of a cat online. It takes no input, and returns the image of a cat." So that shows how it's put together. If we just run the tool on its own, we can see that it runs. We can also run it from an actual agent: if we now say "fetch an image of a cat online and caption it for me", it gets the image and then uses the BLIP model, I'm pretty sure, to write the caption, "a cat curled up on a pillow".

Just quickly looking at the source code: I wanted to see whether they're using LangChain to do this and how they're putting it together. It turns out they're not using LangChain; they've got their own system. They've created a tools section inside transformers that has the agents and then the various tools sitting in there. If we look at the agents, we can see a whole bunch of code for calling the model, but probably what's more interesting is the prompts.

These are the prompts for the tools. We can see that there's a run prompt, and I think you'll see that there's a chat prompt as well. The run prompt is basically: "I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task." It goes on from there, and then they inject all the tools. This is the same as what we would do in LangChain, where we inject the tools into the agent that we're going to use. They've then got some in-context learning, giving examples along the lines of: for this kind of input, this is how you would phrase the explanation, and this is the code you would write. So there's a bunch of examples in there, and then there's the format for the output, where the original task from the user goes in and the model writes the result. The chat prompt is pretty similar, maybe a little bit different. We can see the tools being injected there too, and the in-context learning as well.

So overall it's an interesting project. It's certainly worth having a play with, just to see what it can do. It's also interesting to think that you could take any of these tools and use them in LangChain as well. It wouldn't be that difficult to write them as custom tools for LangChain. So if you wanted to make something in LangChain that does text-to-speech or speech-to-text or something like that, where maybe there isn't a clear off-the-shelf tool already, this would give you a good way of seeing how to write a tool that does that.

Anyway, have a play with the notebook. It's definitely a fun thing to look at, and there are definitely some real-world uses here with document question answering and those kinds of things. As always, if you've got any questions, please put them in the comments below. If you like the video, please click like and subscribe. I will talk to you in the next video. Bye for now.
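
The transcript describes two ideas that are easy to lose in prose: a tool is essentially a callable with a name and a description, and the agent prompt is built by injecting those descriptions alongside the user's task. The sketch below illustrates that pattern in plain Python. It is not the transformers implementation: `SimpleTool`, `build_prompt`, `RUN_PROMPT_TEMPLATE`, the `<<tools>>` and `<<task>>` placeholders, and the stand-in `fake_fetch_cat` function are all hypothetical names chosen for illustration, and the template wording is only loosely modeled on the run prompt quoted in the video.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical template, loosely modeled on the run prompt quoted in the video.
# The real prompt in the library is longer and includes worked examples.
RUN_PROMPT_TEMPLATE = (
    "I will ask you to perform a task. Your job is to come up with a series of "
    "simple commands in Python that will perform the task.\n"
    "You can use the following tools:\n"
    "<<tools>>\n\n"
    "Task: <<task>>\n"
    "Answer:"
)


@dataclass
class SimpleTool:
    """Illustrative tool: a name, a description the LLM reads, and a callable."""

    name: str
    description: str
    func: Callable[..., object]

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def build_prompt(task: str, tools: List[SimpleTool]) -> str:
    """Inject one '- name: description' line per tool, then the user's task."""
    tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    return RUN_PROMPT_TEMPLATE.replace("<<tools>>", tool_lines).replace("<<task>>", task)


def fake_fetch_cat() -> str:
    # Stand-in for an HTTP request to a cat-image service; no network call here.
    return "<image of a cat>"


cat_fetcher = SimpleTool(
    name="cat_fetcher",
    description=(
        "This is a tool that fetches an actual image of a cat online. "
        "It takes no input, and returns the image of a cat."
    ),
    func=fake_fetch_cat,
)

prompt = build_prompt("Fetch an image of a cat online and caption it for me", [cat_fetcher])
print(prompt)
```

In a real agent the prompt would be sent to the language model, and the Python code it returns would then call tools such as `cat_fetcher()`. The description string is the only thing the model sees about a tool, which is why the transcript stresses writing it carefully.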

Original Description

Hugging Face Documentation: https://huggingface.co/docs/transformers/transformers_agents
Colab: https://colab.research.google.com/drive/1HGpp1OI-o_ppHi2bHZsvV6QX9k5gsTIK?usp=sharing

In this video I look at HuggingFace's Transformers Agent and how it uses lessons from papers like Toolformer and ReACT along with models hosted on the HuggingFace Hub.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

Playlist

Uploads from Sam Witteveen · Sam Witteveen · 56 of 60

1 LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab
2 LangChain Basics Tutorial #2 Tools and Chains
3 ChatGPT API Announcement & Code Walkthrough with LangChain
4 Trying Out Flan 20B with UL2 - Working in Colab with 8Bit Inference
5 LangChain - Conversations with Memory (explanation & code walkthrough)
6 LangChain Chat with Flan20B
7 LangChain - Using Hugging Face Models locally (code walkthrough)
8 PAL : Program-aided Language Models with LangChain code
9 Building a Summarization System with LangChain and GPT-3 - Part 1
10 Building a Summarization System with LangChain and GPT-3 - Part 2
11 Microsoft's Visual ChatGPT using LangChain
12 Building a Summarization System with LangChain - Part 3 Using ChatGPT Turbo
13 LangChain Agents - Joining Tools and Chains with Decisions
14 Investigating Alpaca 7B - Finetuned LLaMa LLM
15 Comparing LLMs with LangChain
16 Running Alpaca7B in Colab
17 How to finetune your own Alpaca 7B
18 How to make a custom dataset like Alpaca7B
19 Understanding Constitutional AI - the paper and key concepts
20 Using Constitutional AI in LangChain
21 Talking to Alpaca with LangChain - Creating an Alpaca Chatbot
22 Text-to-video-synthesis with Diffusers and Colab
23 Meet Dolly the new Alpaca model
24 Checking out the Cerebras-GPT family of models
25 A Step-by-Step Guide to Fine-Tuning Your Dolly Model (tutorial)
26 Is GPT4All your new personal ChatGPT?
27 Raven - RWKV-7B RNN's LLM Strikes Back
28 Talk to your CSV & Excel with LangChain
29 Vicuna - 90% of ChatGPT quality by using a new dataset?
30 Koala Revealed: The ChatGPT Alternative You Need to Know! 🔍
31 Running Koala for free in Colab. Your own personal ChatGPT? (tutorial)
32 BabyAGI: Discover the Power of Task-Driven Autonomous Agents!
33 Auto-GPT - How to Automate a Task Based AI with GPT-4
34 Improve your BabyAGI with LangChain
35 Generative Agents - Deep Dive and GPT-4 Recreation
36 GPT4ALLv2: The Improvements and Drawbacks You Need to Know!
37 Dolly 2.0 by Databricks: Open for Business but is it Ready to Impress!
38 Red Pajama - Operation: Freeing LLaMA
39 Investigating Open Assistant - Models, Datasets and Addons
40 Investigating MiniGPT-4 - The Secret behind GPT-V?
41 Stable LM 3B - The new tiny kid on the block.
42 Bard can now code and put that code in Colab for you.
43 Checking out Bark: a Text to Speech system by Suno AI
44 Fine-tuning LLMs with PEFT and LoRA
45 Master PDF Chat with LangChain - Your essential guide to queries on documents
46 Using LangChain with DuckDuckGO Wikipedia & PythonREPL Tools
47 Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)
48 StableVicuna: The New King of Open ChatGPTs?
49 WizardLM: Evolving Instruction Datasets to Create a Better Model
50 LaMini-LM - Mini Models Maxi Data!
51 Finding the Best Free ChatGPT
52 MPT-7B - The First Commercially Usable Fully Trained LLaMA Style Model
53 LangChain Retrieval QA Over Multiple Files with ChromaDB
54 LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs
55 LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!
56 Transformers Agent - Is this Hugging Face's LangChain Competitor?
57 StarCoder - The LLM to make you a coding star?
58 Testing Starcoder for Reasoning with PAL
59 The New Wizards - Unfiltered & Unaligned
60 Camel + LangChain for Synthetic Data & Market Research

This video shows how to use Hugging Face's Transformers Agent, compares it to LangChain, and relates it to ideas from papers like Toolformer and ReACT. It walks through run mode, chat mode, the built-in tools, adding a custom tool, and the prompts the agent uses.

Key Takeaways
  1. Explore Hugging Face's Transformers Agent documentation
  2. Compare Transformers Agent to LangChain
  3. Apply lessons from Toolformer and ReACT papers
  4. Use models hosted on the HuggingFace Hub as tools
  5. Utilize Colab for implementation
💡 Transformers Agent can be a potential competitor to LangChain, offering a unique approach to building Agents using LLMs.
