Autogen - Microsoft's best AI Agent framework that is controllable?

AI Jason · Beginner ·🧠 Large Language Models ·2y ago

Skills: Multimodal LLMs90%Agent Foundations90%Tool Use & Function Calling80%

Key Takeaways

Microsoft's Autogen is a controllable AI agent framework that solves challenges of multi-agent systems, enabling complex applications with consistent results. It supports function calling, content generation pipelines, and research functions, making it a powerful tool for various use cases.

Full Transcript

multi-agent has been a trending topic they already purchased like meta GPT or chair Dev they explore and demonstrate query results to deliver complex software in short those sub Frameworks that allow you to create a group of agents to solve complex tasks together and if you want to learn more about multi-aging Frameworks you can check the other video I made but on the other hand there are also challenges one is very hard for users to give agent feedback so quite often only after the agents finish the whole process then I realized what they delivered is only 50 of what I want and there's no easy way for me to give feedback to let them iterate and second with those Frameworks even though you can get two ages to work together on a given task you can't really add a third or Force agents into the conversation but when we think about some real world tasks like strategy planning it normally involve more than just two persons to deliver best results however Microsoft just announced a new multi-aging framework called autojet which solves those two problems really well introduce some unique Concepts like user proxy agent as well as group chat manager so user proxy agents introduce easy way for us to Define human feedback points during the process it is basically that can talk to other assistant agents on behalf of users but when needed it can ask human for the inputs for example if the user asks agent to create a stock price chart for Tesla then it will trigger a conversation for the agent to complete this task the human can give feedback if they see something wrong and then the agent will start iterating this gives end user a lot more control in terms of plan and find output that the agent should execute and the second concept is group chat manager this is how you can coordinate multiple agents to collaborate on a given go it's almost like a chatter you can create for as many agents as you want if we take a use case of strategy panning with framework like chat Dev they allow you to introduce two agents into the conversation could be CEO and product manager but with autogen you can introduce more agents into the conversation to provide different perspective like data analysts as well as engineer so when you think about a use case like content production you can create different type of chat rooms and then connect together as a Content production Pipeline and if you're thinking about a business Consulting use case you can even create chat rooms with different domain experts in and they can chime in the conversation when they say fit as long as you define the the row of each agent variable with those two things I can actually create a very complex multi-aging applications that deliver consistent results so let's firstly install autochin on your computer I will create a folder and open it in Visual Studio code install the auto gym package create a special file called oai config list this is special file that autogen was used to import the list of open AI API Keys once you're finished let's create a new file called basic.py inside base.pi we'll create it's a most basic use case where it will simulate a conversation between a assistant and user proxy so we'll import the library first and then import open AI API key from the file we just created or created assistant agent give a name assistant as well as a user proxy agent if you remember user proxy agent is like the agents that will talk to other assistant on behalf of user and will also ask for user input when is needed it is also played the role of running the python code generated by the assistance so here I will do user proxy agent name is user proxy and code execution config is coding this basically means all the python codes generated will be saved under a folder called coding so later there will be a folder Creator here and once we Define those two agents I can trigger a conversation via user proxy agent with initial chat send a message probably chart of Nvidia and Tesla stock price our service and open Terminal python basic wi so you can see the user proxy send a message to the assistant about this instruction that we just give and then the assistant actually write the whole python code to get the latest stock data from Yahoo finance of Nvidia and Tesla as well as very detailed instruction about how they should run this app and at bottom you can see that it shows a message please provide feedback to the assistant so here I can actually type in a feedback to the agent but if I don't have any feedback I can just click enter to skip here I actually just want to run this to see the results and boom you can see that it actually generates a chart based on the real-time data but if I want I can also just close this and then give further feedback to the agent please product personally change instead of the price and click enter now you can see it actually changes charge from Dollar change to personal change instead and I can close it again and then give it more instructions now prop bad to the dollar change instead as they look better and also add Apple to the stop list if there are Arrow during the code excuse it actually goes through a self-healing process where the assistant will give some feedback about how can they fix this code and now you can see a new charger is created based on multiple iterations on my feedback when I feel like it has had been finished I can just type exit then it will refinish the conversation and the code generator would be under this coding folder so this is a very good example of even just the one basic agent with proper human input and feedback it can actually provide much better user experience and this type of user feedback is particularly useful for coding agent is there a lot of rooms of arrow in software development process so next I want to quickly showcase how can we use group chat manager to create a coding agent where it will involve three different agents the user proxy agent the coder as well as the product manager so I will create another file called code agent.py and inside the file I refers to import the libraries and again try to get API key to define the config list and here I will add a request timeout to make it bigger because sometimes during the code generation the time can be a bit longer than 60 seconds and then I would Define three different agents the user proxy agent where human input modes to be always which means it will always ask human for confirmation and a coder agent as well as a product manager agent and for the product manager agent I have a special system prompt you will help break down the initial idea into rail scope requirement for the coder and do not involve in future conversation or error fixing the reason I add this part is to making sure product managers only chime in when it's needed which is breaking down the requirements and after that I will create a group chat which involves three different agents and a group chat manager who will coordinate this conversation here in the end I'll use user proxy agent to trigger a task which is built a classic and basic Punk game with two players in Python so I will try to run python code agent or py internal as you can see here the product manager agent would first they create the requirement dot based on the punk game ID and then give it to coder then the coder will generate the game here and after that it will ask me for the human input so I can give feedback at this point but here I want to run it first so I click enter it will give me this interface that is already working but what I want to add is scoreboard so that it can see the score of each user so our closest and then give a feedback everything looks good just edit scoreboards for both users click enter so coder actually iterate the code based on the feedback I provide so I will click enter again and now you can see at top it actually has a scoreboard for each user and if I have more requirements I can just close this game and add more and last I also want to show you how can you create some complicated use case like content generation pipeline where you can create two group chats work for research and then another for Content generation I will go back to the visual studio code create a new file called content agentwy inside I will import a list of different libraries that we're going to use here you can see I actually input a list of different launching Library that's because I'm going to use a dungeon library to do some map reduce summary for the research purpose once we import those libraries we will import the API keys and because we are using both launching and autogen where it uses three lines code we will firstly create a research function which will collect as many information as possible based on the research topic and then we'll do another function for the content writing which is a group chat for Content writing purpose and in the end we will stitch together both research and content writing so let's dive into them one by one for the research function Auto June actually supports function calling so we'll create a few different functions for the agent to use firstly we'll create a search function which is using the serper API key to get search results from Google so we're defined API endpoint the body will be the query that user want to search about at the header including your API key and then try to get a response and after that we'll create a scraping function based on the URL so we'll give a header as well as a request body which is URL that we want to script we return the response into Json format and then send a post request to browserless here we are using browser-less as a scripting service and once we get the content response back we'll use beautiful soup to get the text and if the content is very big we'll trigger another function to summarize the large web page into a small one so that we can fit it within the large language model context window and inside summary function this is where we'll use the land chain so we'll display the big content into small chunks and for each Chunk we will use larger language model to do a summary and we will use a load summarize chain which is a out-of-box chain defined by Lane chain for summarization once we create these three functions we can stitch together for a research function so here is how we can define an agent with function calling in autojet so our defines a legendary model config for researcher inside I can Define the list of functions which is specifically for the Google search as well as script which is two functions that I defined above and for each one you will give the name of the function the description as well as the inputs and if you want to learn more about function coding you can also check out any other video I make and after that I will refer to the config list that we defined above and once we did that create a assistant agent called researcher it has a special system prompt research about given query collect as many information as possible and generate detailed research results with loads of technical details and all the reference link attached and also add terminate to the end of the research report the purpose of the last part is this is how we actually Define the human feedback point because we can Define the user proxy agent in a way that gives a message includes special words like terminate then the human input will be needed and we will Define it in the user proxy agent here so I'll give the name user proxy so our first Define the termination message with this part this basically means if any of the message including words terminate then it will require human input mode and you can check out their documentation for more details but basically human input mode have three different modes one is always which means every time user proxy agents receive a message it will ask for human input or it can be terminate which means it will only ask for human input when the terminate words is triggered or never and then I will also Define the function map for the search and scraping function we defined above and once we do that we can trigger a conversation between the user proxy agent and researcher for the search query and after that I will also add some special code here which will basically send a new message to the researcher give me the research report that just generated again return only the report and reference link and the purpose of this is that so that we can making sure the last message is the actual research report that we want and then return the research reports and we can quickly test out so I will trigger This research function what is Microsoft autogen before you run it making sure you add a new file dot EnV and add your open AI API key here then I will open Terminal around python content agent.py so you can see it starts searching for the results and scripting in the scraping multiple different websites to get as many information as possible after multiple rounds of research you can see the researcher actually generate a research reports with all the information that I found in this message include a word terminate which trigger this user input action and now if I click enter it should trigger the next few actions that which is asking the researcher to summarize the report again so I'll click enter and you can see that it sent out a new message give me the research reports that just generated and here will be the results that we get from calling the research function that we created above so this is working well and next we want to create a write content function which will be given two input one is a research material that we got from the research function as well as a topic that the user wants to write about and here again we will create a group chat with multiple different agents One agent is called editor Define the structure of a short blog post based on the material provided by the researcher and give it to the writer to write a blog post and the second will be a writer assist agent whose role is a professional AI blogger writing a blog post about AI based on the structure provided by the editor as well as feedback received from reviewer and after two rounds of content iteration add terminate to the end of message and then the reviewer assistant who will be basically critic the content generated by writer and give feedback after two rounds of content iteration add terminate to the end of message and in the end we will add this user epoxy agent and again we will do the same thing about adding the terminate word and create a group chat for all those agents that we just defined a book in the end trigger the conversation by user proxy agent with a message write a blog about the topic that here is a research material and we'll do the same thing together latest written content so that's pretty much it all we need to do now just call it a research function as well as a Content generation function together I will create a new large energy model config for Content assistant which have access to the two different functions that we create above one is a research and another is the right content we Define the writing assistant with a special prompt UI writing assistant you can use research function to collect latest information about given topic and then use write content function to write very well written content and we Define a user proxy agent with a function map of the two function above and in the end will trigger this conversation write a blog about autogen or Tai agent framework so let's try this out okay it will return This research report for me to review which looks pretty good so I'm going to give go ahead and click enter the writing assistant takes those research results trigger the right content function and inside the right content function that the editor will generate a structure of the content and then the writer will take the structure to write a blog post after which the reviewer will critique the content and give some suggestion for improvement and in the end the writer actually generate a really well written article based on the research results and multiple critic and reviewer also trigger the terminated word because he also thinks the content looks pretty good and no more changes needed and at this point I can either give more feedback if I want or I can just click enter to finish this task as you can see here with autogun we can create really powerful content generation pipeline he also tweaks the same workflow to the other use cases like leads gen where you have one group chat for doing these research about a given company or people and then have SDR to draft the alt Reach campaign based on those research results so this is Microsoft outrogen it's a normal use case that I didn't really cover here and you can dive into more on their GitHub repo I'm very keen to see what type of multi-aging applications that you start creating so comment below about interesting use case I will continue posting different interesting AI purchase so please consider giving me a subscribe if you enjoy this content thank you and I see you next time

Original Description

Microsoft just announced a multi agent framework called Autogen, which solved a few problems of existing agent frameworks; Let’s dive in 🔗 Links - Follow me on twitter: https://twitter.com/jasonzhou1993 - Join my AI email list: https://www.ai-jason.com/ - My discord: https://discord.gg/eZXprSaCDE - Github repo & blog: https://ai-jason.webflow.io/learning-ai/microsoft-autogen - Autogen: https://microsoft.github.io/autogen/ ⏱️ Timestamps 0:00 Intro 0:12 Challenges of existing multi agents 0:44 Microsoft Autogen 2:06 Install autogen 2:23 Use case: Stock chart gen 4:21 Use case: Build software 6:06 Use case: Content gen - research 10:11 Use case: Content gen - Write content 11:08 Use case: Content gen - Writing assistant 👋🏻 About Me My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com #autogen #metagpt #aiagents #agents #gpt #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #largelanguagemodels #largelanguagemodel #chatgpt #gpt4 #machinelearning

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from AI Jason · AI Jason · 21 of 60

← Previous Next →

Build Your Own Auto-GPT Apps without coding Step by Step (Dust.tt Tutorial)

Build Your Own Auto-GPT Apps without coding Step by Step (Dust.tt Tutorial)

AutoGPT tutorial: Build your personal assistant WITHOUT code (Via Relevance AI)

AutoGPT tutorial: Build your personal assistant WITHOUT code (Via Relevance AI)

Create your own AI girlfriend that talks ❤️

Create your own AI girlfriend that talks ❤️

How to build with Langchain 10x easier | ⛓️ LangFlow & Flowise

How to build with Langchain 10x easier | ⛓️ LangFlow & Flowise

I build an autonomous researcher via GPT | Langchain ⛓️ Tutorial

I build an autonomous researcher via GPT | Langchain ⛓️ Tutorial

Smol AI tutorial in 5 mins | Build ENTIRE codebase with a single prompt

Smol AI tutorial in 5 mins | Build ENTIRE codebase with a single prompt

Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps

Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps

How to let GPT control anything & 10x powerful | 8 mins tutorial about GPT funtion calling

How to let GPT control anything & 10x powerful | 8 mins tutorial about GPT funtion calling

Extract data & automate EVERYTHING | 10x GPT function calling power

Extract data & automate EVERYTHING | 10x GPT function calling power

Finally, an AI agent that actually works

Finally, an AI agent that actually works

"okay, but I want GPT to perform 10x for my specific use case" - Here is how

"okay, but I want GPT to perform 10x for my specific use case" - Here is how

"Wait..this AI Agent does research for you 24hrs without hallucination?!" - Here is how

"Wait..this AI Agent does research for you 24hrs without hallucination?!" - Here is how

"How to give GPT my business knowledge?" - Knowledge embedding 101

"How to give GPT my business knowledge?" - Knowledge embedding 101

“Automation 2.0 coming…No more boring data entry job”

“Automation 2.0 coming…No more boring data entry job”

"How to 10x chatbot UX? 🤖 🖼️ " - Add Image Responses to GPT knowledge retrieval apps

"How to 10x chatbot UX? 🤖 🖼️ " - Add Image Responses to GPT knowledge retrieval apps

“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial

“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial

"Next Level Prompts?" - 10 mins into advanced prompting

"Next Level Prompts?" - 10 mins into advanced prompting

Build AI agent workforce - Multi agent framework with MetaGPT & chatDev

Build AI agent workforce - Multi agent framework with MetaGPT & chatDev

How to scale your AI automation pipeline

How to scale your AI automation pipeline

AI agent manages community 24/7 - Build Agent workforce ep#1

AI agent manages community 24/7 - Build Agent workforce ep#1

Autogen - Microsoft's best AI Agent framework that is controllable?

Autogen - Microsoft's best AI Agent framework that is controllable?

StreamingLLM - Extend Llama2 to 4 million token & 22x faster inference?

StreamingLLM - Extend Llama2 to 4 million token & 22x faster inference?

AI agent + Vision = Incredible

AI agent + Vision = Incredible

After 7 days letting AI agents control my email inbox... 📮

After 7 days letting AI agents control my email inbox... 📮

How to use New OpenAI DevDay features - GPT4V x TTS demo tutorial

How to use New OpenAI DevDay features - GPT4V x TTS demo tutorial

What is Q* | Reinforcement learning 101 & Hypothesis

What is Q* | Reinforcement learning 101 & Hypothesis

"Research agent 3.0 - Build a group of AI researchers" - Here is how

"Research agent 3.0 - Build a group of AI researchers" - Here is how

GPT4V + Puppeteer = AI agent browse web like human? 🤖

GPT4V + Puppeteer = AI agent browse web like human? 🤖

Real Gemini demo? Rebuild with GPT4V + Whisper + TTS

Real Gemini demo? Rebuild with GPT4V + Whisper + TTS

AI Robot's ChatGPT moment at 2024?

AI Robot's ChatGPT moment at 2024?

GPT5 unlocks LLM System 2 Thinking?

GPT5 unlocks LLM System 2 Thinking?

The REAL cost of LLM (And How to reduce 78%+ of Cost)

The REAL cost of LLM (And How to reduce 78%+ of Cost)

OpenAI's Agent 2.0: Excited or Scared?

OpenAI's Agent 2.0: Excited or Scared?

Real time AI Conversation Co-pilot on your phone, Crazy or Creepy?

Real time AI Conversation Co-pilot on your phone, Crazy or Creepy?

INSANELY Fast AI Cold Call Agent- built w/ Groq

INSANELY Fast AI Cold Call Agent- built w/ Groq

AI Employees Outperform Human Employees?! Build a real Sales Agent

AI Employees Outperform Human Employees?! Build a real Sales Agent

Future of E-commerce?! Virtual clothing try-on agent

Future of E-commerce?! Virtual clothing try-on agent

Unlock AI Agent real power?! Long term memory & Self improving

Unlock AI Agent real power?! Long term memory & Self improving

"I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3

"I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3

“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

"Make Agent 10x cheaper, faster & better?" - LLM System Evaluation 101

"Make Agent 10x cheaper, faster & better?" - LLM System Evaluation 101

Claude 3.5 struggle too?! The $Million dollar challenge

Claude 3.5 struggle too?! The $Million dollar challenge

Make your agents 10x more reliable? Flow engineer 101

Make your agents 10x more reliable? Flow engineer 101

"I want Llama3.1 to perform 10x with my private knowledge" - Self learning Local Llama3.1 405B

"I want Llama3.1 to perform 10x with my private knowledge" - Self learning Local Llama3.1 405B

AI process thousands of videos?! - SAM2 deep dive 101

AI process thousands of videos?! - SAM2 deep dive 101

"Wait, I'm using OpenAI Structured Output wrong ?!" - Advanced Structured Output tutorial

"Wait, I'm using OpenAI Structured Output wrong ?!" - Advanced Structured Output tutorial

How to use Cursor AI build & deploy production app in 20 mins

How to use Cursor AI build & deploy production app in 20 mins

Best Cursor Workflow that no one talks about...

Best Cursor Workflow that no one talks about...

This is how I scrape 99% websites via LLM

This is how I scrape 99% websites via LLM

Better than Cursor? Future Agentic Coding available today

Better than Cursor? Future Agentic Coding available today

EASIEST Way to Train LLM Train w/ unsloth (2x faster with 70% less GPU memory required)

EASIEST Way to Train LLM Train w/ unsloth (2x faster with 70% less GPU memory required)

1000x Cursor workflow for building apps

1000x Cursor workflow for building apps

Easiest way to build fancy UI with Cursor/Windsurf/Bolt/Lovable

Easiest way to build fancy UI with Cursor/Windsurf/Bolt/Lovable

From $0 to $4m with just 2 people (ComfyUI Crash-course for E-commerce)

From $0 to $4m with just 2 people (ComfyUI Crash-course for E-commerce)

Deepseek R1 - The Era of Reasoning models

Deepseek R1 - The Era of Reasoning models

Yep, o3-mini is WORTH the money - Build your own reasoning agent

Yep, o3-mini is WORTH the money - Build your own reasoning agent

The ONLY way to run your own Deepseek on mobile...

The ONLY way to run your own Deepseek on mobile...

Those MCP totally 10x my Cursor workflow…

Those MCP totally 10x my Cursor workflow…

MCP = Next Big Opportunity? EASIST way to build your own MCP business

MCP = Next Big Opportunity? EASIST way to build your own MCP business

Gemini 2.0 blew me away - The future of Multimodal Model

Gemini 2.0 blew me away - The future of Multimodal Model

Microsoft's Autogen is a powerful AI agent framework that enables complex multi-agent applications with consistent results. It supports function calling, content generation pipelines, and research functions, making it a versatile tool for various use cases. By following the steps outlined in this video, viewers can learn how to use Autogen to create complex applications and integrate it with other tools and frameworks.

Key Takeaways

Install Autogen on your computer
Create a folder and open it in Visual Studio Code
Install the Autogen package
Create a special file called oai config
Create a new file called basic.py
Define a user proxy agent to handle human feedback and termination messages
Trigger a conversation between the user proxy agent and researcher for a search query
Generate a research report with reference links and trigger a user input action when a specific word is detected

💡 Autogen's ability to support function calling and content generation pipelines makes it a powerful tool for creating complex multi-agent applications with consistent results.

🔒 Pro feature: Ask AI to explain this lesson →

More on: Multimodal LLMs

View skill →

Google Veo 3 Tutorial: How to create AI Videos in Flow, Gemini or Google Vids?

Google Veo 3 Tutorial: How to create AI Videos in Flow, Gemini or Google Vids?

AI Tool Journey

NVIDIA Clara Guardian Virtual Patient Assistant

NVIDIA Clara Guardian Virtual Patient Assistant

NVIDIA Developer

Building Multimodal Search and RAG

Building Multimodal Search and RAG

Midjourney Trick: Consistent Character in Different Images

Midjourney Trick: Consistent Character in Different Images

Ollama Multimodal: EASILY setup Llava locally & Integrate API

Ollama Multimodal: EASILY setup Llava locally & Integrate API

The ONLY Real Time Speech AI that can run locally!!!

The ONLY Real Time Speech AI that can run locally!!!

Related Reads

I Cut My LLM Bill 40x: A Backend Engineer's Migration Notes

Learn how a backend engineer reduced their LLM bill by 40x through migration, and apply similar strategies to optimize your own LLM costs

Dev.to · gentleforge

Routing Across Multiple LLM Providers: How an AI Gateway Works

Learn how an AI gateway enables routing across multiple LLM providers for improved reliability and scalability

Building a Vector Search Assistant: What I Learned from Module 2

Learn to build a vector search assistant and improve your Retrieval-Augmented Generation skills

The LLM Gateway & Router Index (2026)

Learn to streamline LLM integration with a gateway and router index for efficient model management

Dev.to · Srijan Paudel

Chapters (9)

Intro

0:12 Challenges of existing multi agents

0:44 Microsoft Autogen

2:06 Install autogen

2:23 Use case: Stock chart gen

4:21 Use case: Build software

6:06 Use case: Content gen - research

10:11 Use case: Content gen - Write content

11:08 Use case: Content gen - Writing assistant

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)