This AI Agent Extracts Text From Images in n8n

Nate Herk | AI Automation · Beginner ·🛠️ AI Tools & Apps ·1y ago

Key Takeaways

The video demonstrates building a simple AI-powered Invoice Agent using n8n with no code, utilizing an OCR API to extract text from images and integrating with Telegram and Google Drive.

Full Transcript

We've got Telegram open. I'm just going to drag over an invoice, drop it into Telegram, and hit send. Now Telegram downloads that file, and we hit an OCR API, which takes the image and extracts the text. The workflow parses the text and updates our database of invoices. Then we redownload the file so we can put it into Google Drive and access the original if we want. Finally we hit the agent, and we get a response: "Thanks for submitting the invoice. The total amount is 950, due date is December 15th." It gives us some notes (please make the payment within the due date to avoid late fees), some contact information, and the penalties. Then it says that the original invoice has been added to Google Drive, here is the file name, and you can also access the database here. If we open up that database, we see that this invoice got populated into it, and if we go to our Google Drive, we can see the actual invoice, the original file, in case we need to look at it. So that is what's going on in this workflow. Let's dive into what's going on within each node.

Okay, so let's dive into what's going on within this invoice agent. As you saw in the demo, we're talking to it through Telegram. Right here is where we send an image file of an invoice. The workflow downloads the file and sends it off to an OCR service to analyze the image and extract the text. We then parse the text and put it into our database. We redownload the file because n8n is kind of weird about the way binary information passes through, and we'll talk about that, but we download that file so we can put it into our Google Drive folder of all our invoices. Then we're pretty much just giving the agent the information it needs to write a response back to us in Telegram. So let's walk
through a live example where we actually trigger this workflow. We'll go node by node and watch the data move through each step.

All right, I'm going to use the same example invoice from the demo. We hit test step, hop into Telegram, drag in the sample invoice, and hit send, and now we have the information coming back. What's important to understand here is that we're looking for a file ID. As you can see, there are three different file IDs coming through, with different sizes and dimensions. I'm pretty sure we grabbed the largest one, since that's the one we want to download for the best quality. We download document.file_id, and if we hit test step, we can see that we're downloading that file. If we download it to make sure it's coming through correctly, we can see this is the sample invoice we're using: we've got the number, the date, the amount, the address, the due date, and then some notes. We're not giving the agent all of this information, because if we saw the invoice and sent a picture of it, we don't want a full summary of all this. We probably just want the date, the due date, the amount, and the notes. That's how we have it set up, but you can change how the agent reports back to you.

Anyway, once we download that file, as you saw, it comes through as binary. All we're getting here is the file ID and the path, and what we're really looking for is the binary, so that we can pass it to an OCR service to extract this information. I'm using a free OCR. Its only real limit is that you can't send a file bigger than 1 megabyte, and in this case we're fine. This is what it looks like, and I'll put a link to the specific OCR I used in the description. OCR stands for optical character recognition, and it's basically just a better way of getting text
from an image. This is a free one, so it's probably not the highest quality, but if you were to use something like an OpenAI Analyze Image node, it's just not really great yet. You have that option, which can do some basic stuff and is cool to play with, but if you really need to get into complicated PDFs and things like that, you probably want a third-party OCR.

Anyway, we're setting up this request here. You put in the endpoint and your API key, and then I pretty much just told it that we're loading in an n8n binary file, and in the binary file we're looking for the input field called data, which is right here. That's pretty much it, so we'll hit test step.

While this is running, I just wanted to say that the workflow you see right here will be available for download in the free Skool community. The link for that will be down in the description. You just download the workflow, import it straight into your n8n, and you'll have it up and running right away. If you're looking to go a little deeper, with deep dives, hands-on learning, and real project insights, then check out my paid community. The link for that will also be down in the description. And if you want me to build something like this for you, or you're looking for consulting services for your business, check out my website in the description as well, where you can book a call.

Anyway, what we see here is that we take the binary, the JPEG file, and once it goes through this HTTP request to the OCR, we get the actual parsed text back. Let me just make this a little bigger. Can I not expand this? Okay, anyway: parsed text. We're getting the invoice number, the date, the amount, the billing address, all the information, pretty much as we saw it right here. So that's what we're getting
back, and from here we're just parsing information. I had ChatGPT write this code; I didn't write it at all. It takes this input and pulls out the different fields: as you can see, invoice number, invoice date, total amount, billing address, due date, and notes. That's all that's happening in this Code node. We're parsing the text into separate fields so that in our Google Sheets node (or Airtable, Baserow, or anything like that) we can specify which data goes to which column. In the example, as you can see, our Google Sheet has the columns invoice date, invoice number, total amount, and so on. If we hit test step and pop back into our database, we see that information populate, because we were able to configure which information goes to which column. That's all that's happening here; as you can see, we set that up right in this node.

From there, this is what I was talking about: we want the original file that we sent in Telegram, which got downloaded right here, to be put into our Google Drive folder of invoices, so we can see all the different invoices in case we want to reference the original image. To do this, we need to give the Google Drive node the binary that we actually want to upload. It's a little bit weird in n8n: when you have a binary file, like we see right here, it's hard to reference it later. You can see we're getting binary, but we don't see the binary anywhere in the schema or in the table, so it's hard to reference it later. The node that outputs the binary pretty much has to feed straight into the node that needs to use that input field called data, which is the binary. So if we run this
again, all we're doing here is redownloading the file. We're referencing the earlier node, which was over here, and the file ID that it originally gave us, so these two nodes are practically identical. We download the file, and then when we hop into this Google Drive node, we have the binary right here to reference again.

All we're doing here is setting up the file name. I hardcoded "Invoice" with square brackets, and then I used a formula to say we want today's date, so we have "Invoice December 5th 2024". I played around with the formatting here. Let's just do this again: we have $now, which gives us the date, and we don't want the invoice name to come in like that. If we add format, we get it like this, with the four-digit year, then the month, then the day. All I did was take the year away from the front and put it at the end. I also wanted the month to be the full name rather than 12, so if I add one more M, it goes to the shortened name, which is Dec for December, and if I add one more M, it goes to the full "December 5th 2024". That's all I did there.

Then we just have to link to the folder we want. It's in My Drive, and we're uploading to the folder called Invoices. We hit test step here. This node is not extracting anything; it's just downloading the file so the next node can upload it. And this is basically a success message that says we got it: the image type is JPEG, and it gives us the link to that folder. If we come in here, we should see that we just got the second one, at 11:12, and if we click on it, that's the original invoice.

From there, we're pretty much done. In the Set node, all we're doing
is giving the agent the invoice information and the file name. In here we have the results from the OCR, the text that we extracted, and here we're giving it json.name, which came from our Google Drive node: the actual name of the file we just created, which is "Invoice" with the date. You may want to play around with that in case you're uploading multiple invoices on the same day, but this is just an example. We give it this information so that the agent now has the invoice information, the file name, and a link to the invoice database.

Then we're telling it what to do in here. This prompt will be included in the workflow if you choose to download it. We give it its role, which is a very simple role, and then an example input and an example output showing the way we want the response structured. We want it to be readable, and we don't want it to output all the information it's getting, just a high-level summary. If we hit test step here, we see that we're getting "Thanks for submitting the invoice," the total amount, the due date, some notes, and once again the link.

From there, we're just feeding it into Telegram. As you can see, we just got this message back. It's very clean, it has line breaks, and it gives us the link to the database. That's pretty much all it is. All we have to configure in here is the chat ID. From the Telegram trigger we're getting some data back, and if you look right here, we have a chat ID. This is basically just the identifier of the chat where we're texting the Telegram bot, which is why we need to make sure we respond to that chat ID. We give it the text, which is just the output from the agent. Then, with our Window Buffer Memory, normally
when you set up Window Buffer Memory, it's going to take the session key from the previous node automatically, but we wanted to define it below, and once again we're using the chat ID. So this key right here is the exact same one we use here for the chat ID when responding in Telegram.

I know this one was a quick, simple build, but hopefully the workflow has sparked some ideas for you. If you choose to download the file, maybe you can play around with ways to expand on the build and make it a little more production ready. One thing I wanted to mention real quick: you may notice this is set up as a Tools Agent even though it has no tools. That's simply because I started off with a Conversational Agent, but it wasn't understanding my prompt or outputting the information the way I wanted. I switched it to a Tools Agent, gave it the same prompt, and it just seemed to work a little better. That's the only reason I did that.

If you enjoyed this one, please give it a like; it definitely helps me out. Let me know in the comments what else you want to see, and hopefully I'll see you in the Skool community or on some of the live calls in the paid community. Thanks, guys.
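
The Code node described in the transcript splits the OCR text into one value per field. Its actual code is not shown in the video, so the following is only a minimal Python sketch of the idea. The label strings, the `parse_invoice_text` name, and the sample text are all assumptions made for illustration; only the six field names, the 950 total, and the December 15th due date come from the video.

```python
import re

# Hypothetical labels: the real OCR output depends on the invoice layout.
FIELD_LABELS = {
    "invoice_number": "Invoice Number",
    "invoice_date": "Invoice Date",
    "total_amount": "Total Amount",
    "billing_address": "Billing Address",
    "due_date": "Due Date",
    "notes": "Notes",
}

# Made-up sample text standing in for what the OCR step might return.
SAMPLE_TEXT = (
    "Invoice Number: INV-001\r\n"
    "Invoice Date: December 1\r\n"
    "Total Amount: $950\r\n"
    "Billing Address: 123 Example Street\r\n"
    "Due Date: December 15\r\n"
    "Notes: Please pay within the due date to avoid late fees.\r\n"
)


def parse_invoice_text(parsed_text: str) -> dict:
    """Return one value per field, or None when a label is not found."""
    fields = {}
    for key, label in FIELD_LABELS.items():
        # Match "Label: value" at the start of a line, up to the end of that line.
        pattern = rf"^{re.escape(label)}[ \t]*:[ \t]*(.+)$"
        match = re.search(pattern, parsed_text, flags=re.IGNORECASE | re.MULTILINE)
        fields[key] = match.group(1).strip() if match else None
    return fields
```

Returning separate keys is what lets a spreadsheet step map each value to its own column. Real invoices rarely follow one fixed label format, so a production version would need looser matching than this.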

Original Description

📌 Join my free Skool community for access to the workflow seen in this video! 👇
https://www.skool.com/ai-automation-society/about

🌟 Join my paid Skool community if you want to go deeper with n8n and AI Automations 👇
https://www.skool.com/ai-automation-society-plus/about

🚧 Start Building with n8n! (I get kickback if you sign up here - thank you!)
https://n8n.partnerlinks.io/22crlu8afq5r

🔎 The OCR API I used in this video:
https://ocr.space/OCRAPI

In this video, I showcase how I built a simple AI-powered Invoice Agent using n8n with no code! I walk through creating a workflow that takes an invoice image sent via Telegram, uses an OCR (Optical Character Recognition) API to extract text, populates the information into a database, and uploads the original image to Google Drive. The agent then provides a quick summary of the invoice, along with the file name and a database link.

If you're passionate about no-code AI automations, don't forget to like, comment, and subscribe for more content like this! Your support helps me keep bringing you valuable tutorials and tips. 🚀

Business Inquiries:
📧 nateherk@uppitai.com

WATCH NEXT:
https://youtu.be/u2Tuu02r7QI

TIMESTAMPS
00:00 Demo
00:58 How This Works
01:39 Downloading the Image
02:50 OCR API
04:57 Updating Invoice Database
05:55 Adding Original Image to Google Drive
08:40 Giving Agent Data
09:15 Configuring Invoice Agent
09:55 Configuring Telegram Response

Gear I Used:
Camera: Razer Kiyo Pro
Microphone: HyperX SoloCast
Background Music: https://www.youtube.com/watch?v=Q7HjxOAU5Kc&t=0s

Don't forget to like, subscribe, and hit the notification bell to stay updated with my latest videos on AI agents and automations!

In the video, an invoice image sent through Telegram is passed to an OCR API, the extracted text is parsed into fields and written to a database, and the original image is uploaded to Google Drive. An AI agent then replies in Telegram with a short summary, the file name, and a link to the database. Viewers can follow the same steps to build a similar no-code workflow in n8n and automate their own invoice processing.
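
The OCR step in this workflow returns JSON, and the workflow only needs the recognized text out of it. Below is a rough Python sketch of that handling. The response keys (`ParsedResults`, `ParsedText`, `IsErroredOnProcessing`, `ErrorMessage`) reflect my understanding of the OCR.space response format and should be checked against its documentation. The helper names and the exact byte count used for the 1 MB free-tier limit are assumptions.

```python
# Assumed size cap for the free tier; the video only says "1 megabyte".
MAX_FREE_TIER_BYTES = 1024 * 1024


def fits_free_tier(image_bytes: bytes) -> bool:
    """Check the file size before sending it to the OCR service."""
    return len(image_bytes) <= MAX_FREE_TIER_BYTES


def extract_parsed_text(ocr_response: dict) -> str:
    """Join the text of every parsed page, raising if the API reported an error."""
    if ocr_response.get("IsErroredOnProcessing"):
        raise ValueError(f"OCR failed: {ocr_response.get('ErrorMessage')}")
    results = ocr_response.get("ParsedResults") or []
    # Each result is expected to carry the recognized text under "ParsedText".
    return "\n".join(r.get("ParsedText", "") for r in results).strip()
```

Checking the size first avoids a wasted request, and raising on an error flag keeps a failed OCR call from silently writing an empty row to the database.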

Key Takeaways
  1. Create a new workflow in n8n
  2. Download an invoice image sent via Telegram
  3. Use an OCR API to extract text from the image
  4. Populate the extracted information into a database
  5. Upload the original image to Google Drive
  6. Configure the agent to provide a quick summary of the invoice
💡 The video highlights the potential of no-code AI automations in streamlining business processes, such as invoice processing, by leveraging workflows and integrating AI tools.
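
Takeaways 2 and 5 each hide one small decision: which of the Telegram image variants to download, and how to name the file in Google Drive. The sketch below shows one way to express both in Python. It assumes the Telegram message carries a list of photo sizes with `file_id`, `width`, and `height` keys (the video also mentions downloading by a document file ID), and the function names and exact file-name layout are hypothetical.

```python
from datetime import date


def largest_photo_file_id(photo_sizes: list[dict]) -> str:
    """Return the file_id of the variant with the most pixels."""
    best = max(photo_sizes, key=lambda p: p["width"] * p["height"])
    return best["file_id"]


def invoice_file_name(day: date) -> str:
    """Build a name with the full month first and the year last."""
    # %B is the full month name, comparable to adding more M's in the n8n format.
    return f"Invoice {day.strftime('%B')} {day.day} {day.year}"
```

As the transcript notes, a date-only name collides when several invoices arrive on the same day, so appending a time or the invoice number would be a sensible extension.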

Chapters (9)

0:00 Demo
0:58 How This Works
1:39 Downloading the Image
2:50 OCR API
4:57 Updating Invoice Database
5:55 Adding Original Image to Google Drive
8:40 Giving Agent Data
9:15 Configuring Invoice Agent
9:55 Configuring Telegram Response