Image Recognition with LLaVa in Python
Key Takeaways
This video teaches image recognition with LLaVa in Python, utilizing the Ollama platform for local development and deployment. It covers the basics of setting up and using LLaVa for image labeling and recognition tasks.
Full Transcript
what is going on guys welcome back in this video today we're going to learn how to use lava or the large language and vision assistant locally in order to do easy image recognition in Python so let us get right into [Music] ited all right so we're going to learn how to use lava locally on our system to do image recognition in Python today and in order to have lava running on our system we're going to use a tool called ol you can get it by going to ama.com you can just download it here for Mac Linux and windows on Linux it's just a simple curl command and then all you have to do to get a model onto your system is you have to open up the command line and you have to type ol Lama pull and then the name of the model with a specification of the parameter size uh if applicable so you can go to models here you can scroll through the different models that we have here you can already see lava here if you don't find it you can just filter by name and type lava for example then you can click on it and you can see we have 7 billion 13 billion and 34 billion I'm going to go with a 13 billion because I think this one is too large for my system and this one is less capable so I'm going to go with the uh middle here and basically all you have to do is you have to say AMA pull and then lava and in my case now colon 13 billion so this will then um pull the model onto your system and then that's basically all you need to do in order to have lava on your system the rest is now just using the olama package in Python to communicate with that model um and to to basically provide it with some images and some text and then get a response from it so that's all going to be done with the python package for this we say pip 3 install o Lama and then we can go right into the coding now for this video I've prepared four copyright free images image one 2 three and four and these are the images that we're going to provide to Lava and then we're going to ask certain questions about them like how many docks do you see in this image or what do you see here or uh maybe I mean in this case probably we're not going to ask about the programming language because the code is not very readable but uh chances are we're not always going to get the perfect responses this is not like a massively powerful model but it does a quite decent job at recognizing what is in the image and describing it somewhat decently so we're going to start by saying import o Lama and to send a basic request what we're going to do is we're going to just say response equals ol Lama chat and here now we're going to provide first of all the model that we want to use in our case this is going to be Lava 13 billion uh of course if you used 7 billion you have to provide 7 billion here and then we're going to provide a message history we're going to say messages is equal to a list and we're going to say here R is going to be user then the content is going to be the text prompt something like describe this image and then the third part here is going to be images and the images are going to just be um a collection of paths to the images so in my case now I'm just going to provide here uh Point slash image one. JPEG and that is basically my prompt so we're asking the model to describe the image we're passing the image and that's literally all we have to do to get the image recognition going so we can print then response and from the response we want to get the message field or the message key value pair and then we want to get from this the content of the message so that we get the text response of the model so I can run this now and we're hopefully going to see a decent description of this image here it should be something like a field of crop or I don't know uh now we have server disconnected without response all right now there seems to be some issue with me using the 13 billion parameter model and recording at the same time it works when I'm not recording but if I start using it and then start recording the recording doesn't even start it seems to be some issue with the GPU uh capacity so I'm going to just remove the 13 billion here I'm going to just use lava which is the 7 billion model uh of course this needs to be pulled separately so AMA pull Lo Jaa and uh then this should work so let's see if this describes the image accurately it should say something like oh there you go the image shows a Serene rural scene in the foreground there's a field of golden yellow crops that have likely been harvested or are in the process of ripening okay very detailed here uh definitely it recognizes the the core of the image so let's go ahead and try something else let's go and say we want to use image 2 and uh image 2 is basically a laptop so let's see if it gives us some information about the programming language I don't think that's too easy I think this is HTML though because we can see the tags here um the image is a photograph featuring an individual working on a laptop appears to be typing on the screen there's a visual representation of a graph or chart displaying various data points okay that's not true this is now hallucination so it doesn't even recognize that this is code not even just a programming language it doesn't recognize that this is coding at all um but it recognizes that the table has a marble pattern I think that's correct above the table there's a surface uh yeah I don't know if that's true let's maybe try again and see if we get something else if it recognizes that this is coding uh laptop computer working on a laptop displaying financial data chart or dashboard no not not quite let's try a third time oh there you go uh displaying code with color synta syntax highlighting the language um uses the track pad okay this is pretty good so the third attempt was quite accurate all right so let's try now also the third one and then I want to do something uh specific for the fourth one so the third one is just the pool uh depicts an indor pool area rectangular with light blue color okay this works fine now for the fourth image I want to now count the number of docks in here and you can see we have two dogs and two cats so um uh I'm curious to see if this works so let's see if we say uh how many let me just see what the prompt is I prepared here how many dogs are in this image and then we're going to go with image 4 there are three docks in the image okay that's not quite correct there are three dogs in this image now this is now the difference I think between 7 billion and 13 billion because when I tested this before recording the video with 13 billion it almost always recognized that there are two dogs in the image so what I'm going to try to do now maybe it will crash the recording is I'm going to try to just see if I get uh if I can get the 13 billion uh model running if not we're just going to accept it but with the 30 billion parameter model it was able to recognize two docks and not more than two docks most of the time so I'm going to try again probably this is going to crash because yeah because I'm recording but you can try at home if you have enough vram and enough RAM but something that we can do besides that is we can ask it to to give us keywords okay it doesn't work so yeah unfortunately but but it did work with a 13 billion parameter model I was able to get two as the default answer now sometimes it would say one or three but most of the time it would say two um now what we can do however for any picture is we can use this to automate some process like hashtags or keywords of the image so we can say something like provide five keywords uh describing the image separated by commas now the 13 billion parameter model that de quite well let's see if we can do the same thing with the 7 billion parameter model there you go dog cat pet cute animals this is pretty good now let's see what happens if I do the same thing on image three pool swimming pool indoor pool gym recreation great let's do it for image two laptop programming Workman okay these are just four so it failed here let's try again technology computer coding workspace again just four now we get five okay and they're actually quite decent um now maybe we can try to just see how capable the model is so we can say something like come on why can I not there you go what programming language is displayed on the laptop now it's probably going to say python because python is the most default answer for everything uh I tried this a couple of times there you go it says python I think python is just a default answer every time you ask for programming language unless it really knows um not clear enough to confidently identify I mean this is a good response to be honest but yeah most of the time it will say python even though it's pretty clearly HTML I think seems so PHP HTML something like this all right but this is how you can use lava locally on your system uh with AMA and Alo on a server if you deploy it um to easily automate these procedures if you have a bunch of images that you want to annotate you want to add some labels to them you want to add keywords to them you want to use hashtags or something for Instagram you can do that easily with a local model like lava and if you have a more uh powerful Hardware than I have or if you're not recording you can probably also use more complex models and this will lead to better results so that's it for today's video I hope you enjoyed it and hope you learned something if so let me know by hitting a like button and leaving a com comment in the comment section down below and of course don't forget to subscribe to this Channel and hit the notification Bell to not miss a single future video for free other than that thank you much for watching see you in the next video and bye
Original Description
In this video we learn how to easily do image recognition and labeling in Python using LLaVa and Ollama locally.
Ollama: https://ollama.com
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop
💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services
🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from NeuralNine · NeuralNine · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Visualizing Stock Data With Candlestick Charts in Python
NeuralNine
Python Beginner Tutorial #1 - Installation and First Program
NeuralNine
Python Beginner Tutorial #2 - Variables and Data Types
NeuralNine
Python Beginner Tutorial #3 - Operators and User Input
NeuralNine
Python Beginner Tutorial #4 - If Statements and Conditions
NeuralNine
Python Beginner Tutorial #5 - Loops
NeuralNine
Python Beginner Tutorial #6 - Sequences and Collections
NeuralNine
Python Beginner Tutorial #7 - Functions
NeuralNine
Python Beginner Tutorial #8 - Exception Handling
NeuralNine
Python Beginner Tutorial #9 - File Operations
NeuralNine
Python Beginner Tutorial #10 - String Functions
NeuralNine
Python Intermediate Tutorial #1 - Classes and Objects
NeuralNine
Python Intermediate Tutorial #2 - Inheritance
NeuralNine
Python Intermediate Tutorial #3 - Multithreading
NeuralNine
Python Intermediate Tutorial #4 - Synchronizing Threads
NeuralNine
Python Intermediate Tutorial #5 - Events and Daemon Threads
NeuralNine
Python Intermediate Tutorial #6 - Queues
NeuralNine
Python Intermediate Tutorial #7 - Sockets and Network Programming
NeuralNine
Python Intermediate Tutorial #8 - Database Programming
NeuralNine
Python Intermediate Tutorial #9 - Recursion
NeuralNine
Python Intermediate Tutorial #10 - XML Processing
NeuralNine
Python Intermediate Tutorial #11 - Logging
NeuralNine
Python Data Science Tutorial #1 - Anaconda and PyCharm Setup
NeuralNine
Python Data Science Tutorial #2 - NumPy Arrays
NeuralNine
Python Data Science Tutorial #3 - Numpy Functions
NeuralNine
Python Data Science Tutorial #4 - Plotting Functions With Matplotlib
NeuralNine
Python Data Science Tutorial #5 - Subplots and Multiple Windows
NeuralNine
Python Data Science Tutorial #6 - Matplotlib Styling
NeuralNine
Python Data Science Tutorial #7 - Bar Charts with Matplotlib
NeuralNine
Python Data Science Tutorial #8 - Pie Charts with Matplotlib
NeuralNine
Python Data Science Tutorial #9 - Plotting Histograms with Matplotlib
NeuralNine
Python Data Science Tutorial #10 - Scatter Plots with Matplotlib
NeuralNine
Python Data Science Tutorial #11 - 3D Plotting with Matplotlib
NeuralNine
Python Data Science Tutorial #12 - Pandas Series
NeuralNine
Python Data Science Tutorial #13 - Pandas Data Frames
NeuralNine
Python Data Science Tutorial #14 - Pandas Statistics
NeuralNine
Python Data Science Tutorial #15 - Pandas Sorting and Functions
NeuralNine
Python Data Science Tutorial #16 - Pandas Merging Data Frames
NeuralNine
Python Data Science Tutorial #17 - Pandas Queries
NeuralNine
Python Machine Learning Tutorial #1 - What is Machine Learning?
NeuralNine
Python Machine Learning Tutorial #2 - Linear Regression
NeuralNine
Python Machine Learning Tutorial #3 - K-Nearest Neighbors Classification
NeuralNine
Python Machine Learning #4 - Support Vector Machines
NeuralNine
Python Machine Learning Tutorial #5 - Decision Trees and Random Forest Classification
NeuralNine
Python Machine Learning Tutorial #6 - K-Means Clustering
NeuralNine
Python Machine Learning Tutorial #7 - Neural Networks
NeuralNine
Python Machine Learning Tutorial #8 - Handwritten Digit Recognition with Tensorflow
NeuralNine
Generating Poetic Texts with Recurrent Neural Networks in Python
NeuralNine
Stock Portfolio Visualization with Matplotlib in Python
NeuralNine
Analyzing Coronavirus with Python (COVID-19)
NeuralNine
Making Text Images Readable Again with Python and OpenCV
NeuralNine
Neural Networks Simply Explained (Theory)
NeuralNine
Motion Filtering with OpenCV in Python
NeuralNine
Top 5 Programming Languages To Learn in 2020
NeuralNine
Simple TCP Chat Room in Python
NeuralNine
Image Classification with Neural Networks in Python
NeuralNine
Edge Detection with OpenCV in Python
NeuralNine
S&P 500 Web Scraping with Python
NeuralNine
Simple Sentiment Text Analysis in Python
NeuralNine
Introduction - Algorithms & Data Structures #1
NeuralNine
More on: CV Basics
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
When the Camera Becomes an Exam Proctor: Building an AI-Powered Exam Monitoring System with…
Medium · Python
When the Camera Becomes an Exam Proctor: Building an AI-Powered Exam Monitoring System with…
Medium · Deep Learning
When the Camera Becomes an Exam Proctor: Building an AI-Powered Exam Monitoring System with…
Medium · Cybersecurity
Your Face Is About to Become Your Phone Number
Dev.to AI
🎓
Tutor Explanation
DeepCamp AI