Sentiment Analysis in 4 Minutes
Key Takeaways
Build a sentiment analysis program in Python with scikit-learn that classifies movie reviews as positive or negative, using a bag-of-words model and a random forest classifier.
Full Transcript
Hello world, welcome to Sirajology. Today we're going to be building sentiment analysis in 4 minutes. Let's get started.
Sentiment analysis is the process of determining the opinion or feeling of a piece of text. We humans are pretty good at this. I can look at this tweet and immediately know that it's negative; it feels like the writer's sentiment is one of anger and disgust due to the negative wording. Companies across the world have implemented machine learning to do this automatically. It's super useful for gaining insight into customer opinions. Once you understand how the customer feels after analyzing their comments or reviews, you can identify what they like and dislike and build things like recommendation systems or more targeted marketing campaigns for them.
In this demo, we're going to be building a sentiment analysis program in Python that will identify whether a movie review is positive or negative based on the text in the review. We'll get our training and testing data, a bunch of labeled reviews, from a site called Kaggle.
We'll start off by importing our dependencies. We'll import the os (operating system) module to help us perform command-line functions. Then we'll want to import the scikit-learn module, which is a machine learning library in Python with a fast learning curve. Then we'll import a helper class that will help us clean our data. pandas helps us read our data CSV files, and NLTK will be used to remove unnecessary words from our data set.
All right, so step one is to just read the data from our hard disk. We'll import the labeled training data and the testing data. Then we'll print out the first review to the command line to ensure we read the data set correctly.
Once we've read in our data, step two is to clean it. That means ensuring that we remove all the HTML, non-letters, and stop words. Stop words are words that are insignificant, words like "the" or "to" or "as", since it's hard to analyze emotion from them. We can download them from the NLTK (Natural Language Toolkit) library. We'll iterate over every review in our training data set and fill our new clean review array with the cleaned reviews. Our helper class will do the cleaning for us.
Step three is to create a bag of words. The bag-of-words model is a simple numeric representation of a piece of text that is easy to classify. We just count the frequency of each word in a piece of text and create a dictionary of them. This is called tokenization in natural language processing. We'll use the CountVectorizer object in the scikit-learn package to create it. We'll set the max features to 5,000 to keep things simple, so our bag of words will contain at most 5,000 words and their associated frequencies. Then we use the fit_transform method to fit the model to the bag of words and create the feature vectors. We can then store the feature vectors in an array.
Step four is to create the classifier. A classifier is a machine learning model that will be used to classify whether a piece of text is positive or negative. In this example, our classifier is a random forest consisting of 100 trees. A random forest is a set of decision trees. Decision trees are graphs that model the possibilities of certain outcomes. So let's say a piece of text has the word "hate" appear more than 20 times: the probability that it's negative could be something like 80%. Then, based on other word frequencies, we increase or decrease that probability accordingly until we get to the leaf of the tree, which will be a positive or negative rating. This is different from a standard regression classifier, where if a data point is on a certain side of the line of best fit, we can easily classify it. A random forest is more like a series of lines, one for every tree, that segments our possibilities. Once we've mapped all the lines onto the graph and we plot a new data point or review based on its coordinates, we can then classify it based on whether it's in a positive or negative space.
It's time to test our classifier on our testing data, so let's format the test data by cleaning the reviews and creating a bag of words. Once we have our feature vectors for the test data, we can move on to the last step. The last step is for our program to correctly classify the reviews in the testing data set as positive or negative. We use our random forest to make a prediction. We'll then take the result and write it to a new CSV file. That's it.
Let's run our program and see what happens. Okay, it printed out the first review; that means it's correctly reading our data set. Then it's going to clean and parse the training set, create the bag of words, train the classifier, then predict the test labels. Awesome. Let's test the first three predictions, where one is positive and zero is negative. Let's see: the first three are 1, 0, and 1, so positive, negative, and positive. Let's skim these. "It's truly a masterpiece": positive. "It's so awful that once you know...": okay, negative. Awesome, looks like it's performing sentiment analysis like a charm.
Sentiment analysis is still an evolving field of machine learning. There are so many grammatical nuances and misspellings and slang involved in human language that we haven't really taken into account, but we can with more powerful algorithms. So check out the links in the description below for more information, and please, please subscribe for more technology videos. There's so much I want to make. Thanks for watching.
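The cleaning step described in the transcript (strip HTML, drop non-letters, remove stop words) could look roughly like the sketch below. The function name `clean_review` and the tiny hard-coded `STOP_WORDS` set are assumptions for illustration only; the video uses a helper class and NLTK's English stop-word list, neither of which is reproduced here.

```python
import re

# Hypothetical, tiny stop-word set for illustration; the video downloads
# the full English list from NLTK instead.
STOP_WORDS = {"the", "to", "as", "a", "an", "and", "is", "it", "of", "this", "was"}


def clean_review(raw_review, stop_words=STOP_WORDS):
    """Return the review as lowercase words with HTML, non-letters and stop words removed."""
    no_html = re.sub(r"<[^>]+>", " ", raw_review)       # drop HTML tags
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_html)   # drop digits and punctuation
    words = letters_only.lower().split()                 # normalize case, split on whitespace
    return " ".join(w for w in words if w not in stop_words)


print(clean_review("<br />This movie was GREAT, 10/10 to the cast!"))
# prints: movie great cast
```

A regular expression is only a rough way to strip tags; a real HTML parser would be more robust on messy markup.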
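To make the bag-of-words step concrete, here is a minimal sketch using scikit-learn's `CountVectorizer` with `max_features=5000`, as mentioned in the transcript. The two example reviews are made up, and the variable names are assumptions rather than the ones used in the video.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up, already-cleaned reviews standing in for the Kaggle data.
clean_reviews = ["movie great great cast", "movie awful plot"]

# Keep at most the 5,000 most frequent words, as in the transcript.
vectorizer = CountVectorizer(analyzer="word", max_features=5000)

# fit_transform learns the vocabulary and returns one count vector per review.
features = vectorizer.fit_transform(clean_reviews).toarray()

# Words ordered by their column index in the feature matrix.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocab)     # ['awful', 'cast', 'great', 'movie', 'plot']
print(features)  # row 0: [0 1 2 1 0], row 1: [1 0 0 1 1]
```

Each row is one review and each column is one vocabulary word, so "great" appearing twice in the first review shows up as a 2 in that word's column.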
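The training and prediction steps might be sketched as follows: a random forest of 100 trees is fit on bag-of-words features, then used to label unseen reviews, and the result is rendered as CSV. All data, ids, and column names here are hypothetical; the video trains on the labeled Kaggle movie reviews and writes the output to a file on disk.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: 1 = positive, 0 = negative.
train_reviews = ["truly masterpiece", "wonderful acting loved",
                 "awful boring plot", "hate terrible waste"]
train_labels = [1, 1, 0, 0]
test_ids = ["t1", "t2"]
test_reviews = ["loved masterpiece", "boring terrible"]

vectorizer = CountVectorizer(max_features=5000)
train_features = vectorizer.fit_transform(train_reviews).toarray()
# Reuse the fitted vocabulary so test columns line up with training columns.
test_features = vectorizer.transform(test_reviews).toarray()

# Random forest with 100 trees; random_state is set only for repeatability.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(train_features, train_labels)
predictions = forest.predict(test_features)

# With no path argument, to_csv returns the CSV text instead of writing a file.
output = pd.DataFrame({"id": test_ids, "sentiment": predictions})
csv_text = output.to_csv(index=False)
print(csv_text)
```

Note that the test reviews go through `transform`, not `fit_transform`: the classifier expects the same word-to-column mapping it saw during training.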
Original Description
Link to the full Kaggle tutorial w/ code: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
Sentiment Analysis in 5 lines of code:
http://blog.dato.com/sentiment-analysis-in-five-lines-of-python
I created a Slack channel for us, sign up here:
https://wizards.herokuapp.com/
The Stanford Natural Language Processing course: https://class.coursera.org/nlp/lecture
Cool API for sentiment analysis: http://www.alchemyapi.com/products/alchemylanguage/sentiment-analysis
I recently created a Patreon page. If you like my videos, feel free to help support my effort here!:
https://www.patreon.com/user?ty=h&u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI:
https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: http://chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz