Sentiment Analysis in 4 Minutes

Siraj Raval · Beginner · 🛠️ AI Tools & Apps · 10y ago

Key Takeaways

Sentiment analysis using Python and scikit-learn, with a focus on natural language processing and machine learning.

Full Transcript

Hello world, welcome to Sirajology. Today we're going to be building sentiment analysis in 4 minutes. Let's get started.

Sentiment analysis is the process of determining the opinion or feeling of a piece of text. We humans are pretty good at this. I can look at this tweet and immediately know that it's negative. It feels like the writer's sentiment is one of anger and disgust, due to the negative wording. Companies across the world have implemented machine learning to do this automatically. It's super useful for gaining insight into customer opinions. Once you understand how the customer feels after analyzing their comments or reviews, you can identify what they like and dislike and build things like recommendation systems or more targeted marketing campaigns for them.

In this demo we're going to be building a sentiment analysis program in Python that will identify whether a movie review is positive or negative based on the text in the review. We'll get our training and testing data, a bunch of labeled reviews, from a site called Kaggle.

We'll start off by importing our dependencies. We'll import the os (operating system) module to help us perform command-line functions. Then we'll want to import the scikit-learn module, which is a machine learning library in Python with a fast learning curve. Then we'll import a helper class that will help us clean our data. pandas helps us read our CSV data files, and NLTK will be used to remove unnecessary words from our data set.

All right, so step one is to just read the data from our hard disk. We'll import the labeled training data and the testing data. Then we'll print out the first review to the command line to ensure we read the data set correctly.

Once we've read in our data, step two is to clean it. That means ensuring that we remove all the HTML, non-letters, and stop words. Stop words are words that are insignificant, words like "the" or "to" or "as". We can download them from the NLTK, or Natural Language Toolkit, library. We remove them since it's hard to analyze emotion from them. We'll iterate over every review in our training data set and fill our new clean review array with the cleaned reviews. Our helper class will do the cleaning for us.

Step three is to create a bag of words. The bag-of-words model is a simple numeric representation of a piece of text that is easy to classify. We just count the frequency of each word in a piece of text and create a dictionary of them. This is called tokenization in natural language processing. We'll use the CountVectorizer object in the scikit-learn package to create it. We'll set the max features to 5,000 to keep things simple, so our bag of words will contain at most 5,000 words and their associated frequencies. Then we use the fit_transform method to fit the model to the bag of words and create the feature vectors. We can then store the feature vectors in an array.

Step four is to create the classifier. A classifier is a machine learning model that will be used to classify whether a piece of text is positive or negative. In this example our classifier is a random forest consisting of 100 trees. A random forest is a set of decision trees. Decision trees are graphs that model the possibilities of certain outcomes. So let's say a piece of text has the word "hate" appear more than 20 times. The probability that it's negative could be something like 80%. Then, based on other word frequencies, we increase or decrease that probability accordingly until we get to the leaf of the tree, which will be a positive or negative rating. This is different from a standard regression classifier, where if a data point is on a certain side of the line of best fit, we can easily classify it. A random forest is more like a series of lines, one for every tree, that segments our possibilities. Once we've mapped all the lines onto the graph and we plot a new data point, or review, based on its coordinates, we can then classify based on whether it's in a positive or negative space.

It's time to test our classifier on our testing data. So let's format the test data by cleaning the reviews and creating a bag of words. Once we have our feature vectors for the test data, we can move on to the last step. The last step is for our program to correctly classify the reviews in the testing data set as positive or negative. We use our random forest to make a prediction. We'll then take the result and write it to a new CSV file. That's it.

Let's run our program and see what happens. OK, it printed out the first review. That means it's correctly reading our data set. Then it's going to clean and parse the training set, create the bag of words, train the classifier, then predict the test labels. Awesome. Let's test the first three predictions, where one is positive and zero is negative. Let's see: the first three are 1, 0, and 1, so positive, negative, and positive. Let's skim these. "It's truly a masterpiece": positive. "It's so awful that once you know..." OK, negative. Awesome. Looks like it's performing sentiment analysis like a charm.

Sentiment analysis is still an evolving field of machine learning. There are so many grammatical nuances and misspellings and slang involved in human language that we haven't really taken into account, but we can with more powerful algorithms. So check out the links in the description below for more information, and please, please subscribe for more technology videos. There's so much I want to make. Thanks for watching.
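
The cleaning in step two can be sketched roughly as follows. This is a minimal illustration, not the helper class used in the video: the function name `clean_review`, the regex-based HTML stripping, and the small hand-picked `STOP_WORDS` set are all assumptions made so the example runs without downloads. The video takes its stop words from NLTK instead.

```python
import re

# Assumption: a tiny hand-picked set stands in for NLTK's English stop-word list.
STOP_WORDS = {"the", "to", "as", "a", "an", "and", "i", "is", "it", "of", "this", "was"}


def clean_review(raw_review):
    """Strip HTML tags, non-letters, and stop words from one review."""
    no_html = re.sub(r"<[^>]+>", " ", raw_review)       # remove HTML tags
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_html)   # replace non-letters with spaces
    words = letters_only.lower().split()                # lowercase and split on whitespace
    meaningful = [w for w in words if w not in STOP_WORDS]
    return " ".join(meaningful)


# Hypothetical raw review, similar in shape to scraped movie-review text.
sample = "<br />This movie was AWFUL, awful... I hate it!"
cleaned = clean_review(sample)
print(cleaned)
```

Applying `clean_review` to every review in the training set gives the list of cleaned reviews that step three turns into a bag of words.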

Original Description

Link to the full Kaggle tutorial w/ code: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
Sentiment Analysis in 5 lines of code: http://blog.dato.com/sentiment-analysis-in-five-lines-of-python
I created a Slack channel for us, sign up here: https://wizards.herokuapp.com/
The Stanford Natural Language Processing course: https://class.coursera.org/nlp/lecture
Cool API for sentiment analysis: http://www.alchemyapi.com/products/alchemylanguage/sentiment-analysis
I recently created a Patreon page. If you like my videos, feel free to help support my effort here!: https://www.patreon.com/user?ty=h&u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

This video teaches sentiment analysis using Python and scikit-learn, covering natural language processing and machine learning basics. It demonstrates how to build a sentiment analysis model in 4 minutes.

Key Takeaways
  1. Import necessary libraries
  2. Read and clean the data
  3. Create a bag of words
  4. Train a random forest classifier
  5. Test the classifier on new data
💡 Sentiment analysis is an evolving field of machine learning that can be improved with more powerful algorithms and techniques.
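
Steps 3 to 5 above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions rather than the code from the video: the toy reviews and labels are hypothetical and already cleaned, reading and writing the Kaggle CSV files is left out, and `random_state=0` is added only to make runs repeatable. `CountVectorizer(max_features=5000)` and a 100-tree `RandomForestClassifier` follow the settings mentioned in the transcript.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical, already-cleaned toy reviews; 1 = positive, 0 = negative.
train_reviews = [
    "truly masterpiece loved every minute",
    "wonderful acting great story",
    "awful boring waste time",
    "hate terrible plot awful acting",
]
train_labels = [1, 1, 0, 0]
test_reviews = ["great masterpiece", "boring awful waste"]

# Step 3: bag of words, keeping at most the 5,000 most frequent words.
vectorizer = CountVectorizer(max_features=5000)
train_features = vectorizer.fit_transform(train_reviews).toarray()

# Step 4: train a random forest of 100 decision trees on the word counts.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(train_features, train_labels)

# Step 5: reuse the fitted vocabulary on the test reviews, then predict.
test_features = vectorizer.transform(test_reviews).toarray()
predictions = forest.predict(test_features)
print(predictions)
```

Note that the test reviews go through `transform`, not `fit_transform`, so they are counted against the vocabulary learned from the training set. With so few examples the predictions are only illustrative; the video trains on the full labeled Kaggle review set.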
