Prepare your data for ML | Text Classification Tutorial Pt. 1 (Coding TensorFlow)

TensorFlow · Intermediate ·🧬 Deep Learning ·7y ago

Key Takeaways

This video tutorial covers preparing data for machine learning, specifically text classification, using TensorFlow. It focuses on getting the data ready to train a neural network and explains the unique challenges associated with text classification.

Full Transcript

Hi everybody, I'm Laurence Moroney from the TensorFlow team at Google, and today we're going to talk about text classification. It's part one of a two-part series, which will focus on the data and getting it ready to train a neural network. You will do this hands-on using a workbook that you can find at the link in the description below, and I'll step you through it.

Text classification has some unique challenges, so before you get coding, let me step you through some of these. First of all, neural networks typically deal with numbers, not text, when learning patterns that can be used for prediction or classification. In this case we're looking at learning from movie reviews to see if those reviews are positive or negative, and the first step, of course, is to change the words into numbers that represent them. There'll be a little bit more processing of these words into vectors determining their sentiments, and we'll cover that in the next video.

So let's get coding. First things first: I'll have to check the licenses before I begin, and now I'll import TensorFlow and NumPy. I'll also use Keras and print out the version of TensorFlow that I'm using.

Okay, now it's time to get the dataset. The IMDB set is included with Keras, so let's download it and take a look at what's in there. Note that in this case the nice folks at Keras have done the work for us of converting the words into integers. They've also sorted them into a dictionary so that lower numbers are the most common words and higher numbers are the least common words. So when we load it and specify 10,000 words, this gives us the top 10,000 words that are used across all of the reviews.

Okay, now we've loaded the data, and we have our training data and labels as well as our test data and labels. It's also nicely sorted into integers for us, which is a great first step for learning. Let's see what the data looks like next.

First we'll look at our training data. You'll see that we have a total of 25,000 items of data and 25,000 labels describing them. The labels are very simple: it's 0 for a negative review and 1 for a positive one. The reviews look like this: it's just a long set of numbers, and these are the indexes into the array of words. The review will start with a 1, indicating the start of the review, so the first word in the review is word number 14, which translates to the word "this", followed by the value 22, which translates to the word "film".

The next bit of code is a handy-dandy way of decoding the review. Note that the values 0 through 3 are reserved, with 1 being the start of the review, as we mentioned a moment ago, and 0 being for padding. Now this is important, and you'll see why in a moment. I can now decode the review and see that 1, 14, 22 are the start character, then "this", and then "film". It's pretty cool, right?

Now, earlier I skipped over this piece of code showing me the length of the review. For example, the first review was 218 words long and the second was 189 words long. That's really awkward, and it's confusing to train a neural network if all of the training data is of different lengths. So let's pick a standard length for every review: if it's longer we'll trim it to that length, and if it's shorter we'll pad it to that length. The Keras preprocessing APIs make this really easy. Here you can see I'm taking the training and test data and making sure it's 256 words long. If I need to pad it, then I'll pad it with the pad character, which is the 0 that we saw earlier. A quick look will now show that it worked: they're all 256 words long. And if I now look at my first set of training data, you'll see that it's padded by zeros. Remember, it had been 218 words long, so the extras get padded out to make it 256.

Great, our training and test data is now ready. In the next episode you'll take a look at how to design a neural network to accept this data and to train a model to determine the sentiment of movie reviews. I'll see you there.
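The decoding step described in the transcript can be sketched in plain Python. This is only an illustration, not the workbook's code: `toy_word_ranks`, `build_decoder`, and `decode_review` are hypothetical names, the two-word vocabulary is a toy chosen so the indexes match the 14 and 22 mentioned in the video, and the labels given to the reserved values 2 and 3 are placeholders, since the transcript only names 0 (padding) and 1 (start).

```python
# Toy word ranks (hypothetical): the real dataset ships a far larger index.
# Lower rank means a more common word.
toy_word_ranks = {"this": 11, "film": 19}


def build_decoder(word_ranks, offset=3):
    """Map integer indexes back to words.

    Values 0 through 3 are reserved, so every word rank is shifted up by 3.
    """
    index_to_word = {rank + offset: word for word, rank in word_ranks.items()}
    # 0 and 1 are named in the transcript; the labels for 2 and 3 are assumed.
    index_to_word.update({0: "<PAD>", 1: "<START>", 2: "<RESERVED>", 3: "<RESERVED>"})
    return index_to_word


def decode_review(encoded, index_to_word):
    """Turn a list of integer indexes into text; unknown indexes become '?'."""
    return " ".join(index_to_word.get(i, "?") for i in encoded)


index_to_word = build_decoder(toy_word_ranks)
decoded = decode_review([1, 14, 22], index_to_word)
print(decoded)
```

With the toy vocabulary above, the sequence 1, 14, 22 decodes to the start marker followed by "this" and "film", matching what the video shows for the opening of the first review.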
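The pad-or-trim idea from the transcript can also be sketched without any libraries. The video relies on the Keras preprocessing APIs for this; the `pad_or_trim` helper below is a hypothetical stand-in written for illustration, and padding and trimming at the end of the review is a choice made here, not something the transcript specifies. The sample reviews are made-up integer lists whose lengths mirror the 218 and 189 mentioned in the video, plus one longer review to show trimming.

```python
def pad_or_trim(review, length=256, pad_value=0):
    """Return a copy of `review` that is exactly `length` items long.

    Longer reviews are cut at the end; shorter ones are filled with
    `pad_value` (0 is the reserved padding index) at the end.
    """
    trimmed = review[:length]
    return trimmed + [pad_value] * (length - len(trimmed))


# Made-up reviews of 218, 189, and 300 "words" (integers stand in for words).
reviews = [list(range(1, 219)), list(range(1, 190)), list(range(1, 301))]

padded = [pad_or_trim(r) for r in reviews]

# Every review now has the same length.
print([len(r) for r in padded])
# The first review keeps its 218 values and gains 38 zeros.
print(padded[0][216:222])
```

Once every review has the same length, the whole dataset can be stacked into a single rectangular array, which is the shape a neural network expects as input.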

Original Description

@lmoroney is back with another episode of Coding TensorFlow! In this episode, we discuss Text Classification, which assigns categories to text documents. This is part 1 of a 2-part sub-series that focuses on the data and gets it ready to train a neural network. Laurence also explains the unique challenges associated with Text Classification. Watch to follow along and stay tuned for part 2 of this episode where we’ll look at how to design a neural network to accept the data we prepared.
Hands-on tutorial → http://bit.ly/2CNVMbi
Watch Part 2 → https://www.youtube.com/watch?v=vPrSca-YjFg
Subscribe to TensorFlow → http://bit.ly/TensorFlow1
Watch more Coding TensorFlow → http://bit.ly/2zoZfvt

This video tutorial teaches how to prepare data for text classification using TensorFlow, covering data preparation and the unique challenges associated with text classification. It is part 1 of a 2-part series, with part 2 focusing on designing a neural network to accept the prepared data. The tutorial provides hands-on experience with TensorFlow and text classification.

Key Takeaways
  1. Prepare your data for text classification
  2. Understand the unique challenges associated with text classification
  3. Use TensorFlow to prepare your data
  4. Train a neural network using the prepared data (covered in part 2)
  5. Design a text classification model (covered in part 2)
💡 Text classification requires careful data preparation and an understanding of its unique challenges, such as converting words into numbers a neural network can learn from and padding or trimming reviews to a standard length.
