How to Understand the Watson Discovery Data Schema - Part 3 - Stock News Crawler

Nicholas Renotte · Intermediate ·📰 AI News & Updates ·7y ago

Key Takeaways

The video demonstrates how to understand the Watson Discovery data schema for a stock news crawler by exploring the fields and constructs that Watson Discovery finds within textual documents, which are the fields you can query later on. It covers the collection view and document view, and explains how the data is described by concepts, categories, entities, and extracted metadata.

Full Transcript

What's happening guys, welcome back to the series on how to build a stock news crawler with Watson Discovery. In the last video, what we went through is setting up our web crawl data source and getting that to run every week. So now what we're gonna take a look into is what's being created, so the data that's been imported, and specifically we're going to take a look into the data schema. These are basically the fields and the constructs that Watson Discovery has found within our textual documents, and they're the fields that we're going to be able to query later on.

So jumping back over to the dashboard that we got to in the last video, you should have something that looks a little bit like this. Now what you've got up here is your web crawl name, so you can rename that if you want. Let's just do that for now: if you hit the little pencil and rename it to Stock News Web Crawl, right, then we can take a look at the number of documents. It looks as though we've got nine hundred and seventy-one documents that have been imported into our new web crawl. It looks like we've also got a bunch of errors. We can take a look at these, but the majority of them are probably going to be due to our file constraints or storage constraints, so it looks like we've probably hit our limit based on the free tier. Doesn't matter at this stage, because we can always bump that up if we want to get more documents at a later stage.

So jumping back over to Overview, you can see we've got nine hundred and seventy-one documents. It's also identified one field in our data, so text, but we're going to look into that a little bit later on, and there's also a bunch of different enrichments, which we'll take a look at once we start jumping over to the schema. So this is basically your key dashboard for each collection. Now a collection is basically a data source or set of documents that you've collected from around the web, so that's just the way that Watson Discovery lumps documents together.

Right, without further ado, let's get to the data schema. If you click this little button over here, you'll be taken to the data schema view, which looks a bit like this. Now there's two ways to analyze your data or your documents from this view: the first one is the collection view and the second is the document view. The key difference between those is that the collection view is basically going to group each of the fields together, whereas the document view is going to group the fields based on each document. So rather than looking at every single type of field here, you can actually filter through by document. Each one of these here represents a different document in our web crawl, whereas in our collection view each one of these just represents a different field. You can see we've got text, we've got concepts, categories, and then a bunch of other stuff. So let's quickly go through each of those, because this is the key part of this video: what's actually in our data schema.

Jumping back into collection view, you can see that the first one that we've got is text. That one's pretty straightforward: it's just the text that's been collected or scraped from each of those documents as part of the web crawl.

Now the second thing that we're taking a look at is concepts. Concepts are basically key topics or themes that are found within each of the documents, so it's really to do with high-level objects or subjects that are found within each of these documents. Here we've got a subject that's Jim Cramer. That might also be an entity, but for now let's cast it as a concept. We've got Jim Cramer quite a fair bit, and we've got stock market, you saw that pop up. What you've basically got is the text for each of the concepts and its relevance, so you can see here that it's got a relevance of 99.6%, as well as the resource. If we search that, we're actually taken to the resource that's been found within that document. Now what we can also do: if you just keep hitting Show more values, it'll go through each one of the different concepts. This is only going to show you a couple, so there are more that you'll actually be able to find once you start querying the documents, but we'll see that later.

Now you've also got categories. These are really high-level themes that are found throughout your document. These differ from concepts in that concepts are constructs or theories or subject matter that are actually quite nuanced or detailed within your documents. A category is a very high-level theme. So for example, you can see that we've got finance, investing, funds and exchange-traded funds, so the theme of the document might be exchange-traded funds. If we hit another one we can see investing again, very high-level, and again we've got ETFs. What else do we have? Seems we're only getting investing and ETFs, but again, once we start creating queries we'll see a lot more of that.

Entities are basically people, places, companies, anything which is a proper noun. You can see here that it's identified Arin Hankin as a key entity, and he's been identified as a person, so that's the type. Again we've got the relevance, and here we've also got the count as well, so how many times this particular entity appears within the documents that we've got. We can also check out some more values. You can see that we've got companies, so we've got MarketWatch showing up as a company. We've got Barron's showing up as a person; that might not be correct, but I mean it's still pretty good. Bill Bischoff, that's a person. Again we've got a bunch of different people in here. So you can see that it's automatically pulling the concepts, categories and entities out of your documents without you explicitly saying or classifying these groups or these key entities within the text.

Now the other part of the document that you'll also see is the extracted metadata. That really is to do with the file name, file type, as well as the title of the document. You can also grab some other metadata, which is really to do with the HTML side of things, so you can see the content type, a bunch of link IDs, some URLs, as well as the application ID. And again you've got a unique ID for that document, and then you've got the raw HTML for each one of those documents.

Now again, if you go into document view you're gonna get these exact same concepts or fields; they're just going to be grouped together for the entire document. You can see that this is one document, but again we've got the ID, we've got the metadata, we've got the text, and you can see each of the entities that are popping up. We can see that we've got a bunch of other subtypes for each of these entities. We've got another entity, which is Bitcoin, and again we've got a bunch of different entities here. If we keep scrolling down you can see that you've got quite a fair few of those, as well as the metadata, and there should be the HTML down the bottom, which there is.

So that about wraps up looking at the data schema. Once again, this is really all to do with taking a look at the constructs that have been found within the document. In the next video, what we're gonna take a look at is how to start querying against these fields. If you found this video useful, be sure to like, share and subscribe. Thanks so much for watching. Peace.
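As a rough illustration of the schema walked through above, here is a minimal Python sketch of what a single crawled document might look like once loaded as a dictionary. The field names (`enriched_text`, `extracted_metadata`, `relevance`, `count`, and so on) and all values other than the 99.6% relevance mentioned in the video are assumptions made for this example, not output copied from the video, and the two helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical document shaped like the fields shown in the schema view.
# Field names and values are illustrative assumptions.
document = {
    "id": "doc-001",
    "text": "Jim Cramer on the stock market and exchange-traded funds ...",
    "extracted_metadata": {
        "filename": "article.html",
        "file_type": "html",
        "title": "Example article",
    },
    "enriched_text": {
        "concepts": [
            {"text": "Jim Cramer", "relevance": 0.996},
            {"text": "Stock market", "relevance": 0.91},
        ],
        "categories": [
            {"label": "/finance/investing/funds/exchange traded funds", "score": 0.88},
        ],
        "entities": [
            {"text": "MarketWatch", "type": "Company", "relevance": 0.80, "count": 3},
            {"text": "Bill Bischoff", "type": "Person", "relevance": 0.60, "count": 1},
        ],
    },
}


def top_concepts(doc, min_relevance=0.5):
    """Return concept names at or above min_relevance, most relevant first."""
    concepts = doc.get("enriched_text", {}).get("concepts", [])
    kept = [c for c in concepts if c["relevance"] >= min_relevance]
    kept.sort(key=lambda c: c["relevance"], reverse=True)
    return [c["text"] for c in kept]


def entity_counts_by_type(doc):
    """Sum the mention counts of entities, grouped by entity type."""
    counts = Counter()
    for entity in doc.get("enriched_text", {}).get("entities", []):
        counts[entity["type"]] += entity.get("count", 1)
    return dict(counts)


print(top_concepts(document, min_relevance=0.95))
print(entity_counts_by_type(document))
```

Keeping concepts, categories and entities as separate lists under one enrichment key mirrors how the schema view presents them: each has its own text or label plus a score, and only entities carry a type and a count.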

Original Description

Tired of searching the web for stock data? Get yourself set up with Watson Discovery and build a stock news crawler in under an hour. What’s Watson Discovery? It’s your own personalised search engine built on top of IBM Watson. You can upload your own documents and search them using natural language queries and the IBM Discovery query language.

Here’s what you’ll learn!
- Get an understanding of the Watson Discovery data schema
- Learn the difference between Concepts, Categories and Entities in WD

Rather read a blog post? Check it out here: https://www.nicholasrenotte.com/how-to-build-a-stock-news-crawler-using-ibm-watson-discovery/

Want more awesome data and analytics stuff? Follow me on…
Blog: www.nicholasrenotte.com
Twitter: https://twitter.com/nicholasrenotte
Facebook: https://www.facebook.com/nickrenotte

This video teaches how to understand the Watson Discovery data schema for a stock news crawler and how the imported data is described by concepts, categories, entities, and metadata. It covers the collection view and document view. Querying these fields is left to the next video in the series.

Key Takeaways
  1. Review the web crawl collection: document count, errors, and enrichments
  2. Analyze data using the collection view and document view
  3. Understand concepts, categories, entities, and metadata
  4. Know which fields you will be able to query later on
💡 Watson Discovery can automatically extract concepts, categories, entities, and metadata from textual documents, making it easier to analyze and query data.
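The difference between the collection view and the document view can be sketched in a few lines of Python. This is only an illustration of the grouping idea described in the video: the sample documents, their field names, and the two functions are hypothetical and do not call any Watson Discovery API.

```python
from collections import defaultdict

# Hypothetical, simplified documents: each field holds a list of values.
documents = [
    {"id": "doc-1", "concepts": ["Stock market", "Jim Cramer"], "entities": ["MarketWatch"]},
    {"id": "doc-2", "concepts": ["Stock market"], "entities": ["Bitcoin", "MarketWatch"]},
]


def collection_view(docs):
    """Group values by field across all documents, keeping first-seen order."""
    view = defaultdict(list)
    for doc in docs:
        for field, values in doc.items():
            if field == "id":
                continue  # the id identifies a document, it is not a shared field
            for value in values:
                if value not in view[field]:
                    view[field].append(value)
    return dict(view)


def document_view(docs):
    """Group fields by document, keyed on the document id."""
    return {
        doc["id"]: {field: values for field, values in doc.items() if field != "id"}
        for doc in docs
    }


print(collection_view(documents))
print(document_view(documents))
```

Both functions read the same data. The collection view answers "which values exist for this field anywhere in the crawl", while the document view answers "which fields and values belong to this one document".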
