How to Understand the Watson Discovery Data Schema - Part 3 - Stock News Crawler
Key Takeaways
The video explains the Watson Discovery data schema for a stock news crawler: the fields and constructs that Watson Discovery finds within the crawled text documents, which can be queried later on. It covers the collection view and the document view, and explains how to analyze the data using concepts, categories, entities and extracted metadata.
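The collection view and the document view show the same fields, grouped differently. The Python sketch below makes that difference concrete. It assumes each document is a plain dict whose enrichments sit under an `enriched_text` key, loosely modelled on Discovery v1 JSON; the field names and sample values are hypothetical illustrations, not output from the crawl in the video.

```python
from collections import defaultdict

# Hypothetical documents, loosely shaped like Discovery v1 query results.
# Every enrichment item here is assumed to carry a "text" key.
documents = [
    {
        "id": "doc-1",
        "enriched_text": {
            "concepts": [{"text": "Jim Cramer", "relevance": 0.996}],
            "entities": [{"text": "MarketWatch", "type": "Company"}],
        },
    },
    {
        "id": "doc-2",
        "enriched_text": {
            "concepts": [{"text": "Stock market", "relevance": 0.9}],
            "entities": [{"text": "Bill Bischoff", "type": "Person"}],
        },
    },
]


def collection_view(docs):
    """Group values by field across the whole collection: {field: [values]}."""
    view = defaultdict(list)
    for doc in docs:
        for field, items in doc.get("enriched_text", {}).items():
            view[field].extend(item["text"] for item in items)
    return dict(view)


def document_view(docs):
    """Group values by document first: {doc_id: {field: [values]}}."""
    return {
        doc["id"]: {
            field: [item["text"] for item in items]
            for field, items in doc.get("enriched_text", {}).items()
        }
        for doc in docs
    }


print(collection_view(documents))
# {'concepts': ['Jim Cramer', 'Stock market'], 'entities': ['MarketWatch', 'Bill Bischoff']}
print(document_view(documents)["doc-1"])
# {'concepts': ['Jim Cramer'], 'entities': ['MarketWatch']}
```

The collection view answers "what values does this field take across everything I crawled?", while the document view answers "what did Watson find in this one article?".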
Full Transcript
What's happening guys, welcome back to the series on how to build a stock news crawler with Watson Discovery. In the last video we set up our web crawl data source and scheduled it to run every week. Now we're going to take a look at what's been created: the data that's been imported and, specifically, the data schema. These are basically the fields and constructs that Watson Discovery has found within the textual documents, and they're the fields we'll be able to query later on.

Jumping back over to the dashboard we got to in the last video, you should have something that looks a little like this. Up here is your web crawl name, which you can rename if you want. Let's do that now: hit the little pencil and rename it to "stock news web crawl". Next, we can take a look at the number of documents. It looks as though 971 documents have been imported into our new web crawl. We've also got a bunch of errors. We can take a look at these, but the majority are probably due to file or storage constraints; it looks like we've hit the limit of the free tier. That doesn't matter at this stage, because we can always bump that up if we want more documents later.

Jumping back over to the overview, you can see we've got 971 documents. Watson Discovery has also identified one field in our data, text, which we'll look into a little later, and there's a bunch of different enrichments, which we'll take a look at once we jump over to the schema. This is basically your key dashboard for each collection. A collection is a data source, or set of documents, that you've collected from around the web; it's just the way that Watson Discovery groups documents together.

Right, without further ado, let's get to the data schema. If you click this little button over here, you'll be taken to the data schema view, which looks a bit like this. There are two ways to analyze your data, or your documents, from this view: the first is the collection view and the second is the document view. The key difference is that the collection view groups each of the fields together, whereas the document view groups the fields by document. So rather than looking at every single type of field, you can filter through by document: in the document view each entry represents a different document in our web crawl, whereas in the collection view each entry represents a different field. You can see we've got text, concepts, categories and a bunch of other stuff. Let's quickly go through each of those, because this is the key part of this video: what's actually in our data schema.

Jumping back into the collection view, the first field we've got is text. That one's pretty straightforward: it's just the text that's been collected, or scraped, from each document as part of the web crawl.

The second thing we're taking a look at is concepts. Concepts are key topics or themes found within each document, so they're really to do with the high-level objects or subjects in each document. Here we've got Jim Cramer as a subject. That might also be an entity, but for now let's treat it as a concept. Jim Cramer shows up quite a bit, and stock market pops up too. For each concept you get the text and its relevance; you can see this one has a relevance of 99.6%. You also get the resource: if we search that, we're taken to the resource that was found within the document. If you keep hitting "show more values", it'll go through each of the different concepts. This only shows you a few; there are more that you'll find once you start querying the documents, but we'll see that later.

You've also got categories. These are really high-level themes found throughout your document. They differ from concepts in that concepts are constructs, theories or subject matter that are quite nuanced or detailed within your documents, whereas a category is a very high-level theme. For example, you can see that we've got finance / investing / trading, as well as investing / funds / exchange traded funds, so the theme of this document might be exchange-traded funds. If we hit another one, we can see investing again, very high level, and again ETFs. It seems we're only getting investing and ETFs here, but once we start running queries we'll see a lot more.

Entities are people, places, companies: anything that's a proper noun. You can see here that it's identified Arin Hankin as a key entity, and he's been identified as a person, so that's the type. Again we've got the relevance, and here we've also got the count, which is how many times this particular entity appears within our documents. If we check out some more values, you can see we've got companies, with MarketWatch showing up as a company. Baron is showing up as a person, which might not be correct, but it's still pretty good. Bill Bischoff is a person, and we've got a bunch of other people in here. So Watson Discovery automatically pulls the concepts, categories and entities out of your documents without you explicitly classifying these key entities within the text.

The other part of the document you'll see is the extracted metadata, which covers the file name and file type as well as the title of the document. You can also grab some other metadata that's to do with the HTML side of things: the content type, a bunch of link IDs, some URLs and the application ID. You've also got a unique ID for each document, and then the raw HTML for each one of those documents.

If you go into the document view, you get these exact same concepts and fields; they're just grouped together per document. You can see that this is one document: we've got the ID, the metadata and the text, and we can see each of the entities that pop up, along with the subtypes for each entity. Here's another entity, Bitcoin, and if we keep scrolling down you can see there are quite a few of those, as well as the metadata, and the HTML should be down the bottom, which it is.

That about wraps up the data schema. Once again, this is all about looking at the constructs that have been found within the documents. In the next video we'll take a look at how to start querying against these fields. If you found this video useful, be sure to like, share and subscribe. Thanks so much for watching, peace.
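The fields walked through in the transcript end up as nested JSON paths such as `extracted_metadata.title` or `enriched_text.entities.text`, and those dotted paths are how fields get referenced when querying. As a rough sketch, the Python below lists every dotted field path in a single document. The sample document and its field names (`html`, `extracted_metadata`, `enriched_text`) are assumptions loosely modelled on Discovery v1 output, not data taken from the video.

```python
def field_paths(value, prefix=""):
    """Return the set of dotted field paths found in a nested JSON-like value.

    Lists are walked transparently, so every item in
    enriched_text.entities contributes to the same path.
    """
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= field_paths(item, prefix)
    return paths


# Hypothetical document; field names are assumptions based on Discovery v1.
doc = {
    "id": "doc-1",
    "text": "Article body scraped by the web crawl...",
    "html": "<html>...</html>",
    "extracted_metadata": {
        "filename": "article.html",
        "file_type": "html",
        "title": "Example article title",
    },
    "enriched_text": {
        "entities": [
            {"text": "Bill Bischoff", "type": "Person", "relevance": 0.8, "count": 2},
        ],
    },
}

for path in sorted(field_paths(doc)):
    print(path)
```

Treating list items as sharing their parent's path mirrors how the collection view shows one entry per field rather than one per value.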
Original Description
Tired of searching the web for stock data? Get yourself set up with Watson Discovery and build a stock news crawler in under an hour.
What’s Watson Discovery?
It’s your own personalised search engine built on top of IBM Watson. You can upload your own documents and search them using natural language queries or the Watson Discovery Query Language.
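Queries against the schema fields are written as filter strings over dotted field paths. The Python sketch below builds such strings; it assumes the Discovery v1 query-language conventions where `:` means "contains", `::` means exact match, `,` joins clauses with AND and `|` joins them with OR, so check the Discovery Query Language reference for your service version. The helper names `quote` and `dql_filter` are hypothetical, and the resulting string would be passed as the filter of a query call (for example the `filter` argument of the `ibm-watson` SDK's `DiscoveryV1.query`).

```python
from typing import Sequence


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes (assumed DQL escaping)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dql_filter(field: str, values: Sequence[str], exact: bool = False,
               match_all: bool = False) -> str:
    """Build a filter such as enriched_text.entities.text:"Jim Cramer".

    Assumed DQL operators: ':' contains, '::' exact match,
    ',' joins clauses with AND, '|' joins them with OR.
    """
    operator = "::" if exact else ":"
    joiner = "," if match_all else "|"
    return joiner.join(f"{field}{operator}{quote(v)}" for v in values)


print(dql_filter("enriched_text.entities.text", ["Jim Cramer", "MarketWatch"]))
# enriched_text.entities.text:"Jim Cramer"|enriched_text.entities.text:"MarketWatch"
print(dql_filter("enriched_text.concepts.text", ["Stock market"], exact=True))
# enriched_text.concepts.text::"Stock market"
```

Quoting every value keeps multi-word names such as "Jim Cramer" together as a phrase rather than separate terms.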
Here’s what you’ll learn!
- Get an understanding of the Watson Discovery data schema
- Learn the difference between Concepts, Categories and Entities in Watson Discovery (WD)
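One way to see the difference between the three enrichment types is to summarise each one for a single document. The Python sketch below assumes concepts carry `text` and `relevance`, categories carry a slash-separated `label`, and entities carry `text` and `type`, loosely following Discovery v1 enrichment output; the sample values (apart from names mentioned in the video) are hypothetical, and the 0.5 relevance threshold is an arbitrary illustrative choice.

```python
from collections import defaultdict


def category_levels(label: str) -> list:
    """Split a hierarchical label such as '/finance/investing' into its levels."""
    return [part for part in label.split("/") if part]


def summarize(enriched: dict, min_relevance: float = 0.5) -> dict:
    """Contrast concepts, categories and entities for one enriched document."""
    # Concepts: nuanced topics, kept only above a relevance threshold.
    concepts = [c["text"] for c in enriched.get("concepts", [])
                if c.get("relevance", 0.0) >= min_relevance]
    # Categories: broad themes expressed as a path from general to specific.
    categories = [category_levels(c["label"]) for c in enriched.get("categories", [])]
    # Entities: proper nouns grouped by their detected type.
    entities = defaultdict(list)
    for e in enriched.get("entities", []):
        entities[e["type"]].append(e["text"])
    return {"concepts": concepts, "categories": categories, "entities": dict(entities)}


# Hypothetical enrichment output; keys (relevance, label, type) are assumptions.
enriched = {
    "concepts": [{"text": "Jim Cramer", "relevance": 0.996},
                 {"text": "Stock market", "relevance": 0.4}],
    "categories": [{"label": "/finance/investing/funds/exchange traded funds", "score": 0.9}],
    "entities": [{"text": "Arin Hankin", "type": "Person", "count": 3},
                 {"text": "MarketWatch", "type": "Company", "count": 1}],
}

print(summarize(enriched))
```

Splitting the category label shows why categories read as high-level themes: the first levels ("finance", "investing") are shared by many documents, while concepts and entities are specific to each article.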
Rather read a blog post…?
Follow along with the blog post? Check it out here: https://www.nicholasrenotte.com/how-to-build-a-stock-news-crawler-using-ibm-watson-discovery/
Want more data and analytics goodness? Follow me on…
Blog: www.nicholasrenotte.com
Twitter: https://twitter.com/nicholasrenotte
Facebook: https://www.facebook.com/nickrenotte