Scikit Learn Machine Learning Tutorial for investing with Python p. 5
Key Takeaways
This video tutorial is part of a series on machine learning for investing with scikit-learn and Python. It shows how to parse locally saved Yahoo Finance HTML files with plain string splitting to extract each company's total debt-to-equity ratio. Plotting these ratios with Matplotlib and storing them with pandas are mentioned as next steps.
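The core trick in this video is plain string splitting. You search the page source for the label, not the number, because the number differs from file to file. Below is a minimal sketch built on a made-up HTML snippet. The helper name extract_value and the class names in the snippet are hypothetical; they are not Yahoo's real markup or the author's code.

```python
def extract_value(source, before, after="</td>"):
    """Return the text between the first `before` marker and the next `after`.

    Same two-split idea as the video: the part right of `before` is element [1],
    and the part left of the closing tag is element [0].
    """
    parts = source.split(before)
    if len(parts) < 2:
        raise ValueError("marker not found: " + repr(before))
    return parts[1].split(after)[0]


# Hypothetical stand-in for a saved page; the markup between the label and the
# value is an assumption, so copy the real text from your own page source.
sample_html = (
    '<tr><td class="label">Total Debt/Equity (mrq):</td>'
    '<td class="data">0.407</td></tr>'
)

gather = "Total Debt/Equity (mrq)"
# The marker contains double quotes, so the literal is wrapped in single quotes.
value = extract_value(sample_html, gather + ':</td><td class="data">')
print(value)  # 0.407
```

The helper raises ValueError when the marker is missing. This makes pages without the label easier to spot than letting split()[1] fail with a bare IndexError.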
Full Transcript
What is going on, everybody? Welcome to the fifth video in our Python for machine learning with scikit-learn tutorial series. In the last video we talked about pulling some of the necessary information from our data files. In this video we're going to talk about how to actually acquire the value we're interested in, which for us is the total debt-to-equity ratio. As we move forward, we will add many more features for each company. I don't think this one is going to give us anything too useful, at least until we separate companies by their sector, when this number might become more useful. For now we want to keep it simple so we can actually visualize the output. Then we'll start making it more complex and, likely, a little more interesting.

So far we've got the date and the Unix time, and now we actually need to acquire the data. We're not actually parsing a website in this example. What we have is a bunch of HTML files that are identical to what you would have gotten if you had scraped Yahoo Finance. For example, here I have the stock ticker A, for Agilent Technologies. If we scroll down, we can see the total debt-to-equity ratio, which is 0.407.

You can use a module like Beautiful Soup for web parsing. In my honest opinion, though, Beautiful Soup is almost never necessary unless you're doing some really complex parsing, so I'm going to show you how simple it is to parse pretty much anything. Here's what I tend to do. I eventually want to find this number, so I'm just going to keep in mind that it is 0.407, and I copy the element right before it. We're on an HTML page, so in Chrome we can press Ctrl+U, or right-click and choose View Page Source, and then press Ctrl+F. What we're looking for is "Total Debt/Equity (mrq)", which takes us right here. Next to it is the actual value we're interested in, 0.407. We can't just search for 0.407, because that number will obviously change from document to document.

We no longer need to print the date time and the Unix time, but we'll keep the sleep there for now. Now we want to figure out how to pull this data. Again, you could use something like Beautiful Soup and its table-reading functionality, or you can just do the following. We currently have no way of opening the full file, so we need to specify how to build the entire path. Right now we have the path with the stats path added to it, and we haven't done anything beyond that. The file we want is basically path plus stats path plus file. So we'll say full_file_path = each_dir + "/" + file. That gives us the file path, because each_dir comes from our stock list and already points to that stock's directory. Let's print that first to check it.

Next we want the source code, so we'll say source = open(full_file_path, 'r').read(). If you were pulling this from the website, this would normally be something like a urllib open with a .read() at the end. Since we're not, we open the file instead, with the intention to read, and then call .read(). Let's print the source just to see if we're on the right track. Save, then press F5 to run; it might take a second. OK, here is the entire source code, so we are indeed on the right track. Let's comment out the printing of the source before we get in trouble.

Now we actually want to pull the value we're looking for. It turns out that Yahoo very rarely changes much. You can see we have the label we're trying to gather, then basically a colon and some markup, and then the actual value. That's easy enough. We'll say value = source.split(...), and we want to split by gather, whatever we're trying to gather, which here is the total debt-to-equity label. This is how you split up a big block of string data: you split it by a value. Let me zoom in so this is readable. I'm filming in 1080, and when I look at these videos afterwards, it's sometimes pretty much impossible to read. So the orange text, total debt to equity, is gather. Then we want to add this bit of markup to it, so we'll literally just highlight it, copy it, come over here, and write gather plus the pasted text. Does it have any quotes in it? Yes, it's got double quotes, so we'll wrap it in single quotes. As long as you enclose double quotes in a different kind of quote, you're totally fine. You could also escape them, but this will be fine.

So we split by that. Once we've split it, on which side of the split is the element we're interested in? It's on the right side. So we don't use element zero; we use element one, which is basically this value and everything after it. We want element one, and then what do we do? We do one more .split, and this one is a little easier. Since we have all of this, what would be the most sane thing to split by? Pretty much just the closing table data tag, </td>. So copy that, come over here, split by the closing table data tag, and take the zeroth element. Done. We have parsed this table with basically one line of code: we read the source, and we used one line to get the data we needed. As far as I've seen, this has not changed on Yahoo Finance over the course of a decade.

Now let's say we want to print the ticker and then the debt-to-equity ratio. First we need to define the ticker we're currently using, meaning the stock ticker: ticker = each_dir.split('\\')[1]. We split by a backslash, which has to be written as two backslashes, and the ticker is the element on the right-hand side. Then, down here, we can print(ticker + ':', value). This gives us the ticker and the debt-to-equity ratio. Finally, we'll just tab the sleep call over. This prints the ticker and the debt-to-equity ratio for that company across the decade.

So we'll save and run that. I forgot we were still printing out the directory every time; we'll fix that in a second. Anyway, you can see the value for each file. They went to zero for a little bit. Awesome. Here it is, all the way down to now. I mean, this company is in massive debt... oh wait, I'm sorry, we had moved on to AA, so I stopped it. But yes, see, at the end this company is carrying a large amount of debt.

It'll be interesting to see. We'll have to graph debt to equity for all of the companies. That will be roughly 500 companies times probably about 20 values each, but it should be okay to plot with Matplotlib. I'd be really interested, because all the companies I've seen so far have increasingly taken on a lot of leverage and debt over the years, especially very recently. That's really worrying, if you ask me. All of these companies are in massive debt, in a market that is in theory itself in massive debt from QE right now, and it's just insane.

Anyway, that's interesting. We've got the ticker and the value, so we've parsed the data we want. Now we need to store it and structure it in a way that we can use. We've acquired the data, but we have to save it so that we can access it later. We also want to save it for all of the companies, so we'll go through every company and save these values for it. Maybe eventually we'll save them by sector or something like that. In the next video, we're going to use pandas to structure our data and output it to CSV, so that later we can access it with pandas efficiently. That's it for this video. If you have any questions or comments, feel free to leave them below. Otherwise, as always, thanks for watching, and thanks for all the support and subscriptions. Until next time. [Music]
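The loop built up in the transcript can be sketched as follows. This is a minimal sketch, not the author's code. The function name key_stats, its parameters, and the folder layout (one sub-folder of saved pages per ticker under a stats folder) are assumptions. The after_label argument must be copied from the markup that follows the label in your own saved pages. The sketch makes three changes to the video's approach. It uses os.path.join instead of the "/" concatenation, and os.path.basename instead of the Windows-only backslash split. It also skips pages that lack the label rather than crashing on them.

```python
import os
import time


def key_stats(stats_path, gather, after_label, pause=0.0):
    """Print and collect (ticker, file name, value) for every saved page.

    stats_path:  folder containing one sub-folder of saved HTML pages per ticker
    gather:      the label to search for, e.g. "Total Debt/Equity (mrq)"
    after_label: markup between the label and the value, copied from the page
                 source (hypothetical in any example; check your own files)
    pause:       seconds to sleep after each ticker folder
    """
    results = []
    # os.walk yields the top folder first; keep only the per-ticker folders.
    stock_list = [entry[0] for entry in os.walk(stats_path)][1:]
    for each_dir in stock_list:
        # Portable replacement for each_dir.split('\\')[1]: the folder's own name.
        ticker = os.path.basename(each_dir)
        for file in sorted(os.listdir(each_dir)):
            full_file_path = os.path.join(each_dir, file)
            if not os.path.isfile(full_file_path):
                continue  # ignore any nested folders
            with open(full_file_path, "r") as f:
                source = f.read()
            parts = source.split(gather + after_label)
            if len(parts) < 2:
                continue  # label not on this page; skip rather than crash
            # Right of the label is element [1]; left of the closing tag is [0].
            value = parts[1].split("</td>")[0]
            print(ticker + ":", value)
            results.append((ticker, file, value))
        time.sleep(pause)
    return results
```

The function returns the rows as well as printing them. That leaves them ready for the storage step the video ends on, such as loading them into pandas and writing a CSV.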
Original Description
In this video, we build on the previous machine learning with scikit-learn tutorial and pull out the specific data point that we're interested in using as a feature.
sample code: http://pythonprogramming.net
http://seaofbtc.com
http://sentdex.com
http://hkinsley.com
https://twitter.com/sentdex
Bitcoin donations: 1GV7srgR4NJx4vrk7avCmmVQQrqmv87ty6