Python 3 Programming Tutorial - urllib module
Key Takeaways
The video tutorial covers the basics of the urllib module in Python 3: making GET and POST requests to URLs, reading responses, handling exceptions, and sending custom headers. It closes by previewing regular expressions, and mentioning Beautiful Soup, as ways to parse the downloaded data.
Full Transcript
Hello everybody and welcome to another Python 3 tutorial video. In this video, we're going to be talking about another one of our standard library modules, and that's going to be urllib. The idea of urllib is that it allows you to access the internet via Python. So just like the internet allows you to do all sorts of amazing things, urllib is going to let you do all sorts of the same amazing things, only using Python. There are only a few core things that you need to do in order to connect and get data from the internet, and then there are some slightly more advanced topics that we do need to cover with urllib, but we'll get there. Luckily, it's still a fairly simple module. So with that, let's go ahead and get started.
The first thing you're going to need to do is import urllib. Now, if you're coming from Python 2.7, you're used to just needing to do import urllib or import urllib2, and that's it. Whereas with Python 3 and onward, you're going to be doing import urllib.request, and when you call urlopen, for example, you'll have to write urllib.request.urlopen, and so on. But more on that in a little bit.
So an example of visiting a website will be as follows. We'll define a variable x and say x = urllib.request.urlopen, and in the parentheses is where we specify the address that we want to visit. You always have to lead this with http or https. So, for example, https://www.google.com. What this is going to do is make a request to that URL, and by default it will be a GET request. So it's going to get some data, and that's it. Now we can do print(x.read()), so we're reading the response. We can now save and run this, and this is our output.
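The first request described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code typed in the video; the helper name fetch and the timeout value are assumptions added here.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the raw body of a GET request to url, as bytes."""
    # With no data argument, urlopen sends a GET request.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling print(fetch("https://www.google.com")) would print the page source as a bytes object, which is the wall of text the video shows next.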
Just a whole bunch of, you know, gobbledygook text. But this is basically the source code of google.com. So, for example, we could open up a browser, go to google.com, hit Ctrl+U, and this is the source code, right? So again, it is just a bunch of junk here, but you get the idea: what we've done is use Python to reach this page. Okay, so we can minimize that, and let's go ahead and close out of that, too.
Naturally, and we're going to cover this very soon, when you visit a URL you're going to need to parse that page a little bit. You're not necessarily as interested in the HTML as you are in, like, paragraph text, for example. You're probably going to only care about paragraph text. So we're going to have to show how to handle that, and to handle that we'll be using another standard library module. So have no fear, we'll be covering that very shortly.
The next thing I want to talk about is POST. So for example, let's go back to the browser and go to pythonprogramming.net. That's where we can get all of our sample code, if you're not familiar. If we scroll down to the bottom, there's actually a search bar here, so let's say we search for basic. You get a bunch of search results for the keyword basic. But if you look at our URL, you see that we have some extra stuff added to the end of pythonprogramming.net. So what do we have? Well, we've got a question mark, then the character s, then an equals sign and basic, then an and sign (&), then submit, an equals sign, and search. So with a little bit of deduction and reasoning, we could assume that submit is a variable and s is a variable, and they've been defined as s equals basic and submit equals search. And that is true.
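As a small sketch of the deduction above, the standard library can split such a URL into its variables. The exact URL string below is reconstructed from the spoken description, so treat it as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# URL reconstructed from the spoken description (an assumption).
url = "http://pythonprogramming.net/?s=basic&submit=search"

# Everything after the question mark is the query string.
query = urlsplit(url).query

# parse_qs maps each variable name to a list of its values.
params = parse_qs(query)
print(query)   # s=basic&submit=search
print(params)  # {'s': ['basic'], 'submit': ['search']}
```

Variables that appear in the URL like this travel with a GET request; the POST form of the same search is what the video builds next.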
So if you look at links that have variables in them, the first variable will have a question mark, then the variable name, then equals and the value, and all subsequent variables are going to have this little and sign (&), then the variable, equals, and it continues on like that. So this is an example of a GET request: we're getting data based on these variables in the URL.
So let's say we want to make a POST request. Now first of all, you could just do a GET to this URL, right? You could just use a request and put in this URL. But the other thing that we can do, and the more Pythonic thing that we're supposed to do, is go to pythonprogramming.net, add in these values, and do a POST to pythonprogramming.net. So let's go ahead and show how that's supposed to be done.
Here I'm just going to comment this print out because we don't need to be printing that every time. I just wanted to show an example. I'm also going to comment the request out, because otherwise it's going to be visiting every time and we don't need to do that. Now we're going to import another thing, and that's going to be import urllib.parse. This is going to help us pass values to our POST request. This one's going to take a little bit longer to get data, but just bear with me.
First we're going to say url, and this is just going to be the base URL that we want to visit. Again, always lead in with http, so http://pythonprogramming.net. That's it. Then we're going to have a dictionary called values, with empty braces for now, but very quickly we're going to say s, and that s corresponds to basic, then a comma, and our next value is, if you recall, submit, and that submit was set to search. I don't know if I closed it or not. No, we didn't. Okay.
So it's basically key and value, like any dictionary. Our key is the variable name, and the value is the variable's definition, basically. So that's values. Now we come down and we're going to say data, so this is the data we send to the website, equals urllib.parse.urlencode, and we want to encode values. What urlencode is going to do is encode it as it should appear in the URL.
So for example, if we go back to the browser, go to google.com, and do a search for hey check that out, you see that it's hey+check+that+out. You could also put the query in the address bar like this: hey check that out. There it goes. And you can see that it has changed now to hey%20check%20that%20out. What's that? That's URL encoding. %20 is the encoding of a space. And then obviously you would need URL encoding for things like a question mark and so on.
So anyway, back to what we're doing. That's kind of why you want to do it the official way rather than hard coding, because as soon as you introduce and signs or question marks in your values, Python is not necessarily going to know which you meant. So that's why you want to do it the official way that we're doing right now.
So first we encode values. We've encoded values as the data that we want to post. Then we're going to say data = data.encode, and we want to encode this as UTF-8. This is just a type of encoding; it basically puts your data in bytes. The next thing we're going to do is say req, for request, equals urllib.request.Request, with a capital R. I can't talk. What do we want to request? The URL, and then data. So first the URL, then any of the data that we want to pass through, and we've URL-encoded that data and encoded it as UTF-8. Okay.
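The encoding steps described above can be illustrated with a short sketch. The dictionary matches the one in the video; the "cats & dogs?" value is a made-up example added here to show why hard coding a query string is fragile.

```python
import urllib.parse

values = {"s": "basic", "submit": "search"}

# urlencode turns the dictionary into a query string...
query = urllib.parse.urlencode(values)
# ...and encode turns that string into the bytes that a POST body needs.
data = query.encode("utf-8")
print(query)  # s=basic&submit=search
print(data)   # b's=basic&submit=search'

# The two space encodings seen in the browser:
plus_form = urllib.parse.quote_plus("hey check that out")
percent_form = urllib.parse.quote("hey check that out")
print(plus_form)     # hey+check+that+out
print(percent_form)  # hey%20check%20that%20out

# Characters with special meaning in a URL are escaped automatically.
tricky = urllib.parse.urlencode({"q": "cats & dogs?"})
print(tricky)  # q=cats+%26+dogs%3F
```

Letting urlencode do the escaping means an and sign or question mark inside a value cannot be mistaken for a separator.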
So we're going to request from this URL, pythonprogramming.net, and pass the following variables: s equals basic, submit equals search. Then after req, we're going to say resp, for response, equals urllib.request.urlopen(req). So now we're actually calling urllib.request.urlopen; we're actually visiting the URL now, like we did right up here. The syntax is identical, and really we just had to do all of this setup first. Then we're going to say respData = resp.read(), and let's go ahead and print respData.
So we're going to visit pythonprogramming.net, pass through those variables, and then read the results. And again, the results are going to be the source code of the results page, so it's going to be a little messy, but hopefully we'll be able to read a little bit here. So we'll save and run it. Apparently, sometimes it takes a second to run. There we go. So it visits it, and here is the messy junk that we get. Pretty big mess. I'm hoping we can find something that's actual text. Yeah, here's some text: a paragraph, Python 3 basics tutorial, if something do something, this tutorial is a part of the Python 3 tutorial series for beginners, and so on. So there's some content there. We're not quite sure yet how to pull that content out, but nonetheless, we did visit the website and we downloaded all the data.
What HTML does is basically tell your browser how it should display data, but that's really it. Your browser handles the HTML and makes it pretty for you, separates tags from text, and organizes things. Whereas Python's just going to look at the source. You have no organization here, right? So now what I'd like to have us do is change one more thing. Here we're making a request, right? So we've done a request.
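Putting the request steps together, a minimal sketch might look like this. The helper names build_post_request and send, and the timeout, are assumptions introduced for illustration; the video writes the same calls inline.

```python
import urllib.parse
import urllib.request


def build_post_request(url, values):
    """Build a Request whose body is the URL-encoded, UTF-8 encoded values."""
    data = urllib.parse.urlencode(values).encode("utf-8")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data)


def send(req, timeout=10):
    """Open the request and return the response body as bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


req = build_post_request(
    "http://pythonprogramming.net",
    {"s": "basic", "submit": "search"},
)
print(req.get_method())  # POST
```

Calling send(req) would perform the network request and return the page source as bytes, matching the respData printed in the video.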
So we've done a GET. Really, the first one was a GET, very simple, just because that's the default, so we didn't have to change anything. And then we've made a POST, based on data that we've decided to set. But now we come to a problem that you'll come across fairly soon: whenever you want to visit a website using Python or any programming language, sometimes the website owners don't like that. They don't want you on their website with a robot or a program or whatever. They only want real users on their website, so they will block you if they sense that you're not a real user.
Now luckily for us, basic systems are actually somewhat easy to fool. There are some more complex ones. Google just recently made another update and has made it slightly more difficult to cheat their system, but you can still usually overcome these things fairly easily. It's almost like, I don't know, some sort of filter, right? If you're not good enough, you can't use it, but if you're good enough, I guess you can still use their services with your program.
That said, usually websites that block your access do it because they offer an API and they want you to use their API. Google offers an API, so try to use Google's API before you cheat Google. Try to use Wikipedia's API before you start cheating Wikipedia and just programming a way around it. The API is going to make it easier on Wikipedia and it's going to make it easier on you, too, because they don't need to send all of the HTML data and they don't need to serve advertisements, because your program isn't going to read them. That kind of stuff. So you do have to pay a little bit of attention to whether they have an API or not. Now, moving along, I guess we'll show an example.
So, for now, I'm just going to comment this stuff out so we're not doing that over and over. And now I'm going to come down here, and first let's make a try and except. We're going to say try, x = urllib.request.urlopen, and the URL we're going to attempt to open is https://www.google.com/search and then a question mark, so we're defining a variable here: q=test. The q stands for query, for Google. So this is a search request for the string test. This is as if you had gone to Google, typed test in the search bar, and hit enter. We're going to attempt to do that.
Now that we've done that, we'll come down here and say saveFile equals open... actually, you know what? I don't even want to do this. I am positive this will fail. So instead we'll just do print(x.read()); that should be enough. Then we're going to come down here and say except Exception as e, and then print(str(e)). So what we're going to do here, again, is attempt to visit Google, do a search query, and read the source code of the results. We're going to try to do that; otherwise we catch the exception as e and print the string version of that exception.
So let's go ahead and run that and just see what happens to us. We run that and we get HTTP Error 403: Forbidden. We're forbidden because Google says, hey, you're a program, and we're going to go ahead and say no. So if you happen to find yourself in this situation, here's how you get around it. We tried that with a try and except, and we failed. Now, what can we do? Let's make some more space and switch this up a little bit. We're going to do try, and I guess we'll use some of the same code from up here, but we'll retype it. So we're going to try url equals, and we'll use the exact same URL.
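The try and except pattern above might be sketched as follows. The video catches the broad Exception; this variation, which is an assumption and not the code from the video, catches the two urllib exception types so the HTTP status code is available. The helper name try_fetch is hypothetical.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Return (body, None) on success or (None, error_message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403.
        return None, "HTTP Error {}: {}".format(e.code, e.reason)
    except urllib.error.URLError as e:
        # The request never got a usable answer (bad scheme, DNS failure, ...).
        return None, str(e)
```

A call such as try_fetch("https://www.google.com/search?q=test") is the request the video expects to be refused; HTTPError is listed first because it is a subclass of URLError.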
So I'm just going to say url equals this: copy and paste the URL. Then we're going to say headers equals an empty dictionary. What headers are, basically, is data you send in. You send in headers every time you visit a website, and they contain information about you: who you are, your browser, your operating system, all kinds of stuff. It sends in a bunch of information on you. And within your headers, there is a piece of data that is called the user agent.
So now that we've got headers defined, let's make some more space. We'll say headers, then square brackets to define a piece of data in this dictionary, and we're going to call this piece of data User-Agent. The user agent is basically the type of browser that you are using. In our case, what Python sends is Python-urllib/ and then your Python version, so for me it would be 3.4. So almost in an instant, when you visit a website with Python using the methods that we've shown so far, that website knows exactly what you are. They know you're a program. So it's very easy for them to shut you down, because you basically say, hello, knock knock, my name is Python, right? And then their servers say, hell no, and they shut you out.
So now what we want to do is set the user agent. The user agent that we're going to use is a little long, and I don't want to have to type it out, so I'm going to go ahead and just copy and paste it, like this. I'll put a link to it in the description; if I happen to forget, someone remind me, but hopefully I won't. So paste, and we get this long user agent here. You can't even really see it all in my window, but this is it. Okay, so, super long user agent. Basically, this acts like we're using Mozilla, and then it gives all this other information and all this compatibility stuff.
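To see the default user agent mentioned above without making any request, one option (a sketch, not something shown in the video) is to inspect the default headers on a freshly built opener.

```python
import urllib.request

# build_opener returns an OpenerDirector carrying urllib's default headers.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints Python-urllib/ followed by the running Python's major.minor version.
print(default_headers["User-agent"])
```

The version part depends on the interpreter in use, so it will not necessarily be the 3.4 mentioned in the video.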
And so basically it just changes things so that we're no longer announcing ourselves as Python. Sorry about that. I have no idea where I left off, so I'll just kind of start at this point here. Turns out my dog knows how to open a sliding glass door, so he was running around in here when he shouldn't have been. That was very surprising. Anyway. Okay, so here we're replacing our user agent in an attempt to fool Google. We'll see if we can. So there's our user agent.
Now we're going to do basically the same thing we did before. We're going to say req = urllib.request.Request, capital R, and we're going to make the request to the URL. Remember, before we passed data. Well, we're not going to pass through any data here, because we're hard coding it into the URL, right? Under normal circumstances, if we were making a POST request, we would add in the whole search data, the values, and make the POST. But instead, we're just going to hard code this for now. Feel free to mix them on your own time. Homework assignment. So, url, and then we're going to say headers=headers. So we're telling Python to visit this URL, and instead of sending our normal headers, the default parameter headers, we're going to change these up and send these headers instead.
Now, in my opinion, it kind of makes a little bit of sense to eventually go into urllib. That's a function, and that function has parameters, and they define a default value for headers. Why not just go in there, edit the urllib function, and make this your default header? Just a thought for some of you guys. Anyway, moving right along. So we've defined what the request is. Now we're going to say resp = urllib.request.urlopen.
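A minimal sketch of the header override described above follows. The user agent string here is a short made-up placeholder, not the long string pasted in the video, which is not reproduced in the transcript.

```python
import urllib.request

url = "https://www.google.com/search?q=test"

# Placeholder browser-like value (an assumption, not the video's string).
headers = {"User-Agent": "Mozilla/5.0 (compatible; example-browser)"}

# No data argument, so this stays a GET; headers replaces the default user agent.
req = urllib.request.Request(url, headers=headers)

# Request normalizes header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
print(req.get_method())  # GET
```

As an alternative to editing the library source, urllib.request.build_opener() returns an opener whose addheaders list can be replaced, and urllib.request.install_opener() makes that opener the default for later urlopen calls.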
And the thing we want to open is basically req, the request with these headers. So that's our response. Now we're going to say respData = resp.read(). The amount of data here is actually kind of big, right? Because there's a whole search results page and all the HTML that goes with it. It's very big and bulky, so if we ran this right now and just printed it out to the console, it would lag the console pretty badly. We don't want to do that.
So instead we're going to say saveFile equals open, and we're going to call this file withHeaders.txt and open it with the intention to write. Then we're going to say saveFile.write, and we have to write the string version of respData, because right now the response data isn't in string format. That's also kind of newish if you're coming from Python 2.7. And then of course we need to do saveFile.close().
Now the other thing we have not done: we did a try and we have no except yet. So we're going to say except Exception as e, and then print(str(e)), just in case we throw an exception. Hopefully we don't. Unless I screwed up or something, we shouldn't. Then we come over here; this is where that file will go, it'll go right over here. And we should be ready to run this, so let's go ahead and save and run that.
The first one will throw... I can't remember. Yeah, okay, the first one throws the Forbidden, because we're still trying that one. But the second one worked, because we didn't see a second Forbidden. We come over here, and here it is, withHeaders. We can open it with Notepad++, and here's all of our data. So obviously it's a bunch of junk, you know, but this is all search results. There were some images there. Eventually we could maybe get to some sort of text or something. But anyway, this is a huge mess.
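The file-saving step can be sketched as below. The video writes str(respData), which stores the bytes representation (text wrapped in b'...'); this variation decodes the bytes instead, which is an assumption about what is usually wanted. The helper name save_response is hypothetical.

```python
def save_response(resp_data, path, encoding="utf-8"):
    """Decode response bytes and write them to path as text."""
    # errors="replace" keeps going if the page is not valid in this encoding.
    text = resp_data.decode(encoding, errors="replace")
    # The with block closes the file even if the write fails.
    with open(path, "w", encoding=encoding) as save_file:
        save_file.write(text)
    return text
```

For example, save_response(respData, "withHeaders.txt") would replace the open, write, and close calls described above.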
Google results are pretty messy. But anyway, we were able to get by Google's little filter, right? If anybody had just read the documentation, they'd find out how easy it is to change your headers. But a lot of people don't read documentation, so I guess that's why it works.
So that's going to conclude the basics of urllib. Now again, the data we're fed back is just this huge mess. What do we even do with all this data? You have to parse through it. So the next thing that we're going to need to learn is regular expressions, to actually parse through this data. Regular expressions are kind of scary to people sometimes, mostly because they're their own programming language entirely. So everything you know about Python up till now doesn't mean anything when it comes to regular expressions. But luckily, regular expressions, being basically their own language, are transferable: pretty much anywhere you go, the rules of regular expressions will remain. So once you understand the logic of regular expressions, you can take it to any language. It's a lot like SQL, or as the cool kids say it, "sequel". It's its own language, and you can take it to any other programming language and work with it there.
So anyway, I'm getting a little ahead of ourselves, but I do want to say that we're going to be covering regular expressions very soon, and after we cover them, we'll mesh regular expressions with urllib. A lot like how your complex programs are just a combination of very basic tools, even some of these really complex tasks are a lot of times just a combination of really basic modules and tools that you already have. Maybe not if statements and all that, but, you know, urllib plus regular expressions equals a pretty darn good website parser already.
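As a small preview of that combination, and only as a sketch under assumptions, a regular expression can pull paragraph text out of downloaded HTML. The HTML string below is made up for illustration, and a simple pattern like this only suits simple, well-formed pages.

```python
import re

# Made-up stand-in for a downloaded page.
html = (
    "<html><body><p>Python 3 basics tutorial</p>"
    "<div>menu</div><p>This tutorial is part of a series.</p></body></html>"
)

# Non-greedy match of everything between each <p> and its closing tag.
paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
print(paragraphs)
# ['Python 3 basics tutorial', 'This tutorial is part of a series.']
```

With real response data, the bytes returned by read() would first be decoded to a string before being searched this way.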
You could also use something like Beautiful Soup, but if you look into Beautiful Soup, most of what Beautiful Soup is, is urllib and regular expressions. So anyway, that's going to conclude this video. If you guys have any questions or comments about urllib, please feel free to leave them below. If you have any requests for more information on urllib, or some of the other built-in packages, or even third-party modules that you want me to cover, I'm filming the series right now, so if you ask fairly soon after this video is posted, I'll probably be able to include it in the series. So anyway, that's it. As always, thanks for watching, thanks for all the support and subscriptions, and until next time.
Original Description
The urllib module in Python 3 allows you to access websites via your program. This opens up as many doors for your programs as the internet opens up for you. urllib in Python 3 is slightly different from urllib2 in Python 2, but they are mostly the same. Through urllib, you can access websites, download data, parse data, modify your headers, and do any GET and POST requests you might need to do.
Sample code for this basics series: http://pythonprogramming.net/beginner-python-programming-tutorials/
Python 3 Programming tutorial Playlist: http://www.youtube.com/watch?v=oVp1vrfL_w4&feature=share&list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M
http://seaofbtc.com
http://sentdex.com
http://hkinsley.com
https://twitter.com/sentdex
Bitcoin donations: 1GV7srgR4NJx4vrk7avCmmVQQrqmv87ty6