Python Code Review: Refactoring a Web Scraper, PEP 8 Style Guide Compliance, requirements.txt

Real Python · Beginner ·🛠️ AI Tools & Apps ·9y ago

Key Takeaways

The video is a Python code review of a web scraper. It focuses on cleaning up the code to follow the PEP 8 style guide, using flake8 as a linter to catch formatting inconsistencies and improve readability. The review also covers keeping requirements.txt minimal, writing a useful README file, and keeping generated files such as the __pycache__ folder out of the Git repository.

Full Transcript

Hey Milton, thanks for sending me your code. I'm going to take a look at it now. Basically, I've cloned your repository, so we can look at it in my editor, and I've also got your GitHub page open. I'm just going to go through the whole thing and mention things that stand out to me or catch my eye.

The first thing I noticed is that you have the __pycache__ folder committed into your repo. When you look at it, you can see it has all these temporary files that the Python interpreter generates when it runs your program. They're temporary and can be regenerated; they're just a cache for the bytecode the interpreter generates, so usually you don't want to put those into the repository. I was actually surprised that they're in there, because when I looked at your .gitignore file (and I can bump up the font size a bit) I saw that you have the __pycache__ folder in there, and you also have a regex, or a pattern, for the .pyc files. So normally this should have just worked and shouldn't have included that folder. What I would recommend is that we just delete that folder now, because I suspect it was added before you updated the .gitignore. So first of all I would create a new branch here, and then we're going to delete that folder. Okay, we need to do that recursively. All right, so we've just got rid of that.

Then I'm seeing an empty README here. Usually what I try to do is have some basic information in my README files. What I personally feel is super important is: how do I actually run this thing? Because here there are two files, and I don't really know how to run this. I mean, one of them is called main script file, so it's probably the one we want to take a look at. It would really help if you could just drop, you
know, a quick line like: hey, this is the program, it does the following things, and this is how you run it. If you have tests, this is how you run the tests. Maybe before you even explain how to run it, you could explain how to install it, because I'm seeing a requirements.txt file here, so that probably means we need to install some of these dependencies. I would just call that out. Another thing you may want to add is license information. I've got a longer blog post on that topic that I'm going to link in the description. Basically, that blog post is a tutorial on how to write a minimal GitHub README file. I think this would really help you make this more awesome, so be sure to check the description later.

All right, cool. What I want to mention about the .gitignore: I think it's really good that you have a .gitignore in there, and it looks reasonable to me, so we just need to get rid of that __pycache__ folder. We've looked at the README, and now we have the main script file, writing data on CSV .py, and requirements.txt.

The naming here is interesting. On the one hand, I like that it's called main script file, because it really explains to you what the main script file is. On the other hand, maybe a better name would be to just give it the name of your program. If your program is called scraper, maybe you just want to call that scraper.py, and I think it would be really obvious that it's the main file. I think that would be a more conventional naming structure, but no big deal. I like explicit names, so I think this is kind of cute.

So with these two, I took a quick look earlier, and it looks like they're two separate programs, right? Or this is like a test, or not a test, but some experiment where you were writing that
write_on_csv method. So what I'm going to do is basically just ignore this file and focus on main script file. We're going to do that not in GitHub but in Sublime Text, my favorite editor. All right, let's pull that up. I've already bumped up the font size a bit, and I think we can get rid of the terminal now. Okay, let's make this a little bit larger still.

So, a couple of things that I'm noticing immediately, because I'm running a linter embedded in my Sublime Text setup. A linter is a little tool that checks your code and calls out stuff that goes against common best practices or formatting standards, and it also catches stuff that is just a plain error, like a syntax error. I'm using the SublimeLinter plugin with SublimeLinter-flake8, and I find this super helpful whenever I'm writing Python code, because it's really going to help you avoid some of the things that I'm seeing in your code here.

Before we dive into that, I want to say it's pretty cool that you're actually structuring this stuff into separate functions. What I'm seeing a lot of times is that people just want to write a "simple script", quote-unquote, and I've done that myself: we start implementing, hacking away on a simple script, and three or four hours later, or a week later, it's this super complicated thing that doesn't have a lot of structure to it. So I'm really enjoying the fact that you have some structure here and you split this up into separate functions, which is cool.

Okay, so you can see there's a lot of yellow here, and that is stuff the linter is calling out. Let's just go through that really quickly. The first thing I'm noticing is that your naming is a little bit inconsistent. So here
your functions use the snake case layout, and then here that function is almost named like a class. I think that's a bit odd, so I would probably just rename that and call it generate_html. Space missing. You don't really need those lines here; I think they make it harder to read what's going on. Typically what the Python style guide would recommend is to just get rid of these lines, and we can do that everywhere. Okay, cool.

Then we had this inconsistent spacing between functions. With all of this stuff, if it happens a couple of times it's not that horrible, but I think it's really good to clean that up and make it consistent, because then your brain doesn't have to focus on that. I always like to think of a programmer's brain as working almost like a compiler. If I want to understand what your code does, I need to do the same things a compiler would do: I need to parse it, I need to understand the structure of the program, and then I also need to build up this mental model in my head of what the program actually does. If you have things like inconsistent spacing, I think it makes it a lot harder to get a good feeling quickly for how the program is structured. So I'm really a stickler for using formatting best practices or some kind of style guide, and now that we have these tools it's actually really easy to stick to a style guide. In the case of your code, what I would do is just make that a little bit more regular.

So we've figured out the spacing here, and now just a minor thing. This is using tabs, and that would look okay, because we're using one tab of indentation here, but then here we're using two tabs to indent the code, and that just becomes a little bit odd. So what I'm going to do with this whole block is move it over to the left, and then I'm going to actually convert that to
spaces, because that's usually the recommendation the Python PEP 8 style guide gives. All right, so that looks a little bit cleaner.

What I like to do with the imports here as well (and this is something that I do; I don't think this is in the PEP 8 style guide, for example) is split these up into imports that are part of the Python standard library, like the csv module and the os module, and then third-party imports, because that really helps me understand what this code actually relies on: which external libraries do I need to run it?

Another thing that I'm seeing already is that this includes requests and Beautiful Soup, and in our requirements there's a lot of other stuff. It might be possible that this is actually getting pulled in by Beautiful Soup, but I don't think so. So there's a lot of stuff in your requirements.txt that doesn't actually relate to what your program needs, and I would really recommend that you try to keep the requirements.txt as minimal as possible. In this case you might be able to get away with just putting in requests and Beautiful Soup, and you would get the same result.

All right, back to the formatting, to get that out of the way. There are a couple of things that just catch my eye here, and this is the inconsistent formatting again, with how you're spacing out your arguments. I know, and all of this stuff is not bad; this is just part of learning how to write Python. I think in the beginning you don't notice that stuff, because your brain isn't used to it. But this stuff really jumps out at me, and it makes it a lot harder to actually read the code and understand what's going on, so I'm going to change the formatting on those. What I probably would do here is, because print is just a function in Python 3, and you mentioned this is Python 3 code,
I would just format that like any other function call. And here the extra parentheses don't actually do anything, so we can get rid of those as well. So this looks pretty clean now.

parsing_and_scrape_data, okay. Another thing I'm noticing here is how the naming all of a sudden is different. We had generate_html and write_on_csv, and now here it's parsing_and_scrape_data. I would probably just change that to a more active verb and say parse_and_scrape_data. We pass the raw data, and again here I would space out the arguments.

Okay, this is going to make me sound like a horrible person, but I really like having consistent quotes for strings, because again I find it a little bit confusing to see how we have these strings highlighted here using single quotes, and then we get the double quotes here. So I would try to make all of those consistent. I know in this case you probably did it because, if you're using the double quotes, you can just use a single quote inside, and I think that's fine. You just have to make a decision about what kind of quotes you want to use. So in this case, why don't we use the double quotes everywhere? Then again the structure becomes a lot more regular, and I can parse that mentally a lot more easily, because it's all the same color in my syntax highlighting, and it's just easier for me to understand what the code does.

Again, we can get rid of these parentheses. And then here, your assignments are not formatted consistently, and that's stuff I would also just try to make consistent. If that's something you can pick up and train yourself to do consistently, I think it's going to have a huge impact on the cleanliness of your code, and it's going to make it a lot easier for other people to read. So we're going to format
that differently. And all the stuff that I'm mentioning is actually being called out by the linter. When I select that piece and you look here (I'm not sure if you can read that in the video) it actually says "missing whitespace around operator". So this is something that a linter can catch, if you install a linter in your editing environment, or even if you just run it from the command line. You can do that really easily by doing a pip install flake8, and then, even if you don't want to go through integrating flake8 into your editing environment, you can just run it from the command line. It's going to call out the same kinds of errors and style guide violations. I find this super helpful, because it's really going to help you train yourself to avoid these kinds of formatting errors. I don't think it's going to mention these extra parentheses, but it's just going to make it a lot easier for you to avoid these formatting inconsistencies.

There's probably a typo here. Yeah, that's okay: "you're on XYZ URL". Okay, cool.

All right, let's take a look at write_on_csv. Again, what I'm going to do here is make all of the string quotes consistent. Hmm, wait, what did I do here? Okay, yeah, it just needed an extra quote here. Done. And there's a missing space, and then I would indent that a bit differently. Okay. Oh yeah, an error: "whitespace before"... I just want to open that file.

Okay, so now this actually lints cleanly; there are no linter errors being called out here, which I think is good. This is really the kind of code you should be committing, I think, because then it's really consistent, all of your files are going to look the same, and you can focus on the actual structure of the code, or the actual problem the code solves.

Okay, so let's take a look at that. What makes this a little bit challenging here
is that I don't really know how to run this. This is main script file, and it kind of looks like a module that defines a bunch of functions, but I don't really know how they're working together. I'm assuming you probably run this from the interpreter somehow to make it work together, and that's totally fine. I think it's actually the right thing, what you're doing here: just throwing that into a module, and then you can import it and work with the interpreter directly. But for me it's really hard to understand how these pieces work together. So again, what would help there is updating the README and just putting in how to actually run this.

Then, if you have a main script file, what you would typically do is have a top-level if statement that checks whether someone is running that script file directly from the command line with the Python interpreter. Let me just type that out. See, I'm already starting to use inconsistent quotes, because I usually use single quotes, but whatever. I just like to be consistent.

Okay, so now what happens when you run this script... oh, okay, this is interesting. Now that we actually tried to run this, we don't have requests installed. I did actually install the requirements, but it looks like your requirements doesn't actually have... oh, hold on. Okay, my bad, it was just my bad, I forgot to install. Oh, okay, it didn't go through. So the problem here is that the requirements don't install cleanly. I should check the error message. So this MySQL connector can't be installed. It looks like, with what you have in your requirements.txt right now, at least on my machine, I'm not able to install that.

Something that could help you with that, and that's maybe stuff for a little bit further down the road, is having some kind of automated build setup, so that whenever you
make a commit on GitHub, it just tries to install all the requirements and run your code from scratch on a new machine. There are free services you can use; you can use Travis CI for that, for example. It's a great service that I would recommend, and it's actually free for open source or public repos on GitHub. Then you're going to catch it immediately when something like that happens. It's probably not even your fault; it's just the fact that there was a new release of MySQL connector or some other dependency and they don't play together well, and you're going to be able to catch that if you have automated builds. But again, maybe that's something you want to look into in the future.

All right, back to what I wanted to show you in the first place. I'm kind of jumping all over the place here, but I still hope it's going to be helpful when you watch the video in the end. What I wanted to show you is that... right, okay, we need Beautiful Soup as well. More dependencies. So we just want to install Beautiful Soup 4. Okay, all right.

So now when I run this like so, python main_script_file.py, what happens is that if you run a script through this method, Python sets the dunder name variable, __name__, to dunder main, "__main__". You can use this as a check to see if someone runs your script like a program, because if I do import main_script_file, it doesn't execute that line, because we're not running it like a program. This is super useful in the situation we're in right now, where you want to use a module like that just to work with it from the Python interpreter, which I imagine is what you're trying to do, but you also want to provide a way to run this as a standalone program. You can use something like this to do that, and then you would go and do something useful with this. You'd probably call, like, fetch the HTML first, and then parse it and scrape
it, right? So you could put that stuff here, and then you get the best of both worlds. But yeah, that's just a tip, and you could look into that in the future.

Okay, so I'm seeing a bunch of print statements. I think that's fine. In the future you may want to look into the logging module, but I think that would honestly be overkill for what you're doing at the moment, so I think it's totally fine.

All right, so this fetches this thing here. I kind of like what you're doing here, where you're giving this a name, because with r.content it's kind of hard to know what that is, and as you're giving it a name, we know that you're returning the HTML for that URL. I would probably still shorten that, because it's kind of in the name, so I would probably just do return r.content. Then you could put a docstring here that says "Fetch base URL and return the HTML content", or something like that. And I'm not sure if you want to call that generate_html; maybe fetch_html, because you're actually doing an HTTP request to fetch it. I'm also not sure if you want to call that base URL, because usually what I would think of as a base URL is some part of a URL that then becomes the full URL when you tack something on at the end, almost like a prefix. So I would probably rename it and just call it url; I think that would be fine. Okay, I don't think we're using generate_html in the program flow, so we don't need to rename it anywhere else.

And then I think the way this works is, yeah, we'd enter here, and then this would use write_on_csv somewhere here, right? Okay, parse_and_scrape_data. I really like the name. I think it's pretty cool, because, well, actually, what's the difference between parsing and scraping? Yeah, we parse it out and then figure out the parts that we actually need. So one could argue here that it would make more sense to split this up into two
functions, where one is, okay, we're going to parse it, and then another function would actually pull out the interesting bits. Again, it's not a super long function, which I think is fine, but if you wanted it to be super clean, you could think about splitting it up.

Okay, so let's look at this. All right, this is just creating the parser, mm-hmm, and we're just iterating over that. I think it would be fair to actually get rid of that local variable and just say for items in soup.find_all, because I'm not sure if this is actually a good name, g_data. The g, I'm assuming, is something like global data or something; it doesn't really give me a lot in terms of understanding. So I would probably get rid of it, just to not confuse readers, but it could go either way. That's probably what I would do.

Okay, so we're creating a new dictionary here, and then, yeah, with this stuff I think it would be fair to just do this inline. Instead of saying a tag, I think it would be fair to just do this, because otherwise my brain has to keep track of a couple more variables. And here, maybe this thing should also be called just item, because it looks like we're actually looking at one item; that would make more sense to me. But the naming stuff is usually really hard, because when I'm looking at this and just renaming things in your code, it's really hard for me to know what the background story is, what this actually does. I'm assuming you have some kind of specific website in mind that we're scraping here. But yeah, I think it would be fair to have this dotted access go two levels deep; I think this would be totally okay.

print link, all right. Okay, this is the link. Yeah, this is like a debugging statement. If you want to keep that, I think it's okay. Product details, all right, and then we're setting this up. So again, what I would probably do here, because we're, you
know, we're creating this structure here and then we're not really doing much with it, so I think it would be fair to do this, actually. Oops. We're not creating the dictionary ahead of time; we're just creating it with a dictionary expression, like this. Then I think it becomes clearer what's going on, because otherwise I'd have to remember: okay, we're creating an empty dictionary up here, then we're not doing anything with it, and then at the end we're just populating it. So I would probably do it like this. What you can also do here is always end those lines with a comma, because Python allows you to do that, and then you don't have to remember to add an extra comma when you add a new line at some point. It's something I like to do in my code, but it's a super tiny, unimportant tip anyway.

Okay, so we're passing this through to write_on_csv, and I just want to make sure I'm still running flake8 cleanly. Yeah, okay.

So let's take a look at write_on_csv. Again, we're using the name data here for this thing, and we're using that in a bunch of places. Remember how this guy takes raw_data, and we also had g_data. What's interesting here is that we actually pass the product details to write_on_csv, so instead of calling this thing data, I would probably just call it product_details. And then here, I don't know what this really takes, but I imagine it will actually take the HTML, right? So instead of calling this raw_data, you could think about calling it page_html or something like that. But let's leave it like that for now, because I don't actually know what goes into it.

Okay, so with this part, file_exists equals os.path.isfile. So isfile tests whether a path is a regular file, which kind of goes contrary to what this thing says here; we're saying file exists. There is also
an exists function in os.path that tests whether a given path exists, which sounds like that's... okay, wait, what's this doing? Oh, okay, I guess we're checking if we need to append the header, right? Yeah, let's think about that for a second. Hmm.

I'm not going to go too deeply into this, but I would think about it more, probably, because what can actually happen here is that you can have a race condition, where you're checking if the file exists, but then your program is preempted for another process or thread to run, and that other process deletes the Google results .csv file. So you're assuming it still exists, but then you're opening the file, and the file you actually get doesn't have that header in it, and you're not writing the header, because you did that check earlier. This may sound like, oh, you know what, this would never actually happen in the real world, but it might. Imagine that later this code becomes bigger and bigger, and eventually you want to run a hundred or a thousand scrapers at the same time. Then you're going to get into these situations where this kind of stuff is going to lead to some really, really tough-to-debug problems, and believe me, I've been there. That's why I'm mentioning it.

So I think we would have to think about doing this a little bit differently, and off the top of my head I don't really know the best way to go about it. I mean, it's probably fine for a small program like that for the moment, but just as a heads-up, it's potentially something you want to look into. I'm thinking maybe it would be better to just see how large the file is, whether it's completely empty, and there might be another way to do it. I don't really know what the best way to go about it would be, but I'm pretty sure that this could get you into trouble in the future, so
maybe just something to keep in mind. And also I'm not sure if you want isfile or if you want exists.

Then also we're using this string here a couple of times, so this is actually something I would just pass in here and maybe give a default, because then you get the same result, but you don't have the danger of accidentally having a typo in the second line here, in which case the file exists check would potentially look at a completely different file than the context manager here when you open that file. So again, it makes it a little bit easier to avoid that kind of inconsistency. Alternatively, we could have made that a constant as well, but seeing that this is a function, I think it's fair to pass that in here.

Okay, so we're opening that CSV file. We have the field names here; I think that's fair. I would probably do that inline, and I would have constants for these guys. So you could do something like that, and then everywhere we're using title, I would do this. And then with Python 3 you could potentially do... oh wait, this is actually something else. I don't know how DictWriter really works, but, oh, okay, see, this is a little bit more... okay, so let's actually do this a little bit differently.

Okay, this key here is called links, and we pass link in here, which is a little bit odd, so I would probably rename that as well, just to make sure it's consistent. Then, like I said earlier, I would say field title, and I would give it the name here, and then I would also do this. And then here, this is a list, which means it's mutable. You probably want this to be a tuple, and I think DictWriter can deal with a tuple in this case. So I would probably do something like this and just change the formatting slightly; I think that would be fair. Or, depending on how you want your formatting to go, you could also do it like
this. There are different ways to do that. Yeah, maybe that looks pretty. As long as it's consistent, it doesn't really matter.

Okay, so I think that's a little bit clearer, because now what can't happen is that we use inconsistent naming for these guys. Otherwise it would be really easy to say, whoops, okay, this is the title field, and then use a different key here than the one we'd be using down there. So it really helps to make that consistent. You could look into making that a Python 3 enum as well, but I think doing it like that would probably be fine.

Instead of making this a dictionary, you could also think about making this a namedtuple, which would give you a little bit more of a naming guarantee. You'd basically define a product details namedtuple up here, and then you could use it in all those places, and you would have a little bit more security in terms of what your data structures look like, instead of passing around a dictionary. But I think even doing it like this is a little bit better, because then you have a better chance of using the same names for keys and stuff. So I'd probably do it like that.

Okay, again, the file exists check here; yeah, we talked about that earlier. Then we're writing out that one row, and we're printing something out here. I mean, it's pretty straightforward, right?

So all in all, it's a lot of changes, but it wasn't meant to be mean-spirited. I think your program probably worked before; I couldn't really test it out. But I hope you learned something from that quick code review, and from me just going through this and restructuring everything. It was mostly formatting stuff, but again, I think you were on the right track, and I like the fact that you have a basic structure in here. I really like the data flow through the whole program; that was really good. And my number one
recommendation for you, in terms of how you can get to the next level as a Python programmer: look into tools like flake8. I think it's going to have a huge impact. It certainly had a huge impact on my ability to write code, to write clean code, to work better with other developers, and to work better on open source projects, because if all that formatting discussion is out of the way, it just becomes a lot easier to scale your programs, make them bigger and bigger, and work with other people. So really, really look into that. I think that's probably the number one thing that would help you become better as a Python developer, and it would free up a lot of mental capacity: because your brain doesn't have to parse out all of that inconsistent formatting, you're going to have a lot more brain cycles available in your brain's Python compiler to actually focus on the important stuff.

And yeah, best of luck. I hope this was useful. I'm going to just create a pull request for that, and you can take a look at it and pick up some things that you find helpful, or not. Cool. All right, thanks Milton, and have a super fantastic day. See ya.
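
The write_on_csv suggestions from the review can be pulled together in a rough sketch. Milton's actual code is not shown in the transcript, so the constant names, the two fields, the results.csv default, and the example.com values below are assumptions for illustration only. The sketch uses field-name constants, a tuple of field names, a filename parameter with a default, a dictionary literal with trailing commas, and a header check that looks at the size of the file that was actually opened. That check narrows the race described in the review, but it likely does not remove it when several processes append at the same time.

```python
import csv
import os
import tempfile

# Hypothetical field names; the real keys in the reviewed script may differ.
FIELD_TITLE = "title"
FIELD_LINK = "link"
FIELD_NAMES = (FIELD_TITLE, FIELD_LINK)  # a tuple, so it cannot be mutated


def write_on_csv(product_details, filename="results.csv"):
    """Append one row to filename, writing the header if the file is empty."""
    # The filename is used in exactly one place, so a typo cannot make the
    # emptiness check and the open() call look at two different files.
    with open(filename, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELD_NAMES)
        # Ask about the file we actually hold open instead of checking
        # os.path.isfile() before opening it.
        if os.fstat(csv_file.fileno()).st_size == 0:
            writer.writeheader()
        writer.writerow(product_details)


if __name__ == "__main__":
    # Only runs when executed as a script, not when the module is imported.
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo.csv")
        for number in (1, 2):
            product_details = {
                FIELD_TITLE: f"Example product {number}",
                FIELD_LINK: f"https://example.com/{number}",
            }
            write_on_csv(product_details, filename=demo_path)
        with open(demo_path, newline="", encoding="utf-8") as csv_file:
            print(csv_file.read())
```

Run as a script, this prints the header line once, followed by the two example rows. Imported from the interpreter, it only defines the constants and the function.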

Original Description

https://dbader.org/python-mastery ► How to become an effective & productive Python developer

Python Code Review: Unplugged – Episode 2: Code Review for Milton

This is a Python code review I did for Milton's Python project on GitHub. Milton is definitely on the right track with his Python journey. I liked how he used functions to split up his web scraper program into functions that each handle a different phase, like *fetch the html*, *parse it*, and *generate the output file*.

The main thing that this code base could benefit from would be consistent formatting. Making the formatting as regular and consistent as possible really helps with keeping the "mental overhead" low when you're working on the code or handing it off to someone else.

Besides formatting, the video also covers things like writing a great GitHub README, how to name functions and modules, and the use of constants to simplify your Python code.

Again, I left the video completely unedited. That’s why I’m calling this series Code Review: Unplugged. It’s definitely not a polished tutorial or course. But based on the feedback I got so far that seems to be part of the appeal :D

The article I mention in the video: "How to write a great GitHub README" » https://dbader.org/blog/write-a-great-readme-for-your-github-project

Here's some more background info about this video: https://dbader.org/blog/python-code-review-unplugged-episode-2

FREE COURSE – "5 Thoughts on Mastering Python" https://dbader.org/python-mastery

SUBSCRIBE TO THIS CHANNEL: https://dbader.org/youtube

* * *

► Python Developer MUGS, T-SHIRTS & MORE: https://nerdlettering.com

FREE Python Tutorials & News:
» Python Tutorials: https://dbader.org
» Python News on Twitter: https://twitter.com/@dbader_org
» Weekly Tips for Pythonistas: https://dbader.org/newsletter
» Subscribe to this channel: https://dbader.org/youtube
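
The phases mentioned in the description (fetch the HTML, parse it, produce output) can be sketched as separate functions behind a `__name__` check, so the module can be imported from the interpreter or run as a program. This is only an illustration under stated assumptions: the reviewed script uses requests and Beautiful Soup, while this sketch swaps in the standard library's html.parser and a hard-coded sample page so it runs without network access or third-party packages. The LinkCollector class, SAMPLE_HTML, and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser

# Hard-coded sample page (an assumption) so the sketch needs no network.
SAMPLE_HTML = (
    '<a href="https://example.com/1">First product</a>'
    '<a href="https://example.com/2">Second product</a>'
)


class LinkCollector(HTMLParser):
    """Collect a title and link for every <a> tag that has text."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current_link = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_link = dict(attrs).get("href")

    def handle_data(self, data):
        if self._current_link is not None and data.strip():
            self.items.append({
                "title": data.strip(),
                "link": self._current_link,
            })
            self._current_link = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_link = None


def fetch_html(url):
    """Return the HTML for url (stubbed; the url is ignored in this sketch)."""
    # The reviewed script makes an HTTP request with the requests library here.
    return SAMPLE_HTML


def parse_and_scrape_data(page_html):
    """Parse page_html and return the items of interest as dictionaries."""
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    return collector.items


def main():
    page_html = fetch_html("https://example.com")
    for item in parse_and_scrape_data(page_html):
        print(item["title"], item["link"])


if __name__ == "__main__":
    # Skipped on import, so the functions can be explored interactively.
    main()
```

Keeping each phase in its own function means the parsing step can be tried on any HTML string without touching the network, and the output step could be replaced by a CSV writer without changing the other two.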

Playlist

Uploads from Real Python · Real Python · 18 of 60

1 A better Python REPL – bpython vs python interpreter
2 Introducing large-type.com – A Utility Website
3 Reading Hacker News Without Wasting Tons of Time
4 Forward References and Python 3 Type Hints
5 Using Sublime Text as your Git Editor
6 Python Code Linting and Auto-Complete for Sublime Text
7 Make your Python Code More Readable with Custom Exceptions
8 Write Better Tests with Sublime Text's Split Layout Feature
9 How to Use Sublime Text from the Command Line
10 Rename Variables with Multiple Selection in Sublime Text
11 Sublime Text Settings for Writing PEP 8 Python
12 Write Cleaner Python with Sublime Text's Indent Guides
13 Sublime Text Whitespace Settings for Python Development
14 Function Argument Unpacking in Python
15 Python Code Review: Debugging and Refactoring "Conway's Game of Life" + Automated Tests
16 Using "get()" to Return a Default Value from a Python Dict
17 A Python Shorthand for Swapping Two Variables
18 Python Code Review: Refactoring a Web Scraper, PEP 8 Style Guide Compliance, requirements.txt
19 Click & Jump to Test Failures from the Command Line (iTerm2)
20 Setting up Sublime Text for Python Developers
21 Sublime Text + Python Guide Overview
22 Python Code Review: Adding Pytest Tests to an Existing Python Web Scraper
23 Type-Checking Python Programs With Type Hints and mypy
24 A Shorthand for Merging Dictionaries in Python 3.5+
25 Python Code Review Flask Web Security Tutorial + Virtualenvs, requirements.txt
26 My Python Code Looks Ugly and Confusing – Help!
27 Setting Up a Programmer Portfolio/Developer Blog – How To Get Started
28 Do I Need a GitHub/GitLab/Bitbucket Profile as a Developer?
29 Programmer Portfolio – Example and Walkthrough
30 How to Get Your 1st Speaking Gig at a Tech Conference
31 How to Build Your Public Speaking Skills as a Developer
32 The Object-oriented Version of "Spaghetti Code" is "Lasagna Code" ?!
33 Setting up Sublime Text for Python Developers – Lesson #1
34 Cool New Features in Python 3.6
35 "is" vs "==" in Python – What's the Difference? (And When to Use Each)
36 Emulating switch/case Statements in Python with Dictionaries
37 Python Function Argument Unpacking Tutorial (* and ** Operators)
38 What Code Should I Put On My GitHub/GitLab/BitBucket Profile?
39 A Crazy Python Dictionary Expression ?!
40 String Conversion in Python: When to Use __repr__ vs __str__
41 Method Types in Python OOP: @classmethod, @staticmethod, and Instance Methods
42 Optional Arguments in Python With *args and **kwargs
43 Python Context Managers and the "with" Statement (__enter__ & __exit__)
44 Installing Python Packages with pip and virtualenv / venv
45 "For Each" Loops in Python with enumerate() and range()
46 Python Code Review: LibreOffice Automation and the Python Standard Library
47 Managing Python Dependencies With Pip and Virtual Environments – Lesson #1
48 Python Tutorial: List Comprehensions Step-By-Step
49 Leveraging Python's Implicit "return None" Statements
50 What's the meaning of underscores (_ & __) in Python variable names?
51 Python Data Structures: Sets, Frozensets, and Multisets (Bags)
52 Writing automated tests for Python command-line apps and scripts
53 How to find great Python packages on PyPI, the Python Package Repository
54 Immutable vs Mutable Objects in Python
55 PyPI vs Warehouse, the Next-Generation Python Package Repository
56 pep8.org — The Prettiest Way to View the PEP 8 Python Style Guide
57 My Experience at PyCon 2017 in Portland
58 Pylint Tutorial – How to Write Clean Python
59 "Reverse a List in Python" Tutorial: Three Methods & How-to Demos
60 Python Refactoring: "while True" Infinite Loops & The "input" Function

This video teaches how to refactor a Python web scraper to comply with the PEP 8 style guide, improve code quality and readability, and apply best practices for using requirements.txt, README files, and version control with Git. The review covers tools like flake8 and Black for linting and code formatting.
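As a rough illustration of the kind of cleanup PEP 8 compliance involves, here is a small before-and-after sketch. The function is hypothetical and not taken from the reviewed project. The "before" version is kept in comments: it has whitespace inside parentheses and missing spaces around `=`, which flake8 typically reports, plus a CamelCase function name where PEP 8 recommends lowercase with underscores.

```python
# Before (hypothetical, not from the reviewed project):
#
# def GetTitle( html ):
#     start=html.find('<title>')+7
#     end = html.find( '</title>' )
#     return html[start:end]

# After: consistent spacing, a snake_case name, and constants instead of
# a magic number for the tag length.
TITLE_OPEN = "<title>"
TITLE_CLOSE = "</title>"


def get_title(html):
    """Return the text between the first <title> tags, or None if absent."""
    start = html.find(TITLE_OPEN)
    end = html.find(TITLE_CLOSE)
    if start == -1 or end == -1:
        return None
    return html[start + len(TITLE_OPEN):end].strip()
```

Replacing the literal `7` with `len(TITLE_OPEN)` is the same "use constants to simplify your code" idea the description mentions: the value can no longer drift out of sync with the tag it measures.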

Key Takeaways
  1. Create a new branch before making changes
  2. Delete the committed __pycache__ folder recursively, since it was likely added before .gitignore was updated
  3. Add a line to the README explaining how to run the program
  4. Install dependencies from the requirements.txt file
  5. Refactor the web scraper code to comply with PEP 8
  6. Use flake8 and Black for linting and code formatting
💡 Using tools like flake8 and Black can significantly improve code quality and readability, and applying best practices for using requirements.txt and README files can make it easier for others to use and contribute to your code.
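The cleanup step in the takeaways above can be sketched in Python. This is a minimal illustration, not what the video does (there the folder is deleted from the command line); the function name `remove_pycache` is an assumption. It only removes the directories from disk. If they were already committed, they still need to be removed from the repository's history going forward, for example with `git rm -r --cached` followed by a commit.

```python
import shutil
from pathlib import Path


def remove_pycache(root="."):
    """Delete every __pycache__ directory below root and return their paths."""
    removed = []
    # Collect the matches first so nothing is deleted while the tree is
    # still being walked.
    for cache_dir in sorted(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

The interpreter regenerates these bytecode caches the next time the program runs, so deleting them is safe.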
