Automate Tasks With Python & Building a Small Search Engine | Real Python Podcast #194

Real Python · Beginner ·📊 Data Analytics & Business Intelligence ·2y ago

Key Takeaways

Builds a small search engine using Python and discusses automation techniques for tasks
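
The episode describes the search engine's index as a nested defaultdict, wrapped in a class, that maps each keyword to the URLs containing it along with a per-document count. The sketch below illustrates that idea only. The class name `InvertedIndex`, the regex tokenizer, and the example URLs are hypothetical, and the scoring (a plain sum of keyword counts) is a simplified stand-in, not the ranking math from the article.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical sketch: keyword -> {url -> occurrence count}."""

    def __init__(self):
        # Nested defaultdict, as described in the episode.
        self._index = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _tokenize(text):
        # Assumption: lowercase alphanumeric runs are good enough as "words".
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url, text):
        # Count every word of the document under its URL.
        for word in self._tokenize(text):
            self._index[word][url] += 1

    def search(self, query):
        # Stand-in ranker: a document's score is the total count of query words.
        scores = defaultdict(int)
        for word in self._tokenize(query):
            # .get() avoids creating empty entries for unknown words.
            for url, count in self._index.get(word, {}).items():
                scores[url] += count
        # Highest score first; ties broken alphabetically by URL.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("https://example.com/a", "Python search engine in Python")
    index.add("https://example.com/b", "A search for Rust")
    print(index.search("search python"))
```

A real engine would feed `add()` from a crawler (the episode mentions RSS feeds) and replace the summed counts with a proper relevance score.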

Full Transcript

Welcome to the Real Python Podcast. This is episode 194. What are the typical computer tasks you do manually every week? Could you automate those tasks with a Python script? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects. We discuss a recent Hacker News thread about frequently used automation scripts, and we share the kinds of tasks we've automated with Python in our work and personal lives. Christopher shares a tutorial about building a micro search engine from scratch using Python. The post takes you through coding the components of a crawler, index, and ranker. The finished engine is designed to search the posts of the blogs you follow. We also share several other articles and projects from the Python community, including a news roundup, how a Polars query works under the hood, using Python for data analysis, understanding open source licensing, summarizing the significant changes between Python versions, a robust TUI hex editor, and a lightweight DataFrame library with a universal interface for data wrangling.

This episode is brought to you by Intel. Get open source snippets and sample AI apps to build and deploy faster. Visit intel.com/edgeAI.

All right, let's get started. [Music]

The Real Python Podcast is a weekly conversation about using Python in the real world. My name is Christopher Bailey, your host. Each week we feature interviews with experts in the community and discussions about the topics, articles, and courses found at realpython.com. After the podcast, join us and learn real-world Python skills with a community of experts at realpython.com.

Hey, Christopher, welcome back to the show.

Hey there.

All right, so we have a couple of news items following our Rust discussion. It just keeps going, so maybe we'll start there.

Yes, with some news. It's a conspiracy. We're all gonna be Rust programmers, just give it time. So if you've been following any of the Python news, there was an announcement of a tool called uv. This is a replacement for pip and pip-tools, brought to you by the creators of the Ruff linter. Like with Ruff, and as you might guess by his comment there, it's written in Rust, and by all accounts is quite speedy. At about the same time, it was also announced that the folks who are bringing you uv, a group called Astral, are also taking stewardship of the Rye tool. So we'll provide you some links to the blog posts, both for the announcement and the stewardship stuff, as well as a link to the news thing where people argue about uv and whether they like it or hate it and all the rest of that kind of fun stuff.

Yeah, the packaging space in Python keeps being, let's use the word, exciting. And, you know, may you live in exciting times.

Yeah, and now we're gonna get angry notes about the fact that that quote's wrong. The second bit of news is something that seems to be becoming a regular feature of the show, so much so that I've automated the announcement: Python 3.13 alpha 4 is now ready for your consumption. And you can drop in a different sound like that two weeks from now when it becomes alpha 5.

Yeah, we're moving right along, so betas should be shortly after that. Topics-wise, I have a lot of data science stuff. I'm kind of following my conversation with Wes McKinney, and ended up looking at a lot of different tools over the last week. The first one I have is a little bit of a survey of a couple of posts from the Polars blog. We've talked about Polars multiple times on here. Polars is a replacement for pandas in some cases, and a lot of people enjoy it. It is, again, maybe we have a special sound
effect for this, it is written in Rust, and has a lot of the same kind of tooling as far as a DataFrame library. In this particular blog post, the creators of Polars titled it "A Bird's Eye View of Polars," and this one's by gel Peters. It really digs deep into how it works. It shows the steps from how queries are done, how it optimizes them, and then the final execution. It has an initial introduction to the whole thing, sort of a quote there: "A good library abstracts away many complexities from its users. Polars is no different in this regard, as it maintains a philosophy that queries you write should be performant by default without knowing any of the internals." Which is great. Again, this idea that it's doing all this optimizing underneath the hood for you to make this happen, and so this is letting you peek under the hood a little bit and see what's happening.

They give a little bit of a flowchart: you write a query, it's parsed into what's called a logical plan, that's optimized into an optimized logical plan, which is transformed into the physical plan, and then it's executed. Then they go through that in the article and describe what's happening. They provide an example query, which is really nice. This one uses multiple data sources that need to be joined together. Then they go through the process, very common in data analysis, of, okay, I want to group that and aggregate it. In this case they're using a total amount divided by the difference. It's a taxi data set, so there's a difference in the pickup and drop-off times, and then they do an averaging. So it's kind of looking at the average rate that is charged for taxis and so forth, and then they do a sorting of it. All of this is a LazyFrame. It returns instantly, even though this data set is over three million rows. And so they talk about, well, how does that happen? This sort of lazy evaluation, this idea of how things are changed from a query itself and parsed
into this plan and executed. Then they dig into the code. If you've never looked at Rust code, this gives you an opportunity to look a little bit at what Rust code looks like. It gives a nice overview of, in this particular type of application, data structures, structs, and enums. There are some private declarations happening there, which is interesting. And then it shows these steps of the query, again written in Python, as that example, the whole thing written out, optimizing into the logical plan, and then the query execution, and two of the important optimizations: projection pushdown and predicate pushdown. If you wonder what those are, I'm not going to dig into them. They're pretty detailed in this post. I think it's a great overview for anybody who's interested in that stuff. It's nice to see a library, and now a company, showing their work, if you will, and being open about what the complexities are therein. I think that's really neat.

The second post is related to that. It came out about the same time on their blog. This one is from the original author of Polars, Ritchie Vink. It is titled "Why We Have Rewritten the String Data Type," and it shares how they've gone through rewriting this whole string data type. Strings are kind of a messy data type. They're one of the reasons that pandas has had performance issues in some ways, with the NumPy data type, of trying to figure out, well, what do we do with that? It has always kind of been a Python object until recent days. They talk about their relationship with Arrow. They've been a consumer of the arrow2 native Rust implementation, and they had forked it and made this thing called polars-arrow and trimmed down the implementation to what they felt was tuned for Polars' needs. They had more control that way and looked to refactor, and at the same time it seemed like the Arrow spec was also moving in a similar direction. They both now are a little more similar, which is
great, so they could keep using a lot of the Arrow spec there. It talks about what is called a German-style string type. You can learn a little bit more about what that is by reading into this. Other things that are covered: they get into the good and the bad of what's involved in encoding strings, they cover Hyper/Umbra-style string storage, and they dig into details on changes and some of the benchmarks, which is nice. They can show you what these changes have done across not only the two versions that they're showing, but also small, medium, and large strings. So again, if you're interested in data types and optimizing these kinds of systems and platforms: these changes are not so much user facing. The only thing you've got to worry about is updating to the latest version of Polars to get the benefits of them. But I'm a fan of this stuff. I think it's really cool to see this. I like that they write these up in posts, and I hope they continue it in the future. All right, so what's your first one, Chris?

I'm starting with an article by Alex Molas, and it's titled "A Search Engine in 80 Lines of Python." Alex had recently started a new job and had to learn Solr, which is an open source search engine based on Lucene, and to better understand the ins and outs of search, he decided to write his own engine. This article is about the design and structure of that engine that he built. His first goal with it was to help him find sites that are smaller and don't tend to surface on Google, addressing what's sometimes called the small discoverability crisis. The code going with the article is available on GitHub, so you can follow along with the whole thing along with the blog post if you like. Before digging in, he does admit that the 80 lines in the title is an oversimplification. He did write some companion libraries that he's calling into, but the way he puts it is the interesting part is the 80 lines.

The first step in making a search engine is creating a crawler. A
crawler's job is to wander around and collect data. For Google, the crawler is crawling the web. For other tools, it might be crawling internal sites at a company or different sources of data. Alex's crawler uses a list of RSS feeds from smaller sites that he has frequented, a little over 600 of them, I think, if I remember correctly.

Once you've got the data, you've got to do something with it. The next part is an inverted index. This is a data structure that maps keywords to documents. If you think about what you're searching on, it needs to figure out which documents have those words in them. Alex's engine uses a Python nested defaultdict to store keywords mapping to the URLs of the corresponding doc, as well as a count of how many times those words show up in the doc. When the crawler finds a doc, it parses all the words in it and constructs the corresponding index structure, which he's then wrapped in a class, because, you know, objects.

With the index ready, now you want to figure out how to sort the matching responses for a given query, and this is the ranker's job. Alex goes into detail here with some math that gives me flashbacks to engineering school that I'd really rather not relive, and then shows you how to create a score for a document. That score then results in the rank, and that's how you end up deciding which articles to show in what order in your results.

With all those core parts in place, the final step is then to create an interface. Alex chose FastAPI to build a front end for his query system, and then he finishes off the post talking about features that aren't there but could be added, and summarizes what he's learned. So all in all, if you're interested in how that tool you probably use every day of your life works, search engines, then this is a good intro, with Python as the basis for the examples.

Yeah, I explored this topic briefly when I was starting to learn Python, and one of the resources that I looked at was Udacity, couldn't
think of the name the other day when I was talking to you, and they had a kind of basics of Python course, and that was what they had you build as you work through the fundamentals of Python, which is kind of interesting. This looks way more organized and curated in a way, and definitely way more structured. It's a little bit more of an intermediate project, but I think it's cool that he's giving you a jumping-off point for customizing and also for some of your needs. But yeah, it's a neat project.

Yeah, and the fact that it's only 80 lines, right? Like you said, I wouldn't throw a junior on it. If you're just starting Python, this might not be the place, but 80 lines is something that's comprehensible, so it gives you an idea how it works. It's some interesting code to read, if nothing else.

Yeah. All right, so my next one is from one of our newer authors here at Real Python, Ian Eyre, and Ian wrote this great introduction to this topic. It's literally what the title says: if you're interested in using Python for data analysis, this is a great resource to get you going. It covers not only the concepts, which I think it does really well, and some of the tools, which, as I've discussed over the last several episodes, are sort of interchangeable (there are lots of different tools, and I'll mention another tool at the end), but then, probably most importantly, some of the methodology in going through it. He breaks this into not quite a step-by-step project, but definitely provides you lots of tools and examples to work through, and gives you a data set to play with. This particular data set is the James Bond series of movies, for you to follow the history of 007 across his film career. It's a modified data set, so you can work with it and go through these steps. Ian starts with giving you an overview of this workflow and all of these steps, so you understand what the need for data analysis is and why there's a need for a workflow. He takes
a moment to focus on really one of the more crucial areas that's given time. I feel like this happens in a lot of companies, and to anybody who's probably done data analysis and been asked by a manager or somebody in the organization: okay, well, what's the objective here? What is it that we're trying to find out of it? It's really the most important question. Otherwise you're kind of just smashing things together to see what remains, or potentially throwing things at the wall to see what sticks. If you have a core objective, you really have a much better way of figuring out the path and the flow to get there.

Again going through the workflow, he covers acquiring your data. This might be pulling things in from a CSV file. One of the things that I really like about this section is he spends some time talking about how to pull in things from other sources. So maybe you're pulling things from the web, so that might be a JSON resource. He provides a set of four challenges of these different resources. One is JSON. The other is, okay, what if the data is in an Excel format, which is very common? What if it's been optimized in something like Parquet, which is a very popular format for data science? And then one that I found really cool and hadn't tried is, right from pandas, web scraping from an HTML table. He talks about this little add-in that you can put into your particular environment that then allows pandas to do this, so that you can just web scrape out of something like an HTML table instead of having to go through multiple steps there, which I think is neat.

And then you get into the heart of it, which is cleansing your data with Python, which is usually about 80% of your time. He has, I don't know, is it nine or ten different steps: creating meaningful column names, dealing with missing data, working with financial notation and then converting that stuff, correcting invalid data types, fixing inconsistencies, correcting spelling, checking for invalid outliers, and then other common
things like removing duplicate data, and then finally storing that cleansed data as a data set and choosing how you want to do that. And then this is what a lot of people consider the fun part. I personally really enjoy cleaning data. I know, I'm a weirdo. I'm also the one in my household who likes to thoroughly clean things. I'm actually sometimes asked by my wife not to clean things, because she knows it's going to take too long, because it needs to be done thoroughly if I'm gonna bother to do it. So I'm a weirdo in that way.

So he gets into performing data analysis using Python. You perform a regression analysis. He talks about a variety of different things you can do there: building a scatter plot, comparing, in this case, the different sources of ratings for the movies, which is kind of fun. You're comparing, like, Rotten Tomatoes versus IMDb. Do they have an actual relationship there? Does it make sense? And then it gets into a statistical distribution of the film lengths. So there's a variety of different ways to do the actual analysis part once you've gotten through everything, and then a little bit about communicating your findings. I really think it's a great overview. If you, again, are getting started in this realm of data analysis and are interested in using Python for it, I think this is a great guide. It's also a thorough practice session for anybody who wants to maybe practice their chops on this, but also it's a good workflow to use as a blueprint to build off of. So thanks, Ian. I think this is a really great resource.

Don't start building your AI app from scratch. Save time and effort by visiting intel.com/edgeAI. Get open source code snippets and sample apps for a head start on development, so you can reach your seamless deployment faster. Go to intel.com/edgeAI. That's E-D-G-E A-I. [Music]

All right, so you have a multi-part one like I had at the top here. What's your next set here?

Yes. Your multi-parts were related, and my multi-part is borderline schizophrenic, but, uh, there
I've got, I do have a twofer. The articles aren't related at all, except for the fact that they're both helpful. They're both reference posts. There isn't much to say about them because they're reference posts, but I just sort of thought I'd highlight them in case these were things that might be useful to other people as well.

Nice.

The first one is called "Understanding Open Source Licensing," and it's by Uma Victor. The article does a deep dive on the different kinds of open source licenses: why, as a developer, you might care, why the company you work for might care, and why those two things might be different. The article starts out by talking about the two common categories of open source license, permissive and copyleft. Permissive are things like the MIT license, which more or less say you can do whatever you like, typically with some limitations on liability. MIT and BSD are the ones that are the most popular in the Python ecosystem. The copyleft group, by contrast, are things like the GPL family of licenses. They have conditions on using or modifying the software, such as making the software and your changes available to others. There's a deep dive on each of the popular types and their consequences, and if you're new to this topic, the article gives you a good overview. Rather than just sort of clicking in GitHub and picking one randomly, maybe you can understand what your choice is a little bit.

You might be surprised at how often you might run into this in your career. A few years back I was at a medium-sized organization that got bought by a very large organization. I won't say who, but the first version of their mascot recently became public domain. That large organization didn't want any GPL in the org at all, so we ended up having to do a massive audit on all our code and replacing a whole bunch of libraries, because, as a big lawyer-driven organization, they were worried about the consequences of the GPL. So as a
developer who works in these spaces and is using libraries all the time, you don't have to understand all the ins and outs, but getting the basics can be helpful. If you really want to dig into this topic and the article is not enough, I can also recommend an O'Reilly book. It's a few years old, but I think it's still kicking around. It's by Lawrence Rosen, and it's called Open Source Licensing. We'll include a link directly to it in the show notes.

Yeah, we talked a little bit about licensing in my conversation with Wes McKinney. We kind of dug into the relationship between the Arrow format and how Apache is tied to that and so forth. I know the blog post that you are pointing to talks in quite some detail about Apache licensing, so that's kind of interesting.

Yeah, and Apache is another one of the permissive ones. There's very little difference between Apache, BSD, and MIT at the high level, so they tend to be compatible with each other.

And on to the second one. Again, another resource kind of link. This is called "Summary of Major Changes Between Python Versions." It's by Nicholas Hairs, and as you might guess by the title, it covers the big things that happen in each Python version. So if you're trying to remember whether you can use the walrus operator depending on what version your client's using, this is the page for you. As someone who writes Python educational content, occasionally I find myself going, oh, I need to mention a feature, what can I do that in? I need to make a note. As an example, I just finished building out a course on exceptions. I talked about the add_note feature, and I needed to be able to tell the people taking the course that, hey, you need Python 3.11 or newer to use this. So this kind of article is really helpful if you're having to dig into that. I've also used a similar article by Ned Batchelder that covers more or less the same topic, but if the info you're looking for isn't in one of them, it's probably in the other, so we'll link
to both of them in the show notes.

Yeah. And in the spirit of the show that inspired the name of your favorite programming language, I think we should rename our discussion segment to the argument room. How do you feel about that?

Are we gonna have an actual argument? We tend to agree with each other a lot, but I can pretend.

Okay, all right. Yeah, I guess I can be combative if that's what we're doing here. All right, so that's leading into our discussion, or argument. This one, I guess, is actually a little bit more of a showcase, or at least a talk about our history with it. We found a thread on Hacker News that asked the question, "What Python automation scripts do you reuse frequently at work?" I wanted to talk about this partly because that was the title of the first Python job I had: automation engineer. I was writing scripts for a marketing department at a bank, and these were big, long, multi-step processes, not unlike the data analysis article that I talked about earlier, where I was grabbing data sets, connecting other data sets to them, adding fields, cleaning, cleaning, and more cleaning, and then narrowing the scope of that, and then finally sending this out to various destinations depending on who was requesting this data set and so forth. There were sort of weekly jobs and monthly jobs, but yeah, it was really data analysis kind of stuff. These were typically done in a Jupyter Notebook type of situation. That's what other co-workers were using, and slowly but surely I thought, well, maybe I want to switch these to being something I would run in the terminal. So I started to experiment there, but it was one of these things where it was a lot of text and a lot of messy data, and so by running it through Jupyter and doing it as a multi-step kind of thing, it ended up allowing me to see what was happening. And sometimes, depending on the changes to the job, it was something
that I would very often run manually, and then after that was working well, I could go ahead and set it to automate.

On a personal thing, there's a lot of stuff that I do for Real Python. I do a lot of the video courses here, and so I've had the need to know, generally, if I look at an entire folder filled with video files, could it just give me the total? Like, how many minutes is that, or hours and minutes? That was actually a handy script that I created.

Which you should have told me you had, because I wrote the same thing myself.

The same thing? All right, we can compare. I have other things that rename files and push things into different folders. I'm trying to work toward automating a few other things that I do often, like backing up stuff and pushing stuff to different repos and so forth. We do a lot of stuff through Google, and so I'm trying to find some things there. But I'm interested in this topic, like in finding other tools. I'm interested in making things a little more CLI friendly, and maybe getting advice on what you think is a great tool for that. So you had a couple of notes here. You wanted to mention maybe what was your first Python program?

Yeah. Python's one of those languages where you've got people who are writing the little 10-line script because it's the solution. It often ends up being glue. The Hacker News discussion actually talks about that. A lot of the stuff was just: I have these 15 things that have to happen in a row, so I write the Python that calls those 15 things, and then I invoke it all from cron. That's a common sort of usage. At that same company that I was at that got acquired, the acquiring company was concerned about the image content we had on a public photo-sharing site that was one of our subsidiaries, and the police got involved.

Okay.

So I had to go and grab all of the content for one of the users for an investigation
and so my first Python script was named dous. The photos were all in S3 buckets, and they were in random places, and there was a database that essentially stored where this person's account was and where their images were. So this really was, again, 15 or 20 lines of code. It grabbed a bunch of information out of a database, grabbed some separate information out of an Excel file, did the equivalent of a join, and then fetched all that stuff off of S3 so that I could burn it to a CD and submit it to the cops. So it's a weird first experience, but, ignoring the context of it, I think it's a common thing that Python coders end up with, right?

You're talking about having the script that does the little lengths. I use a tool called Shotcut for doing a lot of my video editing, and it's got an export mechanism built in. You can do mass exports, but you have to add a single file at a time. The export is actually sitting on top of an underlying executable that it calls, and it does give you the information to be able to call that executable manually if you want. So I sort of semi-reverse-engineered that information out of its detail files and stuck it in a Python script, and now I've got a file called Glinda that goes off and grabs all of my stuff and renders it for me, so that I don't have to go through and click it over and over and over again.

For cleanup, I have something called clean pie that goes through and removes all the .pyc files, so it gets rid of your dunder directories and any .pyc files that are there before shipping them off. Because of running into problems with size limitations on email, I've got a script that looks at a zip file and reconstitutes it in chunks. Basically, it takes one zip file and turns it into three, if it needs to be three, so that each one's small enough. It chunkifies it, basically. And rather than just splitting the file directly, each is a valid zip by
itself.

Yeah, by itself, so you can send it in different emails.

And then I've got some work stuff. There's a Canadian radio station that has podcasty things on it, but they insist on not putting it out in a format that is friendly to users. They want you on their site. So I have reverse-engineered their protocol and have a little script that goes off and grabs each of the segments from the site and downloads it, so that I can listen to it the way I want.

A couple of honorable mentions that aren't Python, they're Bash, but again, same thing, it's that quick and easy. I have something called term ruler that just gives me an ASCII color-coded printout of the number of characters on a screen. Again, this is one of those video and resizing things. I might need to make sure the terminal I'm in is 80 characters or is 132 characters, so it just prints out the numbers, but then it color codes every 10 so that you can quickly visualize whether or not the size is correct. And the last one I've got, I called it dzip. I find I frequently need to take a directory, or several directories, and turn each directory into a zip file. So it's basically just a wrapper around zip: you hand it directory names and it creates a zip. A little bit of inside baseball: I send Mr. Bailey here files frequently that are demo code and script for different components of the video courses. I keep those in directories, so I go dzip demo code and zip, and then I get those three zip files and dump it there so that it's there.

Nice.

I think this kind of stuff is sort of inevitable if you're a programmer. It's often more fun to actually write the solution, even if writing the solution takes longer than doing it manually. I think that's a problem a lot of us share, and once you've been doing it for a few years, you end up with a small collection. If you're new to this and you're starting to do it, one of the
things I would suggest is to create one GitHub repo with directories. You're not going to create packages for all these, because you're not trying to share them with your friends or make them useful to somebody else. So just have a "my miscellaneous scripts" repo, and then you can put them in there. What you'll find is, even if you stop using some of them, you'll be like, wait a second, I did that, I've got some sample code somewhere. So even if you don't intend on reusing it much, there's value in sticking it up on a repo and being able to track it for yourself, even if it's not for anybody else.

Yeah. When I think about this topic, it immediately turns to Al Sweigart's book, right? The whole Automate the Boring Stuff thing. He definitely focused on that, and of all the books that they had about Python at the job I was starting, that was the one they had. It was kind of interesting that that was the sort of practical set of examples of, oh, you want to try to do these different things. I still think it comes up all the time. Do you have a particular CLI Python library? Maybe you've had choices over time, or you've got one you always use?

I'm old school. argparse. I don't use anything else.

All right, so just built in.

And even that, it took me forever to get to that, because I was intimately familiar with the old one, which was, what, optparse, I think it was. So even moving to argparse took me a while. But Django switched their command-line mechanism to argparse from optparse, so I was forced to learn it, and so you end up moving along. I've used other libraries in the past when I've had to build things for customers and things like that, but usually when I'm doing it for myself, it's quick and dirty. It's --help and the name of a file, and it's three lines, and I don't usually have to do much
after that.

Yeah, right. You're not necessarily needing a lot of flags or explanations of things, because it's really just for you. For something that's a little more deliverable to another end user, then you're going to want that.

Yeah, although I will still recommend comments and things like that for yourself, because when we said we were gonna have this discussion, I went into my bin directory and went, okay, so what do I have in here? And I'm like, what is that? What does that do? Why did I write that? Oh, okay. So yeah, there were a couple of pieces of software that I found on my hard drive that I wrote and didn't comment, and I had to read the code to understand what the thing did. So yeah, a sentence at the top is valuable.

Yeah. I think it's a great area to look at starting in, if you're a beginner.

Yes. Solve a little problem in your life. And yeah, you could do it in Bash, but it's a great way for you to learn Python and learn, okay, well, how does it handle files, which is a really common thing. You'll learn way more by solving a problem, and you'll retain more of it, than by some toy that the site you're learning from is teaching you.

Yeah, without a...

This week I want to shine a spotlight on another Real Python video course. It's related to our discussion topic this week and can provide some more details on how to turn those Python scripts into an application you can share with others using a command-line interface. It's titled "Building Command Line Interfaces With argparse," and it's based on a Real Python tutorial by Leodanis Pozo Ramos. In the course, Real Python instructor and my co-host Christopher Trudeau takes you through what the Python argparse library is, why it's important to use it if you need to write command-line scripts in Python, and how to use the argparse library to quickly create a simple CLI in Python. You'll learn how to implement positional versus
optional arguments, employ flags, add custom actions, and even how to add a subparser. If you're interested in making reasonable applications and tools for yourself or to share with others, a flexible command-line interface is a good place to start, and Python's built-in library argparse has got you covered. Real Python video courses are broken into easily consumable sections and, where needed, include code examples for the technique shown. All lessons have a transcript, including closed captions. Check out the video course. You can find a link in the show notes, or you can find it using the enhanced search tool on realpython.com. [Music]

All right, well, that takes us into projects. It looks like you have a TUI, if you will.

Yes, yes.

Yeah, nice.

So many, many moons in the past, in the darker ages before the time when IBM decided to hand Microsoft the gift of NT, there was an operating system called OS/2. It was 1/180th as good as OS/360, but it didn't require a mainframe to run. In this darker time, I wrote device drivers and spent a lot of my days and nights fiddling little bits and playing with serial interfaces, and one of my best friends at the time was a hex editor.

Yeah, okay.

Hex editors are tools for looking at binary data, oftentimes a compiled program, or in my case it was often a data dump from a serial stream. And as binary data can contain values that are unprintable on a terminal (for example, ASCII 8 goes bing), you need to look at the data as a giant table of numbers. This kind of tool used to come with most operating systems, but now tends to be a bit more of an add-on.

Yeah.

So my project this week is hexabyte, which is a TUI-based hex editor by a gentleman named Justin who goes by the handle thetacom. Since I'm talking about it on the Real Python Podcast, you'll probably guess this was written in Python, and you'd be right. It's built on top of the Rich toolkit, which is part of that whole Textual TUI family. So you can edit up to two files at a
time, doing side-by-side comparisons. You can switch between a modified printable ASCII view, a modified UTF view, or the usual hex value notation. Editors like these usually have a column showing the address value for each row you're viewing, and this one even lets you change the format of that, so you can do it in hex, or in binary or octal, or turn it off altogether. It has the ability to do searching for content, so it allows you to look for particular bytes or strings. So pretty much everything I've ever seen in most hex editors is here. It's snappy performance-wise, even though it's Python, so if it's the kind of tool you're needing, this is a decent one. It's also sufficiently self-contained that if you're just looking for another example of how to code using a library like Rich, this is a good example of that as well, because it's a small enough project that you can wrap your head around it.

At the time of the recording, version 0.8.4 had been released. It has a bug that I bumped into where I wasn't able to launch the viewer. Backing up and doing a specific install of 0.8.3 fixed the problem. Somebody had already logged it, and it had been bumped a couple of times, so I suspect it'll get fixed shortly. But if you're playing around with it, you might want to avoid the 0.8.4 version. Bugs happen, so no big deal there. And seeing as I started talking about my project as if I was an ancient storyteller, I'm going to challenge you to do all of yours in haiku. How's that?

Well, I have a different sort of language thing that's related to what you're talking about. I had a friend who really was into graphic art stuff on the Atari platform back in the day, the Atari ST, and that was my experience with the hex editor. We found a program, but it was written in French, and we wanted to change all the pull-down menus so that they made sense to him and not just to me. And so I was the translator, because I knew French, you know, at least high school and college. And so
we went through with the hex editor and edited all those pull-down menus and those options and so forth, and kind of made an English version of it using a very simple hex editor at the time. So that was interesting.

Yeah. If I remember correctly, the tagging inside of MP3s isn't encoded either, right? So you'd be able to see the strings. You'd probably be able to get the strings in them, and you'd be able to do the same thing: get your hex editor open, go in, and muck around with your titles.

Yeah, find the Easter eggs and other things that are in there.

Yes.

Yeah, it's very interesting. So my project is something that I spoke about last week when I had Wes McKinney on. We talked about a variety of different projects that he does, and I mentioned that I was very interested in this project, Ibis, and so I decided to dig a little deeper into that this week. Partly, it was featured in PyCoder's, and I'm like, "Oh, nice. Okay, cool. Maybe I could talk about this a little bit more." It is similar to things we've already talked about. We've talked about Polars, we've talked about pandas. Ibis is another tool for data processing and working inside this environment. One thing that it provides that's kind of unique is that it tries to be portable, which I think is an interesting term that they use there. Ibis has three kind of primary components to it. It's a dataframe API. It has some similarities to not only pandas but also the dplyr stuff from the R world. Python users can write Ibis code to manipulate tabular data. It supports so many of these backends, these query engines if you will. It has like 20 different ones that it can support, so you can move your code across different database systems and so forth. So it's nice if you need the portability to move what you're writing as queries across them. It has deferred execution, again that kind of lazy idea, so execution of code can be pushed to the query engine. Users can
execute at the speed of the backend and not worry about the local computer, which is kind of neat. Their goal is to be future-proof: the idea that you can write this code that, again, can be moved across these other solutions. It has a lot of flexibility. Something that I liked right off the bat: I've mentioned multiple times on projects that I've tried to install things and kind of gone through the nightmare of trying to get things running and so forth, and this was super easy. Not only does it have very simple pip install kind of methodologies, but it has lots of those install methods where you have the square brackets and you're saying, "Oh, I want to have it talk to DuckDB," or "I want to have it talk to MySQL," and so forth, and it'll automatically install all the different hooks that it needs for that to go. So I like that about the way that's set up. The documentation is really good. Again, it shares a lot of the same ideas that Polars has as far as being more efficient. And I think in some ways Wes and his team that work on it had a chance to have another stab at, "Oh, how could we do dataframes, and do this in a much more modern way?" If you're interested in this whole thing, our conversation was very enlightening. He created pandas at a time when it wasn't even really called data science. It was more of this sort of world of statistics. Data has just blown up in such a huge way since that time. I really like the way the dataframes show up in this thing. They have a very Polars kind of style, where you can actually see the different data types and so forth. One of the things I think it does really well is that it has the ability to export your queries as raw SQL and show you that information. If you've ever written SQL, it can be kind of cumbersome. This can really be a way around that and make it a lot easier, and kind of be a nice, friendly interface for doing data
exploration. So it's called Ibis.

All right, well, thanks again for bringing all these articles and tutorials and projects this week, Christopher.

Yeah, see you next time.

All right, talk to you soon.

And don't forget: jump-start your AI apps at intel.com/edgeai for open-source code snippets, tutorials, and more. Save time, deploy faster. Visit intel.com/edgeai.

I want to thank Christopher Trudeau for coming on the show again, and I want to thank you for listening to the Real Python Podcast. Make sure that you click that follow button in your podcast player, and if you see a subscribe button somewhere, remember that the Real Python Podcast is free. If you like the show, please leave us a review. You can find show notes with links to all the topics we spoke about inside your podcast player or at realpython.com/podcast.
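
The "quick and dirty" personal script described in the automation discussion (a help string, the name of a file, a few lines of argparse, and a sentence at the top saying what it does) might look roughly like the sketch below. The line-counting task and the names `build_parser` and `count_lines` are hypothetical, chosen only for illustration; the argparse calls themselves are standard library.

```python
"""Count the lines in a text file (the one-sentence note to your future self)."""
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with one positional argument and one optional flag."""
    parser = argparse.ArgumentParser(description="Count the lines in a text file.")
    # Positional argument: required, identified by its position.
    parser.add_argument("path", type=Path, help="file to read")
    # Optional flag: False unless -v/--verbose is given.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print the file name next to the count")
    return parser


def count_lines(path):
    """Return the number of lines in the file at `path`."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def main(argv=None):
    """Parse `argv` (or sys.argv when None) and print the line count."""
    args = build_parser().parse_args(argv)
    total = count_lines(args.path)
    print(f"{args.path}: {total}" if args.verbose else total)
```

In a real script you would finish with an `if __name__ == "__main__": main()` guard so it runs from the command line; passing a list such as `main(["notes.txt", "-v"])` makes the same code easy to exercise from a test.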

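The hex editor segment describes viewing binary data as a table of numbers, with an address column for each row and a printable view alongside. A minimal sketch of that layout follows. It is not hexabyte's code; the function name `hex_dump` and the 16-bytes-per-row default are assumptions made for the example.

```python
def hex_dump(data, width=16):
    """Format bytes as rows of: address, hex values, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." text
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Unprintable values (such as control characters) are shown as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return rows


for row in hex_dump(b"Real Python\x07\x00\xff"):
    print(row)
```

Each row starts with the offset in hexadecimal, which plays the role of the address column mentioned in the discussion; switching that column to octal or binary would only mean changing the `08x` format code.
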
Original Description

What are the typical computer tasks you do manually every week? Could you automate those tasks with a Python script? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.

👉 Links from the show: https://realpython.com/podcasts/rpp/194/

We discuss a recent Hacker News thread about frequently used automation scripts. We share the kinds of tasks we've automated with Python in our work and personal lives.

Christopher shares a tutorial about building a micro-search engine from scratch using Python. The post takes you through coding the components of a crawler, index, and ranker. The finished engine is designed to search the posts of the blogs you follow.

We also share several other articles and projects from the Python community, including a news roundup, how a Polars query works under the hood, using Python for data analysis, understanding open-source licensing, summarizing the significant changes between Python versions, a robust TUI hex editor, and a lightweight dataframe library with a universal interface for data wrangling.

This week's episode is brought to you by Intel.

Topics:
- 00:00:00 -- Introduction
- 00:02:23 -- uv: Python Packaging in Rust
- 00:02:43 -- Rye Grows With uv
- 00:03:20 -- Python 3.13.0 Alpha 4 Is Now Available
- 00:03:45 -- A Bird's Eye View of Polars
- 00:07:28 -- Polars: Why We Have Rewritten the String Data Type
- 00:09:33 -- A Search Engine in 80 Lines of Python
- 00:13:14 -- Using Python for Data Analysis
- 00:18:22 -- Sponsor: Intel
- 00:18:53 -- Understanding Open Source Licensing
- 00:21:54 -- Summary of Major Changes Between Python Versions
- 00:23:19 -- What Python automation scripts do you reuse frequently at work?
- 00:34:21 -- Video Course Spotlight
- 00:35:52 -- hexabyte: A modern, modular, and robust TUI hex editor
- 00:39:56 -- ibis: The Flexibility of Python With the Scale of Modern SQL
- 00:43:31 -- Thanks and goodbye

