Q&A about Machine Learning with Text (online course)

Data School · Intermediate ·🔧 Backend Engineering ·9y ago

Key Takeaways

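The video discusses the Machine Learning with Text in Python online course. The course covers the machine learning workflow for text, natural language processing techniques that improve machine learning models, regular expressions, and solving text-based data science problems from start to finish. It uses tools like scikit-learn and pandas and includes an optional private Kaggle competition. The course is hands-on and practical, and it is aimed at people who are comfortable in Python, understand basic machine learning principles, and have some scikit-learn experience. It is not a beginner course.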

Full Transcript

All right. Hello everyone. Welcome, welcome. Good to see all of you here. Super excited. I will introduce myself in a moment, but this is the info session for Machine Learning with Text in Python, the upcoming online course. I just want to make sure you can see me and hear me. So, if you can see me and hear me, feel free to post where you are coming from in the chat. Some folks have already, but I'd love to see where everyone is from. Let's see. Marvin from Germany, Anthony from Texas, Joseph from Atlanta, and many, many more. Seattle, Arizona, Spain, DC. All right, that is where I am living. France, India, Miami, San Fran, LA. Oh, this is great. Philly, the Philippines. Oh, wow. Mark, I don't know my time zones offhand, but it might be the middle of the night for you. Spain, great. New York City, Nigeria. Wow, this is awesome. I love the Data School community, by the way. It is so cool. For those of you who have been in the community for a while, I sent out some stickers around the world a couple of months ago, and I mailed them to about 20 different countries. It was great. Okay, let's see. Someone says, "Video stream is not loading." Anytime that has happened, the most common fix is just to refresh your browser so the page reloads. If that doesn't work, you can always try a different browser or device. I'm so sorry if you have problems with Crowdcast; it usually works quite well. And if worst comes to worst, this is being recorded, and the cool thing about the technology is that within a minute of this info session ending, you can watch the entire recording. So that's your out if you really can't get it working. All right, so welcome. It's great to see all of you here. As I said, I'll introduce myself in a moment, but let me give you the agenda for today.
The first thing, or the main thing, is that we are going to be chatting about the course Machine Learning with Text in Python. I'm going to walk through the course info page. Of course, I'm not just going to read it to you; I'm mostly going to comment on it, answer your questions about it, and give you context on why I created the course the way I did. I strongly encourage you to open the course info page if you don't already have it open. There's a green button below the screen that says "Click here to view detailed course information." If you click that button, it should open the info page in a new window. As for your questions, I will answer some of them as we go, and the rest I will answer at the end. I'm planning for this to go about an hour, and at the very end, if there's extra time, I'm happy to answer your questions about machine learning or data science in general. Then, in about an hour, enrollment will open for the course. So that's the plan. Let me introduce myself, and then I want to know a tiny bit more about you. My name is Kevin Markham. I live in Washington, DC, and I'm a full-time data science educator. I founded Data School, and my goal with Data School is to help you master crucial data science topics so that you can begin or accelerate your data science career. I've been doing Data School for about two and a half years, I think. For those of you who are familiar with Data School, I'd love to hear in the chat how you first heard about it. Was it a particular video? Was I a TA in some course? Do you know me from an in-person course? Eric says PyCon 2016. All right.
I was speaking at PyCon 2016, the huge Python conference, and if you've not been, which is probably most of you, I really recommend it. The Python community is very welcoming, and PyCon is where you can feel it, especially if you don't have a Python community locally. William says I taught a short course for GA. Yeah, I've taught a number of things at GA, General Assembly, in DC; now I'm focused online. Mark has been learning pandas from my pandas video series. Awesome. YouTube, Twitter. All right, awesome. It's great to have all of you here. I would love to know a bit more about you. Obviously I can't get into too much depth, but if you scroll below the screen, there's a tab that says Polls. If you click on Polls, I've got three questions in there about your Python experience, machine learning experience, and NLP (natural language processing) experience. I'd love it if you take a minute to answer them. I understand some of you are probably on your phone and it's annoying to switch around; don't worry about it. I just want to gauge the experience levels of the group, and in a little while we will tie this back to whether this course is a good fit for you in terms of Python, machine learning, and NLP experience. So, to comment briefly: we've got a lot of beginner and intermediate folks in Python and a few advanced, and nobody who has never used it, which is good. On machine learning experience, there are some who don't know anything about it.
I did a previous Crowdcast about getting started with machine learning. But there's a variety there: folks who have built machine learning models, either in Python or not, and folks who get the concept but just haven't written the code. All of that is fine. NLP experience is skewed mostly toward little experience, with no one working professionally in NLP. That is useful for me to know, so thank you for sharing. As I said, I'll get back to how we can use this information to help you decide whether this is the right course for you. Before I start walking through the course page, I want to tell you one thing about questions, which will help this work best for everyone. There are essentially two ways you can ask me questions. One is the chat on the right side of the screen (I assume it's on the right for everyone), and the other is the Questions section below the screen. I will monitor the chat while I am talking, because I want to know when you have questions about what I'm saying. So the chat is great when you have a question about what I'm saying right then and you're trying to get my attention, or when you're sharing something and interacting with others. Post in the Questions section below the screen when you don't need an answer right at that moment, or it's not relevant to that moment, but you want to make sure I answer it today in this webcast. I will go through all of those questions by the end of this hour, I promise. So: chat for immediate questions, and below the screen for questions I can answer a little later. All right, that's the intro; let me get right into the content.
I've got two monitors set up here, so I just have to manage a few things, and I'm going to try to share my screen so that we can walk through this together. Let me see how this works. Whoops, that did not do it. Oh, here we are. Share my screen. All right. I think right now you can see me and the screen. I'm going to hide myself so that you can see the entire screen. Okay, I think that worked. Could someone just confirm: can you see my screen? All right, great. Thank you all. So, this is the course page. You've probably seen it, but I'm going to give you some commentary on it and walk through some of the key points, and I'll try to do it quickly; I don't want this to get too boring. What is this course about? This course is about mastering machine learning in Python. If you're not interested in machine learning, this is not the course for you. That part is critically important. Now, one question I get often is, "I really want to get good at machine learning, but I don't know if text is what I want to focus on." That's okay. Text is the content we will focus on in this course, but it's not critically important that you are passionate about working with text data. In fact, you don't need any experience working with text data, and you don't even have to be convinced it's useful. I promise you will see that working with text is extremely useful in machine learning, because there is so much unstructured text data out there; there's far more text than there are numerical datasets. So if you see the value of text-based data, this is definitely a good course for you.
Even if you're not sold on it but you're interested in mastering machine learning, I think this is also a good course for you. In the course, I am going to focus on hands-on experience. This is not a theory course. We're not going to learn things just to know a bit about them, with the tiny exception that I'll give an overview of natural language processing and tell you a lot about what you can do with it without us actually doing it. With that exception, it is very much a hands-on course. It is a complex topic, so I want you to get the support and feedback you need. I want you to get really far with machine learning with text during these eight weeks, and ultimately I want you to be able to apply these techniques to your own problems. So I've designed the course so that there's a lot you can reuse and apply to other problems, and so that you understand the entire workflow. I love this quote from Ryan, one of my students from April: "I was able to apply topics covered during my first week to work." When he shot me an email with that, I thought, great, you're one week in and you're already using it at work. I just think that's awesome. Let's see, Ari asked how much of the course will be done outside of scikit-learn and NLTK. I'll come back to libraries in a little bit; thanks for the question. Okay, so how is this course different from other online courses? You probably know the pain of bad online courses. I've taken some, and you probably have too. I'm not going to focus on what's wrong with other courses. Some are great, some are not, a lot are not, but that's not important. What's important is what I'm going to do in this course that is different and, I think, better.
We are going to be application focused. I really want you to be able to apply what you're learning. You're going to understand every single line of code we write. I say that, but there is one thing in the course that I remember skimming over because it wasn't important for you to understand it. Basically, though, I walk through every line of code because I want you to be able to reuse this code knowledgeably and modify it to suit your needs. You're going to get practice with what we're learning, and you're going to get feedback when you submit homework. You don't have to submit homework; you don't even have to do it. You're all adults, and you can do what you like. But if you do the homework and submit it, you're going to get feedback from us. When you need help during this course, that help will come from me and my assistant, Alex Eager. It's just the two of us. Alex is amazing. He used to be one of my in-person students, then he was my TA, and now he's an instructor as well. He's worked with me on the two previous versions of the course, and he is a data scientist, just like me. We are the ones who are going to help you. I know I'm contrasting help from us with help from a volunteer TA, and I'm not criticizing volunteer TAs; there are lots of great ones. I used to be one for the first course in Coursera's Data Science Specialization. Feel free to post in the chat if you've ever taken that course. I was a TA for 16 sessions in a row, I think, and I was a good TA, but I was a TA when I felt like it and when I had time. So that's the difference between this and most courses: we are dedicated to your success. Someone asked, "What's a volunteer TA?" TA stands for teaching assistant; I realize now that maybe that's not universal terminology. Basically, on a Coursera course, those are previous students who help out.
They volunteer. They're not being paid to be there, and there's no requirement that they help every day; they're just there to help. But we will be there to help every day. I think that's all I'll say about how it's different. I'm going to scroll down to a bit of the course description. This is an 8-week course. Those 8 weeks don't start today; they start on the first day of the course, so you get a full 8 weeks. It runs from Wednesday, September 28th, through Tuesday, November 22nd. What is included in the course? First, 14 hours of videos. If you watched the mini-lessons I sent out in my email newsletter, you've seen a sample of what this looks like. These are recorded from past classes; I have taught this course live before. Why is it recorded this time instead of live? There are lots of things to like about live classes, but there are a couple of huge problems. One is that I have an audience from around the world that wants to take this class, and it's hard to get everyone, or even a majority, online at the same time. Second, a lot of my students said, "I want to watch this class whenever I want. I want to watch it in one-hour chunks, not a three-hour chunk. I get tired during class and want to pause it. I don't have a three-hour block to sit and participate in a class, so I want to use the videos." Or people say, "I like to watch the videos faster because you talk too slowly," which is fine; I'm not offended by that. A lot of people, I think, will go back in the video if they missed a point or didn't get it; with a video, you can immediately go back. So that's why I like using video and why I'm doing it this way. The classes are carefully edited. I've spent over 50 hours editing the videos so that they're as packed as possible and there's very little wasted time in there.
Every moment that's in there is there for a reason. Next, instructor-led webcasts. This is a new thing I'm doing, and it's going to be great. We're going to do this twice a week for half an hour. You can show up if you want, or watch the recording if you don't. I'll be answering your questions, and when there aren't questions, I'm going to teach things that aren't in the videos or the rest of the course material. So that's going to be really fun. We're going to use Crowdcast, so hopefully you're liking this platform, because this is exactly what the live webcasts twice a week are going to feel like. There are two questions up here. "What's the average daily effort required? Just a ballpark based on past incoming students' skills." Great question. Past students have told me 5 to 8 hours per week, so about 40 to 60 hours over the course of the eight weeks. It depends on your experience and how deep you want to go. Felipe asked, "What time will these webcasts be?" It's on the page below, and you'll see it again: Sundays at 8:00 p.m. Eastern and Tuesdays at 1:00 p.m. Eastern. I tried really hard to pick two times that would work for nearly everyone in the world. Hopefully at least one of those times works for you, but you can always watch the recording. "Can we have the instructor-led webcast over the weekend?" Well, I realize that for some of you Sunday night is not the weekend, though for many of my students it is. I tried really hard to come up with two times that would work for as many people as possible, and hopefully one of them works for you. I understand if neither is ideal; hopefully the recordings are good enough for you. You can always post your question ahead of time and then watch the recording and read the chat. So hopefully that's helpful.
Wolfgang asks, "Will the topics for the two webcasts in a given week be the same?" I don't think so, but I'm not sure. This is a new thing I'm trying, so I don't want to promise it will be one way or another. I tend to think they will be different. I expect there will be questions and I will spend a lot of time on those, and maybe, if I'm adding an additional little lesson, I'll teach the same thing in each webcast. It's really hard to say. What I will say is that even if you don't attend, it should be really easy to scan through those 30 minutes sometime during the week and see if you missed anything. Next, well-commented lesson notebooks. If you've gone through my scikit-learn series or my pandas series and seen the notebooks, you know that I am obsessive about really good comments, and you will get well-commented code. The homework assignments are long and thorough, and they contain a lot of guidance to help you. The solutions are long and detailed, and everything is explained. I really want you to get a lot out of the homework. As for feedback, we will usually give feedback on your homework within one day, within 24 hours. Sometimes it'll be 48, because it takes time. It'll mostly be Alex, but I will review any he's not able to get to in time, because we want to make sure you get feedback quickly. Eleanor asked, "How long does the homework take per week?" That is baked into the estimate of 5 to 8 hours a week. To give you a preview, the videos per week will usually be 2 to 3 hours. You could easily spend a ton of hours going through the pre-class work or the post-class resources. Most people pick a thing or two in there and consume those during the week, but that amount of time is up to you.
The homework assignments in particular differ. The shortest one, for someone who's already comfortable with the material, probably takes 30 minutes to an hour. One of them is super long and pretty hard, and even if you're already pretty good at some of this material, it would take you a couple of hours, and you could take a lot longer. Now, that being said, about the homework... what was I going to say? I was reading the next question and lost my train of thought. Oh, okay, this is super important. The week 2 homework, which is, I think, the longest one, doesn't have to be submitted during week 2. There's no homework deadline other than the eight weeks you have for everything; you have eight weeks total for the course. So if you find that a homework is taking too long, don't worry: you have seven weeks to work on the week 2 homework if you really want to. On average the homework will probably take a few hours each, but it really depends on your level of fluency. No worries, Felipe: yes, the materials are compatible with both Python 2 and Python 3. Let's see, I want to make sure I get through all of this, so I will try to move quickly here. Next, the Slack team: that's where you can ask Alex and me for help. We will usually get back to you within about 12 hours, if not sooner, and it will rarely be more than 24 hours before you get an answer. There are pre-class resources before every class and post-class resources after every class. New this time around is a private Kaggle competition, which should be fun. I love private Kaggle competitions; I've run them before in my in-person courses, and I think this will be a lot of fun. It's completely optional. Let's see... Penosh, I'm not clear on your question.
Feel free to clarify, or shoot me an email afterwards. Eleanor asked, "I read that this course is for intermediate Python users. I'm more of a beginner. Does this mean I won't be able to do the coursework?" That's a tough one. Let me get back to that in a little bit when I talk about prerequisites; I think you'll have a better sense then, and if that doesn't give you enough of a sense, let's communicate over email and figure it out. Finally, there's a money-back guarantee within the first two weeks of the course, no problem. Just so you know, out of 100-plus students, only one person has ever asked. They were just like, "I don't have time for this. Never mind." And that's fine. I want this course to be a great fit for you; I'm just saying it's only happened once. Next, the course outline. There's a lot of detail in here, so I'm just going to give a quick high-level overview. There are five modules, spread out over six weeks. Module one is about the basic machine learning workflow for working with text. Module two is about using natural language processing techniques to benefit your machine learning models. So this is not a classic NLP course; it's a machine learning course that uses natural language processing to its benefit. I could spend a while talking about that. I had a lesson in my mini-course that discusses the difference between machine learning with text and natural language processing, but I'll leave it at that: this is a machine learning focused course. Module three is about extracting text features from messy data sources using regular expressions.
If you've tried to learn regex (regular expressions) and found it painful or confusing, I promise I teach it in a different way, and I think it's better. I'll let you decide ultimately, but regex is a powerful tool and you'll be glad you learned it, and if learning it has been painful in the past, it will be better this time around. Module four is about the entire workflow for solving a text-based data science problem using scikit-learn and pandas. I want you to get good at using scikit-learn and pandas together, and I want you to see and participate in how you work a data science problem from start to finish: not just the machine learning part, but the entire workflow. And then module five is super fun: advanced machine learning techniques. This is the stuff a lot of people on YouTube keep asking me for. I haven't put it out publicly; it's only in the course. These are advanced machine learning techniques to improve model accuracy and the efficiency of your workflow. Most of this is hard to appreciate if you haven't done it, and maybe a few of you have used all of these techniques, but for the most part this is new material for everyone. You will learn lots of cool techniques, in scikit-learn especially. A couple of questions; I'll catch up on those. Pankosh says, "Being a working professional, I only find time over the weekend." I'll say two things. First, if you are dedicated and can really focus on it for an entire day on the weekend, yes, that can work. One downside is that if you get stuck, I want you to ask a question, and if you get stuck at one point and really can't get any further without an answer, there's some benefit to splitting up your work over multiple days. So that would be the downside, but ultimately it's up to you.
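As a rough illustration of what "extracting text features from messy data sources using regular expressions" can look like, here is a minimal sketch using Python's built-in `re` module, which the course says it uses for regex. The review strings, the two patterns, and the `extract_features` helper are hypothetical examples of our own, not course material.

```python
import re

# Hypothetical messy records: free-text reviews with an embedded rating and price.
# These strings are invented for illustration only.
reviews = [
    "Great phone!! Rating: 5/5. Paid $199.99 at launch.",
    "rating: 2/5 -- battery died fast, cost me $150",
    "No rating given. Would not buy again.",
]

# Compile each pattern once; re.IGNORECASE lets "Rating" and "rating" both match.
RATING_RE = re.compile(r"rating:\s*(\d)\s*/\s*5", re.IGNORECASE)
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_features(text):
    """Return a dict of numeric features pulled out of one raw review (hypothetical helper)."""
    rating = RATING_RE.search(text)
    price = PRICE_RE.search(text)
    return {
        # group(1) is the captured text; use None when the pattern is absent
        "rating": int(rating.group(1)) if rating else None,
        "price": float(price.group(1)) if price else None,
        "exclamations": text.count("!"),
    }


features = [extract_features(review) for review in reviews]
for row in features:
    print(row)
```

The resulting list of dicts could be loaded into a pandas DataFrame and combined with other features, which is the kind of scikit-learn plus pandas workflow module four describes.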
Can you focus for that amount of time on a weekend day? I've definitely had students do that; it's just that there's a downside in terms of being able to ask questions at the right time. S asks, "Is the Kaggle competition similar to a data science project that we can publish on a resume?" It is a data science project, but it's like a Kaggle competition in that I've defined the problem for you and given you the data. I think the most valuable projects for a resume in particular are projects you've done from start to finish, and in the Kaggle competition I'm giving you the dataset and defining the problem. So whether you would put it on your resume is hard to say, but I tend to think start-to-finish projects are best. Marvin suggested an edX course on Python that I've heard is very good, so feel free to check that out. Boy, I know I'm probably not as far down this page as I should be, but that's okay; I'm answering questions, and we're getting to the right stuff. When is the course? September 28th through November 22nd. Is this a beginner course? There's a neat feature here: let me click "Start answering." Sorry, one second; this is just to benefit folks who watch this later. I just clicked a button that indicates I'm answering the question "What are the prerequisites for this course?", and that will answer a couple of questions folks have asked. This is not a beginner course, but what does that mean? Here are the prerequisites. You should be comfortable working in Python. Comfortable does not mean you are a master. It means you understand the basic data structures.
You know how to use functions, how to run code, and how to use lists and dictionaries, maybe sets, maybe tuples. You know how to create strings. You know how to interact with the interpreter, and you're comfortable with your environment. I welcome you to be a Python master, but that is not required for the course. So it's hard for me to say exactly how comfortable you need to be. It should not make you nervous to open your Python environment and start working; beyond that, it's really hard for me to give a general answer other than being comfortable with data structures and writing basic code. I'll come back to one other thing that might help on that. The second prerequisite: you should understand the basic principles of machine learning, and almost everyone here said in the poll that they do. Probably more challenging: you should be comfortable using scikit-learn. I've got a video series on that. If you're wondering whether the course is right for you and you've never used scikit-learn, it's very possible you can be ready for the course in two weeks. I've had students who had never used scikit-learn sign up for the course, get ready, and be successful. Most of them had built machine learning models in other languages and were just picking up how scikit-learn thinks. If you've never built a machine learning model in any language, it's hard for me to say whether you can be ready in two weeks. I will say I have an excellent 4-hour video series on machine learning with scikit-learn that will take you from the basics to an intermediate level. I highly recommend going through it, and this may answer your question, Eleanor.
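One rough way to gauge the "comfortable with basic data structures and writing basic code" bar described above: if a snippet like the following reads as routine, you are probably close. The `word_counts` function and the sample sentence are our own hypothetical illustration, not an official prerequisite test from the course.

```python
# A rough self-check of the Python level described above: a function, strings,
# a dictionary, a set, and a tuple. This is our own illustration, not course material.


def word_counts(text):
    """Count how often each lowercase word appears in a string."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


sentence = "The cat sat on the mat"
counts = word_counts(sentence)
unique_words = set(counts)  # iterating a dict yields its keys
most_common = max(counts.items(), key=lambda pair: pair[1])  # a (word, count) tuple

print(counts)
print(sorted(unique_words))
print(most_common)
```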
If the level of code in the scikit-learn video series is not too challenging for you, if you can follow it, then you're going to be fine with the level of Python in the course. So that's a tricky one; feel free to email me about this point and we can talk through it. But the course does assume you know some basic scikit-learn. You should also have at least limited experience with pandas. You don't have to be fluent; you can definitely get up to speed on pandas between now and the start of the course. And you don't need any advanced math skills. Let me answer some questions. "If I'm late on some homework, will I be able to submit it and get feedback after November 22nd?" No, I'm sorry. I can't commit myself or my assistant to helping with the course after November 22nd; that is the end date of the course. I will be in Slack every now and then for students, and I'm happy to answer questions, but we won't be doing homework review after the course officially ends. "Will this be a paid course?" Yes, and I'll get to the cost in a little bit. Eleanor asks, "Can you share the scikit-learn video series you just mentioned?" Yes, I'll paste it into Crowdcast right now (I almost said Slack), so check that out. It is already 2:40 Eastern time, so I'll try to browse through. Let me go back to questions and mark this one done. Next up: "How do I know whether I'm ready for the course?" We talked a little about that. "What types of people have taken this course?" A lot of different folks, many more kinds than I expected. If you're wondering whether you're going to be out of place in this course, the answer is probably no.
I've had a lot of analytics folks and data science folks take the course, business intelligence folks, a data journalist, lots of engineers, directors of engineering, folks who manage data scientists, as well as classic scientists, researchers, computational linguists, and mathematicians, but also plenty of grad students; actually, a lot of grad students. And then there are people whose job title doesn't necessarily indicate data or science, like a creative director, project managers, and Python instructors. I even had one Kaggle master enroll, which was pretty cool. "Why should I learn how to work with text?" I don't have time to really sell you on it in depth, but most knowledge out there is in text form, and if you can learn how to use text in your machine learning models, you are far better off in terms of your fluency with machine learning. By the way, I did not make this slide; I don't make beautiful images like this, it's not my forte. Let me continue lower. As I mentioned previously, Python 2 and Python 3 are both 100% acceptable. "What libraries will we be using?" This was asked previously. We're going to focus on scikit-learn and some pandas. We're going to use the re module for regular expressions, and we're going to make limited use of NumPy, SciPy, Matplotlib, Seaborn, and TextBlob. You do not need to know anything about libraries other than scikit-learn and pandas to enroll in the course, so don't worry if you've never even heard of SciPy or Seaborn; we'll use them only a little. We will not, however, be using NLTK, and that's the subject for a longer discussion. Basically, my philosophy is to learn a small number of Python libraries and learn how to use them really well. If you are machine learning focused, you probably belong in scikit-learn rather than NLTK.
And there's a lot you can do with text in scikit-learn, so I find NLTK unnecessary for almost all of what I do. Okay. So, uh, NLTK definitely has its place for classic natural language processing, but, um, scikit-learn is where we're going to spend our time, and you'll find you can do a ton in terms of machine learning with text data. Now, that brings us back to, uh, one of the poll questions that I wanted to mention, which was, um, "What's your experience with natural language processing?" And I want you to know you do not need to know anything about natural language processing to enroll in this course. Even if you have no understanding of it, that's totally fine. Okay, it's different when I'm talking about machine learning. You do need some knowledge there, and it looks like almost all of you have some already. Okay, but you do not need to know anything about NLP in order to enroll in this course. Okay. Uh, let's see. Um, the course material is definitely up to date. I challenge you to find other online courses out there where every time a library gets updated, the instructor goes back and makes sure all their code still works. So, like, scikit-learn 0.18 is going to come out, maybe in the next month or two, I think, and there are going to be some breaking API changes. Um, I will go and update the code so that it still works, um, or at least provide different options, so you can make sure the code you are getting from this course always works. I continue to work on this course, adding resources and improving lessons. Uh, I've still got more stuff I'm going to record or create this time around. Okay. Um, we've already talked a lot about how this course is different from other courses, so I'm going to skip by that. Um, a little bit about me, um, and you can read more if you like. Alex, as I said, is excellent and we are super lucky to have him in the course.
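The point that you can do a lot with text in scikit-learn alone, without NLTK, can be illustrated with a minimal sketch. It is not code from the course: the four messages and their spam (1) / ham (0) labels are invented for illustration, and it assumes a reasonably recent scikit-learn. `CountVectorizer` turns raw strings into a document-term matrix of token counts, and `MultinomialNB` is trained on that matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: short messages labeled spam (1) or ham (0)
train_text = [
    "win a free prize now",
    "call me when you get home",
    "free entry to win cash",
    "are we still meeting for lunch",
]
train_labels = [1, 0, 1, 0]

# Learn the vocabulary and turn each document into a row of token counts
# (the default tokenizer keeps tokens of 2+ word characters, so "a" is dropped)
vect = CountVectorizer()
X_train = vect.fit_transform(train_text)

# Fit a multinomial Naive Bayes classifier on the document-term matrix
nb = MultinomialNB()
nb.fit(X_train, train_labels)

# New documents must be transformed with the SAME fitted vectorizer
X_new = vect.transform(["free cash prize", "lunch at home"])
predictions = nb.predict(X_new)
print(predictions)
```

The vectorizer is fit only on the training text; new documents go through `transform`, so they are mapped onto the same vocabulary and words never seen in training (like "at" here) are simply ignored.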
It's not like I posted an ad saying I desperately need a helper. This is someone I know and trust. Um, and in fact, many times he answers questions and I think, man, Alex, you know more than me about certain things. So we are super lucky to have him here. Uh, Pancage asked, "Any specific reason to choose Python and not R for this course?" I have a long blog post called "Should you teach Python or R for data science?" Um, and I enjoy both. This course is in Python, but you could create a course like this in R. I will say I don't like working with text in R. Personally, I find the packages clunky. I prefer machine learning in Python, but that is a personal preference, which is why the course is in Python. Um, uh, let's see. Miguel asks, "Which distribution do you recommend to follow the course, Anaconda?" Um, yes, I do recommend Anaconda, but you are welcome to use whatever distribution you like, or you can install packages yourself. When you enroll, you'll get instructions for getting set up. Um, "I'm interested in that link on R versus Python." I will make a note and paste that R versus Python link in the chat for you. Uh, Phipe, uh, Annie Ruda: "Will we have access to materials after the course?" Oh, thank you. Um, "Will we have access to the resources after the course is finished?" The answer is yes, and I will say more about that in a moment. Okay. Um, we're getting near the bottom, and then I will take all the rest of the questions. Let's see. Uh, it takes 40 to 60 hours to complete the course, 5 to 8 hours per week on average. Um, I've designed this course in a way that, I think, makes it easier to stay on track, and here's how we're going to do it. The course materials will be released to you one week at a time for six weeks, which should help you to focus. The materials get released on Wednesday.
The webcasts will occur on Sunday and Tuesday, so that you have a chance to work through some of the material, ask some questions in Slack, and then, if you like, attend the webcast to get more depth on a particular topic or assignment. Okay? So that will hopefully help you to focus on the material and keep track, because it is a lot of material and I want you to succeed in the course. I don't want to just sell you a course. I want you to get a lot out of it, and I think this will help. Okay. Um, homework feedback will come in a day or two. Slack feedback will usually come in about a day. The course material will be released over the course of six weeks, and the final two weeks are the Kaggle competition. Now, if you fall behind during the course, you don't have to do the Kaggle competition. Well, it's always optional, but you can spend the last two weeks catching up. And this addresses someone's question about, you know, what if I don't submit it in time? Well, I can't accept it after the end date in terms of giving you feedback, but we are giving you two weeks of buffer to catch up, because this has happened frequently before, where folks are like, "Oh, I just need a little more time." Okay. Um, Pankage, I'll have to come back to that. Um, let me just make a note about the Python stack. Um, let's see. What else do I want to say? Um, okay. "What if I don't have enough time to complete the course?" Oh, let's see. So, uh, one of the questions, it was Joseph's question: "Will we continue to have access to the course materials after the course ends?" And I want to make very clear: yes. When you enroll in this course, you will have lifetime access to everything in the course. The only thing that ends on November 22nd is assistance from the instructors. Okay? So the videos, the materials, the notebooks: as I continue to update notebooks and add things to the course, add new resources, you will continue to get access to them.
Um, so it really is lifetime access. Okay. Uh, let's see what else, and then I promise I will jump over to the rest of the questions. Um, you will receive a certificate of completion at the end of the course. Um, I send out anonymous post-course surveys at the end of the course. Students are asked to rate the course on a scale from 1 to 5, from poor to excellent. You can read my ratings right there. Um, very close to 5 on average in terms of content quality, instructional quality, and value provided by the course. Um, 100% of students said that it helped them to make progress toward their own personal or professional goals. Okay, so, uh, take that for what it's worth. Um, but past students have been happy. Harvey has been really nice in sharing Data School with others and basically said this course is better than the university courses he takes, which I really appreciate. Um, as I mentioned, I offer what I call a love it or leave it guarantee, and I'm happy to give you a full refund if you enroll and find that this is not the course for you, or you don't like it. Okay. Um, okay. Course options. So, um, I'll just talk briefly about this, and you can ask me more questions about it if you like, but, uh, let me find that. Benny asks, "What's the difference between the master and the standard course?" Um, the master and the standard course have the exact same content. Not a thing is different in terms of the actual course content and the videos. Okay. However, the standard course does not include instructor support. So, what does that mean? Master course students have access to the live webcasts twice a week. They have the ability to submit their homework to us for feedback. Master course students can get help from the instructors in Slack, and master course students can compete in the private Kaggle competition.
Okay, so, um, those are the four things that differentiate the master course from the standard course. By all means, if those four things are not of interest to you, I would encourage you to enroll in the standard course, because, you know, why pay more than you need to? Okay, so, uh, the standard course is great if you just want to work through it on your own: you don't need any help, you don't want to ask questions, and the private Kaggle competition doesn't interest you. Okay. Uh, Erin Prasad asks, "Do we get a recording of the webcast?" Um, that is not part of the standard course. The webcasts are really a little community we'll have in the master course, myself, Alex, and 50 students, and, um, the webcasts are just for them, both in terms of the recordings and in terms of participating live. Okay. All right, let's see. What else did I want to mention? Um, oh, well, of course you asked: how much does the course cost? The master course is $595. The standard course is $295. Okay, both include lifetime access to the course. Um, both include a money-back guarantee. I realize that this course is not affordable for many people, and I totally get that. Um, I offer a lot of free content. I'm going to continue for years to come to put out more free content, and I want my free content to be better than what everyone else is selling. Um, but my premium content takes me hundreds of hours to build, um, well, my free content takes hundreds of hours also, but some content I do sell. And, um, I hope that you'll find it's the best content out there in this particular topic area. Um, okay. I've said a lot about the structure of the course, but just a quick recap: uh, when you enroll in the course, you'll get access to the Slack team and pre-course materials immediately, within a day.
So if you are asking me what you need to do to prepare for the course and be successful in it, you will get that as soon as you have enrolled in the course, or within a day. Okay. Um, starting September 28th, uh, every Wednesday, a new week of course materials will be released. Um, live webcasts begin October 2nd, every Sunday and Tuesday. Homework solutions will be posted a week after the homework has been released, and the course ends on November 22nd. Um, I've mentioned the timing of the live webcasts: Sunday at 8 and Tuesday at 1, uh, both PM, both Eastern time. Um, figure out if one of those is a good fit for you. Uh, the Kaggle competition is during the last two weeks of the course. Um, and as I said, you can use those two weeks for your own purposes, just to catch up. Uh, "Will the course sell out?" I don't know. It might. Um, I set a limit because Alex and I can't support an unlimited number of students. We want to provide a high-quality student experience for every person that enrolls, and we can't give our attention, like answering questions and reviewing homework, to an unlimited number of students. So I've set it at 50 students. It has sold out twice before. Um, but I can't tell you whether it's going to sell out. I just don't know. Um, but it could happen. Uh, okay. I think that's it for what I had prepared. I'm going to switch back to my face, and then I'm going to start answering some of the other questions, and, um, in about two or three minutes I'm going to have to actually pause the webcast. Um, sorry. Let me, uh. Whoops. Uh, okay. All right. My apologies. I need to share my... Ah, here we go. Sorry, I'm new to Crowdcast. And let's see how I can... There we go. Sorry about that. Okay. So, um, you should be able to see my face once again and not my screen. Um, okay. So, great. Great.
So, I'm going to start answering some of the questions below the page, but in two minutes I'm going to stop for about 30 seconds and, uh, turn on course enrollment. So, if you already know you want to enroll, I would encourage you to go ahead and enroll. Um, okay. Let's see here. Other questions. Uh, top-voted question: "Will there be a course in the winter in case we can't do this one?" Great question. Uh, there will definitely not be this course in the winter. Okay. So, um, obviously I have to plan my life around when I run these courses, and I'm not available for substantial projects this winter. I've got other things going on. Um, the follow-up question is, "When is the next course?" And I would say the earliest it might be is sometime in March, April, May, or June next year. Um, I can't give you any more specifics. I haven't decided. I haven't planned it out. I just can't tell you. So, if you know you want to take this course, um, I would encourage you to enroll and try it out, and you've always got lifetime access to the material if you run out of time. And that's why I've given you eight weeks rather than six. But it's really up to you what timing works for you. I just can't tell you today when I'm going to offer the course again. Okay. All right. So, I am going to answer the rest of these questions as well as the ones people have posted in chat. I am going to literally take 30 seconds, uh, because I need to launch the course using Gumroad. I will paste a link you can use to enroll. And thank you for bearing with me while I do this. Um, and I think that's that, and publish. Okay, I'm going to paste a link in chat for enrolling in the course. If you're interested in enrolling, you should be able to enroll right now. Um, and let me know if that link doesn't work. Um, okay. And, uh, let me get back to answering your questions.
And just so you know, I will stick around here while there continue to be questions about the course. You're welcome to drop off if you've got what you need. Um, but, uh, yeah, next question. Okay, from Jonah Keegan: "In my limited experience, the often boring data cleaning process for unstructured text is equal in length and complexity to the sexy and fun classification work. Does the course cover the data cleaning stuff?" That's essentially what he said. Um, and I like that you describe classification, modeling, and analysis as sexy. Um, yes, the answer is yes. Um, we're going to be building a dataset. We're going to be extracting features. Uh, we are going to be talking about how we handle Unicode errors. We're going to be using regular expressions to clean messy datasets. We're going to learn how to use pandas to clean data. Okay. So the answer is yes. Um, there's a lot of modeling, of course, but I try to provide a realistic workflow. Okay. Um, Eric just said, "Enrolled. See you guys on the inside." Enricon probably. Um, okay. Next question, from Saul: "What NLP problems are best solved via machine learning versus classical language analysis approaches?" Great question. So, um, machine learning and NLP. Okay. Uh, here's how I would summarize it. The primary focus of supervised machine learning is prediction and predictive accuracy, and understanding is a secondary goal of the methodology. I'm not talking about people. I'm talking about the methodology. Understanding is a secondary consideration. It's a side effect. Okay. Natural language processing and those classic techniques are focused on analysis and understanding first and prediction second. Okay. Um, in fact, there's lots of NLP that's not focused on prediction. It's just understanding. So, it all depends what your interests are.
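The data-cleaning answer mentions regular expressions, pandas, and Unicode problems. Here is a small sketch of that kind of cleanup under stated assumptions: the `raw` column and its three strings are hypothetical, and stripping accents via `unicodedata` NFKD normalization plus an ASCII encode is just one common choice, not necessarily what the course does.

```python
import re
import unicodedata

import pandas as pd

# Hypothetical messy records; the column name "raw" is an assumption
df = pd.DataFrame({"raw": [
    "Price: $1,299.00  Café edition",
    "  price: $45 (on sale!!)  ",
    "No price listed",
]})


def normalize(text):
    # Decompose accented characters, then drop the combining marks (Café -> Cafe)
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace, trim, and lowercase
    return re.sub(r"\s+", " ", text).strip().lower()


df["clean"] = df["raw"].apply(normalize)

# Pull out a dollar amount with a regular expression; rows without a match get NaN
df["price"] = (
    df["clean"]
    .str.extract(r"\$([\d,]+(?:\.\d+)?)", expand=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
print(df[["clean", "price"]])
```

The trade-off of the ASCII step is that it silently drops any character it cannot decompose to ASCII (non-Latin scripts, for example), so it suits mostly-English text better than multilingual corpora.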
If you are into machine learning, you are probably interested in prediction, and maybe interested in understanding your models in depth, and we will talk about that. But, uh, that's not core to the machine learning workflow, whereas the classic NLP procedures focus on analysis and understanding. Um, if you're someone who already knows some NLP, um, this course provides an alternative mindset. Okay. A different way of thinking about how to solve problems with text. Okay. All right. Uh, let me just catch up here. Uh, Arun Prasad has enrolled. Awesome. Elenor enrolled. Awesome. Um, Howard says, "Enrollment page: the only option is standard class, not master." Um, just use this link right here, uh, that I'm going to paste in Slack. Whoops. Uh, it's telling me I've already sent that message, but I'll say use this link. Okay. Um, let me know if you have any problems. Once we're done with the webcast, shoot me an email and I'm happy to troubleshoot any problems folks are running into with enrolling. Okay. Uh, I'll answer these, and then there were some other questions I missed in the chat. Uh, all right. "What is the demand for these skills, machine learning with text, in the job market compared to other specializations?" Love it. Great question. So, uh, let me say this about the course. It will help you become better at machine learning, and machine learning is a super useful data science skill. It's core to the data science workflow. Um, most people would say that, and I would agree. Um, so even if you're not interested in the text part, it will help you become more marketable as a dat

Original Description

"Machine Learning with Text in Python" is now available as a self-paced online course. Learn more about the course and enroll TODAY: https://www.dataschool.io/learn/

This info session was recorded on September 13, 2016. View the chat history and complete Q&A: https://www.crowdcast.io/e/text-course?rfsn=402783.36d99

In this course, you'll learn how to build effective machine learning models using text-based data to solve your own data science problems. Topics include:
- Feature extraction from unstructured text using scikit-learn
- Model building, evaluation, and inspection
- Using Natural Language Processing techniques to improve your models
- Feature engineering from messy data sources using regular expressions
- Creating an effective machine learning workflow
- Advanced machine learning techniques (pipelines, ensembles, model stacking, randomized search, etc.)

"One of the best, if not the best course I have taken." - Amit Dingare, Director of Data Science

Subscribe to the Data School newsletter to receive priority access to future courses: http://www.dataschool.io/subscribe/
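The topic list mentions pipelines and randomized search. As a hedged sketch assuming scikit-learn 0.18 or later (where `RandomizedSearchCV` lives in `sklearn.model_selection`), a vectorizer and a classifier can be chained and tuned together. The tiny review corpus, the labels, and the parameter ranges are all made up for illustration.

```python
from scipy.stats import uniform
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, repeated so 3-fold cross-validation has enough rows
docs = [
    "great movie, loved it", "terrible plot and bad acting",
    "wonderful and moving", "boring, a waste of time",
    "loved the soundtrack", "bad, bad, bad",
] * 3
labels = [1, 0, 1, 0, 1, 0] * 3

# Chaining the steps means the vocabulary is learned only from the
# training folds during cross-validation
pipe = make_pipeline(CountVectorizer(), LogisticRegression(solver="liblinear"))

# make_pipeline names each step after its lowercased class name,
# so parameters are addressed as <step>__<param>
param_dist = {
    "countvectorizer__ngram_range": [(1, 1), (1, 2)],
    "countvectorizer__min_df": [1, 2],
    # uniform(loc, scale) samples C from [0.1, 10.1]
    "logisticregression__C": uniform(0.1, 10),
}

search = RandomizedSearchCV(pipe, param_dist, n_iter=5, cv=3, random_state=1)
search.fit(docs, labels)
print(search.best_params_)
print(search.best_score_)
```

Putting `CountVectorizer` inside the pipeline is the important design choice: the validation folds never influence which tokens make up the vocabulary, so the cross-validated score is not inflated by leakage from feature extraction. Randomized search samples `n_iter` parameter combinations instead of trying every one, which is why it scales better than an exhaustive grid when a parameter is continuous.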

Playlist

Uploads from Data School · 54 of 60

1 Setting up Git and GitHub
2 Navigating a GitHub Repository - Part 1
3 Forking a GitHub Repository
4 Creating a New GitHub Repository
5 Copying a GitHub Repository to Your Local Computer
6 Committing Changes in Git and Pushing to a GitHub Repository
7 Syncing Your GitHub Fork
8 Allstate Purchase Prediction Challenge on Kaggle
9 Troubleshooting: Updates Rejected When Pushing to GitHub
10 Hands-on dplyr tutorial for faster data manipulation in R
11 ROC Curves and Area Under the Curve (AUC) Explained
12 Going deeper with dplyr: New features in 0.3 and 0.4 (tutorial)
13 What is machine learning, and how does it work?
14 Setting up Python for machine learning: scikit-learn and Jupyter Notebook
15 Getting started in scikit-learn with the famous iris dataset
16 Training a machine learning model with scikit-learn
17 Comparing machine learning models in scikit-learn
18 Data science in Python: pandas, seaborn, scikit-learn
19 Selecting the best model in scikit-learn using cross-validation
20 How to find the best model parameters in scikit-learn
21 How to evaluate a classifier in scikit-learn
22 What is pandas? (Introduction to the Q&A series)
23 How do I read a tabular data file into pandas?
24 How do I select a pandas Series from a DataFrame?
25 Why do some pandas commands end with parentheses (and others don't)?
26 How do I rename columns in a pandas DataFrame?
27 How do I remove columns from a pandas DataFrame?
28 How do I sort a pandas DataFrame or a Series?
29 How do I filter rows of a pandas DataFrame by column value?
30 How do I apply multiple filter criteria to a pandas DataFrame?
31 Your pandas questions answered!
32 How do I use the "axis" parameter in pandas?
33 How do I use string methods in pandas?
34 How do I change the data type of a pandas Series?
35 When should I use a "groupby" in pandas?
36 How do I explore a pandas Series?
37 How do I handle missing values in pandas?
38 What do I need to know about the pandas index? (Part 1)
39 What do I need to know about the pandas index? (Part 2)
40 How do I select multiple rows and columns from a pandas DataFrame?
41 Machine Learning with Text in scikit-learn (PyCon 2016)
42 When should I use the "inplace" parameter in pandas?
43 How do I make my pandas DataFrame smaller and faster?
44 How do I use pandas with scikit-learn to create Kaggle submissions?
45 More of your pandas questions answered!
46 How do I create dummy variables in pandas?
47 How do I work with dates and times in pandas?
48 How do I find and remove duplicate rows in pandas?
49 How do I avoid a SettingWithCopyWarning in pandas?
50 How do I change display options in pandas?
51 How do I create a pandas DataFrame from another object?
52 How do I apply a function to a pandas Series or DataFrame?
53 Getting started with machine learning in Python (webcast)
54 Q&A about Machine Learning with Text (online course)
55 Your pandas questions answered! (webcast)
56 Machine Learning with Text in scikit-learn (PyData DC 2016)
57 Write Pythonic Code for Better Data Science (webcast)
58 Web scraping in Python (Part 1): Getting started
59 Web scraping in Python (Part 2): Parsing HTML with Beautiful Soup
60 Web scraping in Python (Part 3): Building a dataset

The Machine Learning with Text in Python online course covers the fundamentals of machine learning and natural language processing, with a focus on practical applications and hands-on experience. The course is designed for beginner and intermediate Python users, and covers topics such as text-based data science problem solving, data cleaning, and feature extraction. By the end of the course, students will be able to build and deploy machine learning models for text data, and apply machine learni

Key Takeaways
  1. Enroll in the course and access the course materials
  2. Complete the video lectures and homework assignments
  3. Participate in webcasts and ask questions
  4. Apply machine learning principles to text-based problems
  5. Build and deploy machine learning pipelines for text data
  6. Use scikit-learn, pandas, and Kaggle to solve text-based data science problems
  7. Clean and preprocess text data
  8. Extract features from text data
  9. Build and train machine learning models for text data
💡 Machine learning with text is a super useful data science skill that can make users more marketable as data scientists, and the course provides an alternative mindset for solving text problems.
