Regular Expression Tutorial Python | Python Regex Tutorial

codebasics · Beginner ·🛠️ AI Tools & Apps ·4y ago

Key Takeaways

The video tutorial demonstrates the use of regular expressions in Python to extract information from text, specifically from Tesla's company filing, using tools like regex101.com and the re module in Python.

Full Transcript

Are you building your career as a programmer or a data scientist, and do you want to know about one technical skill that can differentiate you from the rest of the crowd? Well, that skill is regular expressions. Assume you are working in some finance company and you want to extract information from Tesla's company financial report. In this video I'm going to show you how you can do that in Python using regular expressions. And make sure you practice the exercise that I'm going to give you at the end of the video.

I Googled "Tesla company filings", which takes me to this particular website where you can see the company's annual and quarterly filing reports. I'm going to apply a filter here and click on this 10-Q PDF, and that PDF will have information on the company's financial numbers. I'm going to keep things simple and extract the titles of these two note sections: Note 1 is "Overview" and Note 2 is "Summary of Significant Accounting Policies". Let's say you are a Python engineer, you have already used OCR to extract this text from the document, and now you are using regular expressions to get these two titles. So that's your end goal.

We are going to open regex101.com, which is an amazing website that I personally love. I use it all the time to build my regular expression first, and then I use that in my code. It is sort of like a test pad. Here on the right-hand side you will see the special tokens that you will be using in regular expressions for pattern matching. Now, for pattern matching you can of course use simple string matching in Python, but using regex is going to make your life much easier, and we'll see that in a few seconds.

I'm going to go to the note sections a little later, but just for net practice, let's say I have a very simple string: I'm giving you Elon Musk's phone number, in case you have any questions on Dogecoin.
By the way, this is a dummy number, so don't even try. Here I want to extract the phone number from this text. You can also say, okay, Tesla's revenue is whatever (it's not really 40 billion), so there are a couple of numbers in this text and you want to extract only the phone number. A phone number is 10 digits, so whenever you see a 10-digit sequence in your text, you want to extract it. This one is 10 digits but this one is two digits, so you want to extract this but not that. How do you do that?

Look here: \d. Any digit is represented by \d (backslash d), so let's try that. When you type \d and look at the match information, it has so many matches, because it matches a single digit. How do I say that I want to match multiple digits? Well, you can do \d\d. Now it will match two consecutive digits: see, 99, 91, and so on. How do I say three consecutive digits? \d\d\d. See, three consecutive digits: 999, 111, 666. Then four consecutive digits, then five, six, seven, eight, nine, ten.
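The step above can be sketched in Python. This is a minimal illustration, not the notebook from the video: the sample text is an assumption modeled on the string described in the transcript, and the variable names are hypothetical.

```python
import re

# Hypothetical sample text, modeled on the string typed in the video.
text = (
    "Elon Musk's phone number is 9991116666, call him if you have any "
    "questions on Dogecoin. Tesla's revenue is 40 billion."
)

# \d matches one digit, so every digit in the text is a separate match.
single_digits = re.findall(r"\d", text)

# \d\d matches two consecutive digits at a time.
digit_pairs = re.findall(r"\d\d", text)

# Ten \d tokens in a row match only a run of ten consecutive digits,
# so the two-digit revenue figure is skipped.
ten_digits = re.findall(r"\d" * 10, text)

print(len(single_digits))
print(digit_pairs)
print(ten_digits)
```

Repeating the token ten times works, but it is hard to read and easy to miscount, which is why a repetition count is usually preferred.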
Now, this is one way you can extract the phone number, because this regular expression says: extract any text which is 10 consecutive digits. There is a better way to write the same thing. In this help panel, if you click on common tokens, check this: a{3} means exactly three occurrences of a. I want to do the same thing: I want to say exactly 10 occurrences of a digit 0 to 9, which is \d{10}. If you look at the match information, it matches exactly that. So this is how you use these tokens.

What if I have a number in a different format? In the U.S., numbers are sometimes written like this: let's say Tesla's CFO phone number is this. This number is in a slightly different format, but it's still a valid U.S. number. If you want to extract this one as well as that one, what do you do? First, let me put the expression for the simple number here. Now let's write a regular expression to extract this particular number. In this number we have a bracket first. Whenever you want to do a literal match, say I type "Elon", see, it matches that. So I could type the bracket and expect it to match, but the bracket has a special meaning; it's a special character. See, the bracket is listed as "capture everything enclosed", so it has a different meaning. I will get to that in a bit, but for now assume the bracket is a special character, and whenever you have a special character and you want to do a literal match, you put a backslash before it. Doing that will match the bracket exactly. Then I have exactly three digits: if you have 10 digits you write \d{10}, and if you have three digits you write \d{3}. Cool. Then you have another bracket, and again the bracket is a special character, so you need to put a backslash here. See, this looks complex, but if you build it gradually it's not that complex. Then you have a hyphen, then exactly three digits, then a hyphen, then exactly four digits. See, now it is matching this particular pattern. If you have anything less, it will not match. See, it
is highlighting, which means it is matching, and you can see it here. So I have this regular expression for this number and that expression for that number. How do I do an "or"? I want either of these to match, and for "or" I have this character: see, a|b means a or b. So I can now do a | b. Cool, now it is matching this number and that number.

Now let me do the same thing in Python code. In my Jupyter notebook I have imported a module called re, which is what we use for regular expressions. I'll just copy-paste that text here, so my text is this and my pattern is this, and you call re.findall. findall will find all your matches: you supply your pattern first, then the text, and then you use the matches. See, it matched both of these numbers, so this is a very effective way. If you want to write the same thing in plain Python without regular expressions, try it out; it's going to be much more difficult.

All right, awesome. Now let's go back to our main problem, which is extracting the note titles. I'm going to copy and paste this huge text block, and again my goal is to extract the note titles, which are "Overview" and "Summary of Significant Accounting Policies". To match a note you type Note, then a space, and after that there could be any digit: 1, 2, and so on. You already know what we use for a digit: \d. So you type \d, then a space, then a hyphen, then a space. So far we have matched Note 1 and Note 2; \d covers the character range 0 to 9, so it matches the 1 in Note 1 and the 2 in Note 2.

Now think about it: you want to capture everything that starts from here until you find \n. \n is a new line, so anything that comes before you encounter \n you want to capture. How do you say that in regular expression language? Check this: [^abc] means any character except a, b, or c. So when
you want to say "I want any character except this particular character", you use this syntax with that character, which here is the carriage return. Let me remove it and, just to make things simple, I will type something in. Let's say I have all these characters, and now you want to say any character except this one and that one. To do that you put those characters inside [^ ]. Now it matches all these characters, a, s, j, l, f, l, whatever, but it did not match this character and that character. If you want to match a sequence, you can add a plus as well. Plus means one or more of those characters, so here it means one or more characters which are not a semicolon or a hyphen. Now look at the match information: it matched this, this, and that, but it did not match the semicolon and the hyphen.

Okay, back to our note example; sorry for the back and forth. Note, then a space, then \d, then space, hyphen, space. Now, any character except \n, so you write [^\n]. You're saying any character but \n. When you do that, see, it is matching the O here and the S here, but I want repetition: I want a sequence of characters other than \n, not only one character. When you want to match one or more of those characters you add a plus: a+ is one or more of a. If you want to match zero or more of a character you use a star. So if you have a blank title and you still want to get it, the right thing to use here is star.

Hooray, look at this: I have my titles, "Overview" and "Summary of Significant Accounting Policies". I matched those two lines, but I want only the titles. There are two things here: one is the match information, which is sort of like the full string match, but you want to extract only a portion of that
match. The portion you want is anything that starts after this space, so you put a bracket here. When you put a bracket, it will perform the match, but from that match it will capture everything that is within the brackets. See here: "capture everything enclosed", meaning whatever is inside the brackets gets captured. Now I am going to copy-paste this pattern into a pattern variable and call re.findall(pattern, text), and see, now what I find is "Overview" and "Summary of Significant Accounting Policies". When you look at the expression itself it looks difficult, I mean to me it looks difficult, but if you build it slowly, step by step, it is not that hard.

The next thing I am going to do: I took a text block from this document and I will extract the company's financial periods. A financial period is anything that starts with FY, followed by a year, then a space, then the quarter. It could be Q1, Q2, Q3, or Q4; one quarter is three months, so there cannot be a Q5. From this text I want to extract FY2021 Q1 and FY2020 Q4. Let's do this in our test pad first, as usual; I'm going to copy-paste the text here and remove this. So what is the pattern? It always starts with FY, so let's put FY first. Then there should be exactly four digits. How do you do that? You already know: \d{4}. See, four digits, so now match 1 is this and match 2 is this. After that there is a space (see, this dot means a space), then a Q, and then I cannot simply use \d. \d matches this, but it will also match things like Q5, and Q5 is not a valid financial quarter. The number has to be either 1, 2, 3, or 4. If you look at the help, [abc] means a single character of a, b, or c. So if you want to explicitly list your choices, what you can do is [1234]: one, two, three,
four. So it matches either 1 or 2 or 3 or 4, and see, it matched these two, but this one did not match; you can see that in the match information. This is one way. The other way is to specify a range: [1-4] means any number in the range 1 to 4. Friends, this is not hard. Stay with me; I'm telling you, this is easy. Take a pause and think about it: 1 to 4, and it matched. See, this is super easy, it's not hard at all.

I'm now going to copy-paste this pattern here, and by the way, you can store the result in a variable, matches, and print matches instead. See, it matched both of these. Now, what about case sensitivity? Say the text has fy in lower case. If you want to handle lower case, there are flags that you can pass: flags=re. followed by the flag name. If you read the documentation (just search for "python regex flags") you will find all these flags. I'm going to use re.IGNORECASE, and when you ignore case it will match capital FY as well as small fy.

When I extract this financial information, sometimes I want only 2021 Q1 and 2020 Q4; I don't want the FY. One option is to extract this and then remove the FY characters explicitly. Or you can be a little smarter and use a bracket. We already saw that in regular expressions there is the match, and after you match something you can extract a part of that match using a bracket. The part I want is this, so I put a bracket here, and see, my group 1 is 2021 Q1 and 2020 Q4. Now let me do that in the code: I just put the bracket here, and you will see I now have only the information that I need.

Now, instead of the financial periods, let's extract the actual values for those periods, which were 4.85 billion and 3 billion. How do
you do that? Well, I can have any number of numbers in the text; for example, Tesla's employee count is, let's say, 5400. I don't want to extract that; I want to extract only what starts with a dollar sign. So how about we put a dollar sign in the pattern? Well, dollar is again a special character: $ means end of the string, so it matched this thing at the end, which is not what I want. For that reason we put a backslash before it. The backslash is an escape, so now you are doing a literal match, meaning you are not using the dollar in a special way: if you have a dollar in your actual string, that is what gets matched. And it found these two matches.

After that you want to say any digit, and any digit is \d. But when you do that it matches only this first digit. I can add another \d, but now it does not match, because there is a decimal point. So instead I can say any digit or a decimal point. The dot is again a special character, so you need to put a backslash before it. And then you add plus: plus will match one or more characters. Without it, only the first character is matched; you want to match all the repeating characters until you find a space or something else. So: any digit or dot, repeated. Cool. You can also write 0-9, which means the same thing as \d: any digit, a character in this range. I am now going to put this in the code, and see, it matched.

Once again, you don't want the dollar sign in your end result, so you use a bracket for a capturing group. With the bracket, in the match information the match is $4.85 but the group is 4.85, and likewise 3 for the second match, so the dollar is removed. Whenever you put something in brackets, the group will have only the content inside those brackets; it will not include the dollar sign. So I add the bracket here, and see, those dollar signs are gone.

Now let's take a
little difficult task: I want to extract both the financial period, such as FY2021 Q1, and the value, 4.85. The result should say that for FY2021 Q1 the number is 4.85 and for FY2020 Q4 it is 3 billion. How do I do that? First you write an expression to extract the financial period. What was our expression for that? It was this, so I will just copy-paste it. After the financial period there can be any number of characters, and then there will be a dollar sign. So my pattern is: after the financial period, any character but a dollar. How do you say any repeated character except dollar? You use this one, which we already saw previously: any character but dollar is [^\$]. We put the backslash because dollar is a special character, but this really means a literal dollar; the caret says "any character except", and the square brackets are just syntax. Then you add plus, and plus means repetition.

Let me make this a little bigger. We are going from here all the way to here, and after that you find the dollar. The dollar is \$. Friends, go slowly; this is not complex, I'm telling you. Go slowly, don't get confused, focus. \$ is matching this. After that, what is the expression for matching this number? It is this one, so I'm just going to copy-paste it. See, we are building these blocks one by one and just copy-pasting. I remove the extra dollar, cool, and the number is in a bracket. As a result, group 1 is this and group 2 is this, so now I'm saying the 2021 Q1 number is 4.85 and the 2020 Q4 number is 3. I can now copy-paste this pattern into the code, and now matches is this: each match is a tuple, which is awesome. The 2021 Q1 number is this, and the 2020 Q4 number is that.

Other than findall, there is a method
called search. Let's see what happens. search has a different response, where you have to call matches.groups(), and it will search only for the first occurrence, whereas the findall method will find all the occurrences. So that is the difference.

All right, that's all I had for this tutorial. Now let's move on to the most important part of this video, which is an exercise. I have given the link to this exercise in the video description below. You have this notebook where you need to extract a few things from this text, and all you need to do is fill in these blanks: you need to say what the right regular expression is for the given problem. So read each problem, then try to write the regular expression, and make sure you use regex101.com. Once you have attempted it on your own, you can click on this solution link. Don't click on the solution link if you have not tried it, otherwise it will download a special virus and your computer will start burning in fire.

All right, I hope you enjoyed this. If you did, please give it a thumbs up and share it with your friends, and make sure you practice. Practice is the most important thing when it comes to coding. Thank you.
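The capture-group steps from the walkthrough can be sketched as follows. This is an illustration under stated assumptions rather than the code shown on screen: both sample texts are hypothetical stand-ins for the OCR output and the filing passage, and the variable names are invented for this example. The patterns follow the ones built up in the transcript.

```python
import re

# Hypothetical OCR output; only the "Note N - Title" line shape is
# taken from the video, the body lines are placeholders.
note_text = """Note 1 - Overview
Body text of the first note goes here.
Note 2 - Summary of Significant Accounting Policies
Body text of the second note goes here.
"""

# The bracket captures everything after "Note N - " up to the newline.
# The star allows an empty title as well.
note_titles = re.findall(r"Note \d - ([^\n]*)", note_text)
print(note_titles)

# Hypothetical passage with one upper-case and one lower-case "FY".
period_text = (
    "In FY2021 Q1 the figure was $4.85 billion. "
    "In the previous quarter, fy2020 Q4, it was $3 billion."
)

# Group 1: year and quarter (Q1 to Q4 only).
# [^\$]+ skips everything up to the dollar sign.
# Group 2: the digits and decimal point after the dollar sign.
pattern = r"FY(\d{4} Q[1-4])[^\$]+\$([0-9\.]+)"

# findall returns every match; with two groups each match is a tuple.
all_matches = re.findall(pattern, period_text, flags=re.IGNORECASE)
print(all_matches)

# search returns only the first match, as a match object (or None).
first_match = re.search(pattern, period_text, flags=re.IGNORECASE)
if first_match is not None:
    print(first_match.groups())
```

Checking the result of search against None before calling groups() avoids an AttributeError when the pattern does not occur in the text.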

Original Description

Regular expression Python tutorial. I will take a real-life example of extracting information out of Tesla's company filing and show you how you can use regular expressions in Python to extract some of the required information easily.
Code: https://github.com/codebasics/py/blob/master/Advanced/regex/regex_tutorial_python.ipynb
Exercise: https://github.com/codebasics/py/blob/master/Advanced/regex/regex_tutorial_exercise_questions.ipynb
Timestamps
00:00 Introduction
00:32 Coding
24:34 Exercise
Do you want to learn technology from me? Check https://codebasics.io/ for my affordable video courses.
Website: https://codebasics.io/
Codebasics Hindi channel: https://www.youtube.com/channel/UCTmFBhuhMibVoSfYom1uXEg
Social Media
Discord: https://discord.gg/r42Kbuk
Instagram: https://www.instagram.com/codebasicshub/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub
Linkedin (Personal): https://www.linkedin.com/in/dhavalsays/
Linkedin (Codebasics): https://www.linkedin.com/company/codebasics/
Patreon: https://www.patreon.com/codebasics?fan_landing=true
DISCLAIMER: All opinions expressed in this video are of my own and not that of my employers'.


This video tutorial teaches how to use regular expressions in Python to extract information from text, with hands-on examples and code snippets. It covers topics like pattern matching, string matching, and text extraction, and provides practical steps to build and test regular expressions.

Key Takeaways
  1. Open regex101.com to build and test regular expressions
  2. Use special tokens in regular expressions for pattern matching
  3. Match 10 consecutive digits in a text using regular expressions
  4. Extract numbers in different formats using regular expressions
  5. Use the 'or' operator to match either of two patterns
  6. Use the 'findall' function in the 're' module to find all matches in a text
💡 Regular expressions can be complex but can be built gradually, and using tools like regex101.com can make it easier to test and refine them.
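The takeaways about phone numbers can be sketched in a few lines of Python. This is a minimal example under assumptions: the sample text and both phone numbers are dummy values invented for illustration, and the pattern combines the two formats discussed in the transcript with the "or" operator.

```python
import re

# Hypothetical sample text with dummy phone numbers in two formats.
text = (
    "Elon Musk's phone number is 9991116666, call him if you have any "
    "questions on Dogecoin. Tesla's revenue is 40 billion. "
    "Tesla's CFO number is (999)-333-7777."
)

# Left side of "|": literal brackets (escaped), then 3-3-4 digits with hyphens.
# Right side of "|": exactly ten consecutive digits.
pattern = r"\(\d{3}\)-\d{3}-\d{4}|\d{10}"

phone_numbers = re.findall(pattern, text)
print(phone_numbers)
```

Note that \d{10} on its own would also match the first ten digits of a longer digit run, so real data may need extra boundaries around the pattern.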


Chapters (3)

0:00 Introduction
0:32 Coding
24:34 Exercise