Regex for Hackers (with Python)

John Hammond · Beginner ·🔐 Cybersecurity ·8mo ago

Key Takeaways

The video 'Regex for Hackers (with Python)' by John Hammond demonstrates the use of regular expressions for pattern matching and redaction in cybersecurity, utilizing tools such as Python, Kali Linux, and Sublime Text. The course covers topics such as regular expression basics, pattern matching, and data redaction, with a focus on ethical hacking and cybersecurity.

Full Transcript

Today we'll be doing some work with regular expressions or regex. And if you aren't familiar, regex is an awesome tool for you to be able to track down information across whatever data that you might be looking at. It's pattern matching to find text. You can think of it really simply as like control F, but with a heck of a lot more power to find, and search, and even replace any info you'd like. So, I'm on my Kali Linux virtual machine, and for us to be able to play with this, I'm going to open up a terminal with control alt T, F11 to full screen, and I'm going to move into a directory that I have for us called regex redact. And I have this super simple file here, info.md or info.text, and I can open that up in a text editor. I'll use Sublime Text. And this includes just some sample fake data for us to try and search through or do some regular expressions work. Now, I want to use this example because you might readily think of regular expressions or regex used for hacking. And that's really still what it is. Awesome, incredible tool for computer science, for technology, for cybersecurity, ethical hacking, penetration testing, bug bounty. But to have a little bit of a practical example here for us, I wanted to use regular expressions and pattern matching to be able to more redact or obscure some sensitive information that might, I don't know, be in your reports for work that you might do, for anything that you're up against. I think redacting and trying to remove private sensitive information is still one awesome way to use and put to use regular expressions. I did want to give you a heads-up though on HackingHub, hackinghub.io, where you can learn a little bit more ethical hacking, penetration testing, bug bounty, and a whole lot of sweet hubs or exercises where you can learn, and even some courses to get some more hands-on material with all of this work. They just released their regular expressions or regex for hackers course. 
There's a ton of other awesome sweet stuff on HackingHub, but if we were to zoom in on the regex for hackers material, this whole thing is put together by NahamSec (Ben Sadeghipour) and Adam Langley (BuildHackSecure), talking about how you can use regular expressions to bypass filters, spot vulnerabilities, and bring regex and pattern matching to real-world hacking. And the best part is all of the hands-on challenge labs. All the practical application-based learning so that you can actually build patterns and solve real problems. Whether you're beating up server-side request forgery, cross-origin resource sharing, working in recon, working in OSINT, all the awesome stuff that you might dig into for bug bounty hunters, penetration testers, and red teamers. Look, if you're interested, there is a link in the video description that will include the voucher that you could apply. It'll automatically be included in the link below, or you could manually go ahead and enter jhregex, and that will take about 50% off, so you can get this thing pretty easy, just 25 bucks. Big shout-out, good stuff, HackingHub, link below in the video description. So, all of this is fake data, right? Not my real email, not my real date of birth, or phone number, or card number, or all of this. But this is what we could get started with as just a small simple activity to test and learn some regular expressions. And I think it would be awesome if we could build out a tool that will do this regular expression pattern matching and redaction en masse. Like, why not have something that could rip through all the files in a folder for whatever data you're up against, or just have something that could read through a file and then clean up, redact, and obscure the data that shouldn't be publicly shared. I've had to do this quite a bit, and look, I think, "Hey, it'd be fun to maybe work with Python, maybe script some of this."
So, I'm here at the documentation page for the RE module in Python to work with regular expressions. We'll go ahead and write some Python code so we could build out a tool to try and redact or obscure or make private some of the sensitive information in our info.md or .text or .log or whatever you want in your sake. I'm going to link the documentation in the video description because I think this is super duper helpful as a reference if you're just aren't familiar with a lot of the different things regex can do because even that includes sort of a little library as to what all the syntax or the special characters inside of regex really mean. Because when we're used to using something like control F or the hotkey to search for something in a document, you're used to just searching for the literal string. Like, if I want to go find my email address john.hammond@fake or whatever, but what if I wanted to redact not just my email, but everyone's email in a giant pile of data, like a breach report even. Same thing for date of birth, for phone numbers, for card information, blah, blah, blah. There's a lot of variety there, so we'll use regular expressions to find all that in mass. The thing about regular expressions though is that it normally ends up considering regular, usual, or ordinary characters in ordinary text as a literal value. Like, ordinary characters of the letter A, uppercase A, lowercase A, or zero, they're the simplest regular expressions. They just match what they are. Like, the letter A, number zero, whatever. Think of that as like the regular control F or find functionality that you're used to. But regular expressions amps that up with some special characters. That affects how regular expressions or the pattern that you're searching for could be interpreted. 
Like, you could repeat any certain kind of character, or you could match anything, any kind of thing, or the position, like the beginning of a line or the end of a line where some data or info could be. I'll walk through a good many of these special characters while we get to use our practical example, but again, if you want to learn more, I'll leave a link to this in the video description and how that could be used within Python. So, let's go ahead and get started with our own regex redact.py or Python script, and I'll use a shebang line as usual, #!/usr/bin/env python3, and let's import that built-in re regular expressions module. And to make this more of a sweet utility for us, let's go ahead and import argparse so we have an argument parser, and we could just simply provide our info.md or any other file that we want to work through. Let's define the argument parser object as a parser, and then let's go ahead and add some arguments for us to work with. Add argument. Let's use the input that we want. That's fine as just a positional parameter, and then let's go ahead and parse our args from our parser object and store that in the args variable that we could now work with. With that set for us, let's try and open up the file at our args.input. Realistically, that probably could have been named better as like a file or file path. And let's read the contents, and we can store that as a variable contents. How about that? So, let's get back to our terminal, and since we added the shebang line, we can add the executable bit to our regex redact script. So, rather than needing to run it with Python as the prepending command, we could just dot slash or actually execute the script. With that, it will ask for our input, and we could provide our info.md file. Now, it will display all of the contents because the script has read that in and printed it. But now we got to do the actual regular expressions part.
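The script skeleton described above could look roughly like the following. This is a minimal sketch rather than the exact code from the video: wrapping the steps in build_parser, read_contents, and main is an assumption made here so the pieces can be called directly, and the help text is illustrative.

```python
#!/usr/bin/env python3
import argparse


def build_parser():
    # One positional argument named "input", as described in the transcript.
    parser = argparse.ArgumentParser(description="Redact sensitive data from a file")
    parser.add_argument("input", help="path to the file to read, e.g. info.md")
    return parser


def read_contents(path):
    # Read the whole file into a single string.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def main(argv=None):
    # argv=None makes argparse fall back to the real command-line arguments.
    args = build_parser().parse_args(argv)
    contents = read_contents(args.input)
    print(contents)
    return contents
```

Ending the file with a call to main() and running chmod +x on it would give the ./script info.md behavior described here.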
So, now let's get back to our script, remove our print line, and let's start to define some patterns or regular expression patterns and actually regular expression objects that Python will be able to use to search for data or search for that pattern in whatever we're looking at. Before we do though, honestly, let's play a little bit with the info and the data that we have because Sublime Text, the text editor that I'm working in, when I use control F or control H for even find and replace, while we search for whatever we want down below, I can also toggle on regex or regular expressions mode. Let me click that, that dot star, and you might be familiar with that because the dot in a regular expressions matches anything, a special character to represent any other kind of character. Let me go ahead and click the find all button down in the bottom right. Obviously, this now has everything selected, and you can see I even have multiple cursors because Sublime Text is nice and convenient and lets that happen. But we've only matched one character. We just have multiple matches over and over and over again. A match of every single character. But realistically, I want to be able to match everything in one go, right? If that's where the example of like a dot star, like you might have seen just down below when we used the dot and this icon to refer to regular expressions, the dot asterisk is now working line by line. Let me click find all, and you can see all that. Now, there aren't all these vertical bars cuz we're not matching one character at a time, but now line by line, we're matching the entire length of the string. And that, when we use that dot star, the star or the asterisk, by default, is greedy. And what that means is that it's going to try to get multiple repetitions of the previous character, and we use the dot or the special character to mean anything, and it's going to get as many of them as it possibly can. 
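The difference between a lone dot and the greedy dot star can be reproduced outside the editor. The sketch below uses a made-up two-line sample string (an assumption, not the real info.md) and Python's re.findall in place of the Sublime Text find-all button.

```python
import re

# Hypothetical sample text standing in for the first lines of the file.
sample = "John Hammond\n1983-07-12"

# A lone dot matches exactly one character (any character except a newline),
# so findall returns one single-character match per character.
single_chars = re.findall(r".", sample)

# The greedy star lets each match run to the end of its line.
# Python's findall also reports zero-length matches for ".*",
# so the empty strings are filtered out here.
whole_lines = [m for m in re.findall(r".*", sample) if m]

print(len(single_chars))
print(whole_lines)
```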
The star actually means zero or more of the previous character, but another special character, the plus symbol, kind of does the same thing, repeating it as many times as possible to match as much as it can, but that is one or more of the previous character. Again, greedy. But if I were to add another special character here like a question mark, I know this is weird, that makes it lazy or non-greedy. It's not going to try to capture everything for as much as it can in one match. Once again, you see all those sort of columns or boxes where we've only got just one character returned in multiple matches. Maybe that doesn't make sense, so let's get back to our Python code so you can see that in action. Let me define a simple silly everything_re or everything regular expression where we'll use an r prefix for a raw string in Python, and I'll move my face again. When r is the preceding character before single quotes or double quotes in Python, it'll literally interpret the raw contents of the string, so backslashes aren't treated as escapes. You normally see that used with regular expressions. So, let me search for a dot just as we have previously, but now let's go ahead and compile that everything_re pattern. Realistically, it's probably smarter to actually use the re.compile function around our regular expressions pattern. So, then this everything_re object is something that we could then start to match and work with things on. Let me zoom out a little bit when we have the contents data now defined, and let's try and print our everything_re.match to find regular expression matches based off of the data that we want to work with. Wrapped inside the parentheses as the argument to work with, let's use the contents that we've just read in from that file. Now, if I were to run this back in my terminal, I know this will probably look kind of silly and stupid, but we could just use that single dot character now in an automated way with our script and our code to get one match object.
The very beginning of the file, just the character J. And that's all it returned because regex by default will give us the first occurrence of a match. And there's another function you could use, you could use something like search. And if I were to go run this again, you'll get the exact same result, but there's some nuance there. We'll talk about that as we get into other lines that we're going to be processing, but obviously this is getting just the absolute beginning of the data. What's to stop us though from using another function included in the regular expression library and these regular expression compiled pattern objects, one that's called find all. That's not going to return a match object like a type that's available within and because of this library. It's actually going to return a big long list of all of the matches. And since we've only matched a single character or a dot, it's getting us a single character of everything. But what if we were to add in our special characters like the dot star or the dot plus? Remember, the dot star was zero or more of the previous character, plus is one or more of the previous character. So, let's try to run that. Okay, now this is retrieving all of the lines using that find all function, but if we were to switch that to go back to match or search, and we'll talk about the difference in between the two of them in just a moment, but that's only going to return match being just John or the very first line, even including the space at the very end. So, let's look back at the documentation here. So, there's a section here called module contents that outlines and lists all the different sort of variables, all the different sort of flags, properties, functions, and methods, and things you can do with the built-in RE module. But the functions here are worthwhile for us to explore because we just used that re.compile. 
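The behavior of match, search, and findall on a compiled pattern can be sketched like this. The contents string is a made-up stand-in for the file, and the variable names other than everything_re are assumptions for illustration.

```python
import re

# Hypothetical stand-in for the contents read from the file.
contents = "John Hammond\njohn.hammond@example.com\n1983-07-12\n"

everything_re = re.compile(r".+")

# match and search both return a single Match object (or None).
first_by_match = everything_re.match(contents)
first_by_search = everything_re.search(contents)

# findall returns a plain list of strings, one per match.
all_lines = everything_re.findall(contents)

# Adding "?" after the quantifier makes it lazy: it stops as early as it can.
lazy_first = re.compile(r".+?").match(contents)

print(first_by_match.group())
print(all_lines)
print(lazy_first.group())
```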
Remember, that's how we built out sort of our regular expression object, which we could then use the match and search and other methods that we've seen. We passed in a pattern, and then it was able to build this out. You could just as easily use the re.search and re.match patterns and functions like at the top level of the library, but usually it's recommended to just compile the regular expressions pattern matching object first because it's better, it's faster, it's it's more clean, and you can do a little bit more, and that way you can reuse a pattern if you need to rather than just passing it in alongside the string that you're working with with these other functions. There's full match, there's split, they give us a couple examples here. Here's our find all again. And let me get to once we have a compiled object, we'll actually be able to work with those functions like we saw with our regular expression objects. We could retrieve out of it the pattern itself like the one that was used by and within re.compile, and we could search just like we have. The distinction here though is that dot search and dot match are both going to return a corresponding match object. We'll explore that in just a moment, but search looks through the entire string or a line of text just like we'd seen cuz it works line by line here. And that's probably what you're most used to when you're thinking of oh, control F or just naturally searching for something, right? But the match function that you might normally think of because oh, you're getting a match object back has a little nuance. It's searching at the beginning of a string, the absolute start. So, here the example they show, if you were to compile just a pattern to look for the letter O, and then you match it against the word dog as okay, the data you're looking through, it's actually not going to find it. It's not going to return a match because O is not at the very beginning. Letter D is. 
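The dog example from the documentation can be tried directly. This short sketch shows match failing where search succeeds, plus the optional start position that the method on a compiled pattern accepts.

```python
import re

pattern = re.compile(r"o")

# match only looks at the very start of the string, and "dog" starts with "d".
at_start = pattern.match("dog")

# search scans forward through the string until it finds a match.
anywhere = pattern.search("dog")

# Compiled patterns accept a start position; from index 1 the string begins with "o".
from_index_one = pattern.match("dog", 1)

print(at_start)
print(anywhere.span())
print(from_index_one.span())
```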
But if you were to provide a position just following okay, one character into the string dog, then we start here, and we can go find O at the very beginning. Take a look at those other arguments, you can include the position or end position whenever you want to look for a match with either search or match or full match or find all and all the other functions that we could call here. Anyway, I didn't want to belittle that too much, but I did want to make sure that was clear to you. There is a distinction between the search and match function, and they call this out even with some of the others. Match checks only the beginning of the string while search matches anywhere in the string, and that's probably what you might expect or used to when trying to control F search for something or pattern match with any data. But remember, our objective here is to actually replace or redact some of the sensitive and private details here. So, let's start small. Let's look for a date of birth, how about that? And while we could just search for the literal characters 1983, 07, 12, or whatever, that isn't really going to be all that useful if there's more data. We want to look for anything that looks like a year or month or date pattern. Now, this has some nuance, right? Because you got to keep in mind, what if the data were to look like uh month month date date year year year year and be in a different format. You kind of have to have that understanding or know a little bit of what the data is going to look like or try to build out all those different possible edge cases. For the moment, let's keep it easy, let's start with just what we know we're going to see with maybe a hyphen or a forward slash for any of these representations, but we could make that sort of delimiter character totally optional or have regex still be able to figure out okay, I'll match both or either one of those. The way that we look for those numbers though is by using some of these special characters. 
And we can assume okay, it's going to be maybe four digits for a full year or maybe one digit or two digits for any month or date. So, we can use some repetition that uses these curly brace characters. Could be something that is exactly however occurrences or repetitions of the previous character or a min max sort of bounds like oh, maybe just one day like October 7th or two days like October 17th, right? And the question mark again might be an option to make that non-greedy, of repetitions or the max when we use a plus sign. But how could we search for these numbers for one thing? Well, you could use some square braces, that's one option. The square braces will indicate a set of characters or like anything that you want to be able to look for. You could just list out AMK as an example they use here, or you could use a range from like zero to five or zero to nine or A through F all upper case, A through F lower case, and anything that you might be looking for. While you could do all of that within the square braces, there are some other options if we were to use like a backslash character. I'm going to skip over some of the grouping information because we're just going to replace that, and I do want to let you explore that within the course, but you could use maybe backslash D as a special symbol to indicate look, I want any decimal digit. That also includes the zero through nine set within square braces. When you see that backslash D for like digits you can think of, there's also backslash S for sort of spaces or white space characters like new lines or the space character or a tab character. There's even backslash W to look for word characters, maybe your letters. That does match some digits as well though, alphanumeric, and even includes the underscore, so you should know what you're using when you reference those. But I bring that to your attention because each of those as their representations has an inverse opportunity. 
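A quick way to get a feel for sets, shorthand classes, and curly-brace repetition is to run each one through re.findall. The sample text below is made up for illustration, and the negated forms at the end are the inverse versions of the digit class.

```python
import re

# Hypothetical sample line.
text = "DOB: 1983-07-12"

low_digits = re.findall(r"[0-5]", text)       # a set with a range: only 0 through 5
four_digits = re.findall(r"\d{4}", text)      # exactly four digits in a row
one_or_two = re.findall(r"\d{1,2}", text)     # min 1, max 2, greedy so it prefers 2
word_chunks = re.findall(r"\w+", text)        # letters, digits, and underscore
non_digits = re.findall(r"\D+", text)         # inverse of \d
negated_set = re.findall(r"[^0-9]+", text)    # same result on this ASCII sample

print(low_digits, four_digits, one_or_two, word_chunks, non_digits)
```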
Like if you were to look for digits with backslash D, you could just as easily find anything that's not a digit with backslash upper case or capital D. That uses the same sort of set with the square braces we just discussed, but it adds essentially a caret inside of the square braces. And that is another special nuance. Outside of square braces, the caret should refer to the beginning of a line, but if it's the first character inside of a square brace set, that means not any of the characters that are included here. Let me go find that. Here it is, back in that section. If that includes the caret up at the beginning of the square braces, all the characters that are not in that set will be matched. That doesn't matter to us too much, though. Now we've got a little bit of the special sauce to be able to track down, not just 1983 or one year, but any year, right? Or even any sort of number. Let's use the backslash D. There you go. You can see that's noting all these digits, and maybe we could use the square brace set for like zero through five. Take a look. Numbers like nine and eight, obviously anything above five is not going to be matched. But if we did this zero through nine, and then we were to add the curly braces to denote a repeat amount of occurrences, while we're noting only any digit as the first character we're looking at. Let's expand that out to just maybe two occurrences of it, or three, or four. Nice. Okay, so we are capturing the year, but now we're also capturing some other nonsense in a phone number or a credit card or something in a UUID. So let's try and clean that up. Let's search for, literally for the moment, remember a literal character, that hyphen or the dash. Also still getting caught in our UUID, but we got to make sure we'll get the rest of our date of birth. Well, remember we could use zero through nine or backslash D. Either of those two will work. Often times it might be more efficient for you to use the set here.
You might just find some difficulty in other engines that don't use that quite as well. But now let's try to get the group of that or match the repetitions that could be either one character in length or two. Well, because we have a minimum here, we're still getting the UUID that starts with the four after the hyphen, but we haven't quite finished getting the rest of the date of birth. Well, we could just duplicate the process that we have here to go find or determine that one. Now, even if I were to change this to be, oh, I was born on the 7th of the date, that still makes sense. If we're to use a date that doesn't make sense, like, oh, I was born on the 123rd day, well, that's not going to be captured. So now we have one example regular expression that we could use, even though very simple, that will work to maybe redact our date of birth. So let's build out in our script how we could kind of clean that up. Let's prepare a big list of patterns, and actually, let me define it as a dictionary. That way I could say the date is actually going to map to a regular expression compiled object of R for a raw string with our pattern included there. Now I have a reference that I could use to say, okay, the key for our date is going to refer to this regular expression pattern. So later on, I can really easily loop through all the patterns that we're up against, and then appropriately replace whatever we want to redact, right? Question is, how do we get this to actually put in the redaction? Well, you might have seen it as we were scrolling along, we could use a function called sub. Think of that as like substitute, right? Here it is at the top level where we're working with a pattern. Since we're using a regular expression compiled object, we wouldn't need to provide the pattern. That's already all included. In fact, let's go find the sub function call that is used for a object. 
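The date pattern built up in the editor, stored in a dictionary keyed by what it redacts, might look like the sketch below. The exact pattern text in the video is not shown in the transcript, so this version (four digits, a hyphen, then two groups of one or two digits) is a reconstruction from the description.

```python
import re

# Key names what is being redacted; value is the compiled pattern.
patterns = {
    "date": re.compile(r"[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}"),
}

# Hypothetical sample line.
sample = "Date of birth: 1983-07-12"
match = patterns["date"].search(sample)

print(match.group())
```

One caveat worth knowing: without a trailing anchor such as \b, a value like 1983-07-123 still matches on its first ten characters, and only the stray final digit is left outside the match.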
And this says it returns the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in the string by the replacement repl. Okay, to replace, right? What we'll be replacing it with. If the pattern isn't found, then the string is returned unchanged. Okay, so that should be pretty easy to use with what we're working with, right? Let's try to actually loop through all of our patterns, knowing that we'll eventually add more for the other types of info that we're going to redact. Let's say, for every redaction key and redaction pattern in the patterns that we're going to work with, let's go ahead and replace the contents with that. Since the redaction pattern is already a compiled object, we can go ahead and simply sub, replacing or substituting whatever it found with maybe a little format string where we could say redacted and then the redaction key. So it knows what exactly it was trying to redact, and we have some more context for that. And of course, we'll be replacing it across all of the contents. And we can clean this up a little bit, add some more white space so that's clearer and easier to read. Now we've got a sweet little tiny engine that will go ahead and do the replacement or redaction, no matter what patterns that we provide. So could I now print out or display the contents after we've gone through that first round of redaction? Oh, but before I do, I do want to actually loop through all of the items in this case. That way we'll smartly get both the key and the pattern out of the dictionary that we've defined up above. Oh, and let's also add a space there just so the redacted bit is a little bit more clean. We can add even a couple more square braces. Now let's get back to our terminal, and let me run our regex redact on our info document, and it has properly redacted the date of birth. Simple and easy, right? Regex put to work. Now we could build out any patterns that we want for the rest of the things we want to try and redact.
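The redaction loop described above can be sketched as follows. The redact function name and the exact replacement label format are assumptions; the transcript only says the replacement is a format string containing "redacted" and the key, wrapped in square braces.

```python
import re

patterns = {
    "date": re.compile(r"[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}"),
}


def redact(contents, patterns):
    # .items() yields both the key and the compiled pattern for each entry.
    for redaction_key, redaction_pattern in patterns.items():
        # sub replaces every non-overlapping match, not just the first one.
        contents = redaction_pattern.sub(f"[REDACTED {redaction_key}]", contents)
    return contents


print(redact("Date of birth: 1983-07-12\n", patterns))
```

Because each pass feeds the already-redacted text into the next pattern, adding a new kind of redaction later only means adding one more dictionary entry.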
We've got digits in math pretty well figured out, but we actually sort of skipped over one of the possibilities. Like I was alluding to, if this were using forward slashes or another representation of the date. Well, we could just use another set, right? Within the square braces here instead of the hyphen. Maybe that is a hyphen or inside of this square brace set here, a forward slash. Let's try to work with that. Another option for an optional or either/or characters could very well be, maybe if you had a longer delimiter or a change here, is actually using different parentheses. But then when you don't add some extra little syntactic sugar, that becomes a group, and it will be captured in some other part you can extract out of a pattern. We won't get into all that in this video, but just note those parentheses will get certain groups or parts and pieces of what you pattern extract. Inside this though, we could have a full word, and then use a vertical bar to end up splitting one or the other. We'll all be matched for that part or piece. So we could just as easily use a hyphen or forward slash and just make that an option included in there. That should still work both with our square brace set and the optional group end here. Let's try that out again in our terminal. Let me clear the screen, run this once again. That has properly redacted the date even when we're using now forward slashes for the date. Let's stick with one though. Let's just use the square brace set. Now it should be pretty easy though to go ahead and play with what we might find for phone numbers. This is interesting though because we have a plus sign, which as you know is supposed to be a special character in regular expressions. So we'll have to escape it out. And we just use a backslash to get that done. We might have an optional character though. What if there is no space here? What if they don't have a break with a delimiter of like a dot or an hyphen? 
Well, we can mark any sort of character as optional by adding another question mark in there. Remember, that's the same symbol as that lazy or non-greedy operator, but placed right after a character you can think of it as, eh, should this be here or not? So let's try to go find our phone number looking for a literal plus sign. Remember we're using the backslash to escape that in regular expressions, and it might have a digit one or more, right? So we could use our zero through nine, and then the plus for the country code, and then the space could be there or it could not be. So let's just literally add a space character, but let's make it optional with that question mark just following it. Then we chunk up what might be in a phone number. And different regions might have different pieces of a phone number, so again, you could map that to however many characters, however many digits you're expecting with the repetition that we've seen previously with the curly braces. I'm expecting to see three, but you could modify that, as you know, for min, max, other opportunities. And then we could have a set of maybe a period, maybe a hyphen, maybe any other delimiter, and a question mark, because maybe that's not even there. It's optional. And then we can do this again for the next chunk of numbers, and then another that's actually going to be four characters to get that full phone number. Okay, at least just experimenting, that seems to give us a fine phone number pattern. So let's add that in to our capability of our script. Let's say phone number can be a compiled regular expression object with r for a raw string. Paste all of this in, and now let's try to run our script nice and easy. Redacted phone number. Easy. But that is flexible. Like if this were to just now not use any of those delimiters, and were just this big long string here, running it again, not even making any changes to our script, still redacts it just fine. You get the gist?
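Putting the last two steps together, the patterns dictionary might now hold a date pattern that accepts either delimiter and the phone pattern just described. Both pattern strings are reconstructions from the spoken description, and the sample phone numbers are made up.

```python
import re

patterns = {
    # [-/] accepts a hyphen or a forward slash; (?:-|/) would be the
    # non-capturing group form of the same either/or choice.
    "date": re.compile(r"[0-9]{4}[-/][0-9]{1,2}[-/][0-9]{1,2}"),
    # Escaped literal plus, one or more digits, optional space, then 3-3-4 digit
    # chunks with an optional period or hyphen between them.
    "phone": re.compile(r"\+[0-9]+ ?[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}"),
}

for candidate in ["+1 555-123-4567", "+1 555.123.4567", "+15551234567"]:
    print(candidate, bool(patterns["phone"].fullmatch(candidate)))
```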
We could probably just as easily carve out one for a credit card, one for a MAC address, and let's try and bang those out here. You might find yourself though thinking, "Hmm, could we do even more validation though? Like a credit card number is probably going to follow suit with a specific format, or the representation of the numbers and like the Luhn or l u h n algorithm. So you might have to build in more extraneous logic that isn't strictly tied to regular expressions to be able to carve that out, but still stuff to get you thinking about, "Hmm, what else should we be doing in like a big utility that we might really genuinely want to use for all this mass redaction." Let's get our device MAC. That could be zero through nine or an A through F or maybe lowercase A through F for like hexadecimal, and that's probably going to be in chunks of two. It's probably going to have a colon here, and we could fan that out to get the rest of it. Simple and easy, but this will work for us getting just a little prototype and learning a little bit of the regular expressions here. That works great. Redacted credit card number and redacted MAC address. I'll get a UUID for us done just as well. Those are likely going to be lowercase hexadecimal values, and I think the pattern here is eight characters in length, four four four, and then 12 at the very end. But you might make your regular expression even more sturdy. Try to add more capability just in case the data is represented in any other sort of different way. For our learning though, I think this will work. And to be honest, you could make this better. I'm probably not using all the most absolutely perfect genius regular expression pattern, but this is getting the job done. If you got a better way to go about some of these, please do let me know in the comments. I know especially, hey, for a lot of this that is repeated, maybe we could even repeat some of those repeat potential. 
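The MAC address and UUID patterns described above, along with one way to "repeat the repeats", can be sketched as below. The card pattern (four groups of four digits with an optional space or hyphen) and the luhn_valid helper are assumptions added for illustration, since the transcript does not spell out the card pattern and only mentions the Luhn algorithm in passing.

```python
import re

patterns = {
    # Hypothetical card layout: 4-4-4-4 digits, optional space or hyphen between groups.
    "card": re.compile(r"(?:[0-9]{4}[ -]?){3}[0-9]{4}"),
    # Five "two hex digits plus colon" chunks, then a final pair.
    "mac": re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}"),
    # Lowercase hex in the 8-4-4-4-12 layout.
    "uuid": re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),
}


def luhn_valid(number):
    # Luhn checksum: double every second digit from the right, subtract 9
    # from any result above 9, and require the total to be divisible by 10.
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return bool(digits) and checksum % 10 == 0


candidate = "4111 1111 1111 1111"  # widely used test card number, not a real card
print(bool(patterns["card"].fullmatch(candidate)), luhn_valid(candidate))
```

A regex match followed by a Luhn check is one way to cut down on false positives before redacting something as a card number.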
But let's get to one example that might let us be a little bit more clever in how we could validate even an IP address with regular expressions. You might be able to think, "Oh, that's just four octets, right? Oh, they're all just three digits of numbers or zero to three or one to three, right?" In that curly brace range that we could duplicate for four different parts. But remember an IP address for IP version four is probably only going to go up to about 255. So I don't want this to gobble up and get everything if it's beyond that range. So how could we do that and capture that even in just text? Well, what if we tried to set bounds as to what might be in or or an IP address? Let me control F again so we can start to build this out. Let's say we had some groups where parts of this are optional, right? Because if an IP address is about maybe 255 or 240, like zero through nine as the potential up to there before breaking into the 25 segment of the max value of an IP address octet, 240, then the first position could be anything up to a number two, right? Like that could be a 100, and maybe a 199 if we get to that. And then for something that doesn't even have a hundredth position or hundredth place, just 42 or like one, we want to be able to match those. So with this as an example, could we match maybe in a group probably the maximum position could be up to 255, meaning that the absolute max that could be inside of a group is zero to five. That would get matched. I don't want anything else other than that though, but that should be or two in the beginning or the very first hundredth place of that up to zero to four, and then zero through nine for the next place or position, or maybe a one or a zero in some cases, right? So let me do zero and one and make that optional, and then any other digit zero through nine also optional because you might have something that's only in the tens rather than the hundreds. 
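As a sketch of where this octet pattern is heading, the three alternatives (250 to 255, 200 to 249, and everything from 0 to 199) can be written once and reused for all four octets. The OCTET name and the use of non-capturing groups are choices made here for readability, not necessarily what appears on screen.

```python
import re

# 25[0-5] covers 250-255, 2[0-4][0-9] covers 200-249,
# and [01]?[0-9]?[0-9] covers 0-199 with the last digit mandatory.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Three octets followed by a literal dot, then one final octet.
ipv4_re = re.compile(rf"{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}")

for candidate in ["192.168.1.10", "255.255.255.255", "192.168.1.256"]:
    print(candidate, bool(ipv4_re.fullmatch(candidate)))
```

Note that fullmatch is used for the check. With search, an out-of-range address such as 192.168.1.256 would still match on the prefix 192.168.1.25, so a redaction tool may want word boundaries around the pattern.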
And then normally that's going to have one more digit alongside it, right? So that could smartly define the bounds and range of an IP address octet up to 255, and then we could make sure we have a literal dot there. Now see how that's capturing these first three octets, but not the very last one. But don't forget, we're building out something to get the entire IP address, not just the individual octets. And we have left our final zero through nine as mandatory. There has to be some digit there, some way, somehow. So if we were to now fan out our four octets, there's going to be another that we capture and another that we capture, and then one more that we capture without a following dot or period. Now we're smartly capturing IP addresses, and that will capture our original IP address that we're working with. So let's go ahead and use this. Add that into our patterns to replace, and now we can redact our IP address. Running this, clean it up. Okay. Now our final boss. Granted, this is really the beginning of the file that we just kind of skipped over, but our email address is where maybe we could capture some of the parts and pieces of it. Honestly, we've got the smarts and the skills at this point. This should be simple and easy for us. Especially when we know, okay, we could use backslash w to match any word character, or we could make that alphanumeric: A through F... A through Z, really, uppercase and lowercase. Sorry, I was so used to thinking of hex. And that could include zero through nine, digits in your email address, just as well. And there are other special characters that could be included: maybe a period or dot, which will have its literal meaning inside of the square braces, or an underscore, or a plus sign, or a hyphen or dash. And there's going to be one or more of those values, right? Plus sign.
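The bounded IPv4 octet described above can be sketched roughly like this. The names `OCTET`, `IPV4_RE`, and `NAIVE_RE` are made up for illustration, and the lookarounds at either end are an addition that the video does not mention. They are there so a match cannot start or stop in the middle of a longer run of digits and dots.

```python
import re

# One octet in the range 0-255, built from three alternatives:
#   25[0-5]            matches 250-255
#   2[0-4][0-9]        matches 200-249
#   [01]?[0-9]?[0-9]   matches 0-199 (leading digits optional, last digit mandatory)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Octet, then three more "literal dot + octet" pieces.
# Assumption: the lookbehind/lookahead guards are extra, added for this sketch.
IPV4_RE = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?![\d.])")

# The naive "one to three digits, four times" version, for comparison.
NAIVE_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

if __name__ == "__main__":
    for candidate in ["192.168.1.42", "255.255.255.255", "999.1.1.1", "10.0.0.256"]:
        print(
            candidate,
            "bounded:", bool(IPV4_RE.search(candidate)),
            "naive:", bool(NAIVE_RE.search(candidate)),
        )
    print(IPV4_RE.sub("[REDACTED IP]", "host 192.168.1.42 is up"))
```

Note that this version still accepts leading zeros such as `01.2.3.4`. Whether that matters depends on the data you are redacting.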
That plus sign now has its special meaning, since it's not inside of the square braces, where it had its literal meaning. And that will run up to an at sign, right? Literal at sign, and then anything that might follow for the domain. But that could get a lot of stuff. There's potential that it could run even further, past the end of an email address, so maybe we add a backslash to escape and get the literal representation of a dot, and then the top-level domain like .com or .net or .org that follows. You might need to build out a whole long list of all the things that it could possibly be, because otherwise you might run into some other trouble when you're trying to think of short domains like .io or .in or whatever. You could make that uppercase A through Z or lowercase a through z, and then in curly braces use a two and then a single comma, not defining an upper bound. That might work a little bit better. And now we've got that email regular expression that we can use. We can go put that way up at the top, honestly, and basically call our little pattern-matching redaction script done for this example. Of course, we could make this even bigger and better. We could add more capability. We could have this loop through a ton of different files. We could have this redact more info or add a little bit more context here and there. But for our learning, I think this works. Let me try to run our redaction script. It does it just fine. So that was a little bit of regular expressions with the re module in Python. Granted, we didn't cover everything. I didn't even get a chance to touch on the flags or the configuration values that you might be able to set for the library, like MULTILINE or DOTALL or IGNORECASE and all that. But realistically, I do want you to actually take this even further. I think it'd be cool to apply this for ethical hacking, for bug bounty, for looking at how you might be able to take advantage of regular expressions to go track down things in specific targets.
That could be for SSRF, or maybe some WAF (web application firewall) evasion. If you're up for it, I would totally suggest going to take a look at the new Regex for Hackers course on HackingHub. There's a link in the video description, and you can go ahead and apply the voucher JHREGEX, which will take 50% off, so you can get that for a sweet 25 bucks. Thanks so much for watching, everybody. Hope you enjoyed this video. Please do all those YouTube algorithm things like comment and subscribe, and I'll see you in the next video.
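For reference, the email pattern built up in the transcript, and the flags that were only mentioned in passing, might look something like this sketch. The domain character class, the `redact_emails` helper, and the `email:` label line are assumptions for illustration. The video only says "anything that might follow for the domain".

```python
import re

# Local part: letters, digits, dot, underscore, plus, hyphen (one or more),
# then a literal @, a domain, a literal dot, and a TLD of two or more letters.
# re.IGNORECASE lets the lowercase ranges cover uppercase letters as well.
EMAIL_RE = re.compile(
    r"[a-z0-9._+-]+@[a-z0-9.-]+\.[a-z]{2,}",
    re.IGNORECASE,
)


def redact_emails(text: str) -> str:
    """Replace anything email-shaped with a placeholder."""
    return EMAIL_RE.sub("[REDACTED EMAIL]", text)


# re.MULTILINE makes ^ and $ match at the start and end of every line,
# not just the start and end of the whole string.
LABEL_RE = re.compile(r"^email:.*$", re.IGNORECASE | re.MULTILINE)

if __name__ == "__main__":
    sample = "name: Jane\nEmail: John.Doe+test@Example.io\nnote: none"
    print(redact_emails(sample))
    print(LABEL_RE.findall(sample))
```

This is deliberately loose. Real email addresses allow more characters than this class covers, so treat it as a redaction heuristic and not as a validator.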

Original Description

https://hhub.io/jhregex || Check out the Regex for Hackers course on Hacking Hub! Code JHREGEX takes 50% off 😎 https://hhub.io/jhregex Learn Cybersecurity and more with Just Hacking Training: https://jh.live/training See what else I'm up to with: https://jh.live/newsletter ℹ️ Affiliates: Learn how to code with CodeCrafters: https://jh.live/codecrafters Host your own VPN with OpenVPN: https://jh.live/openvpn Get Blue Team Training and SOC Analyst Certifications with CyberDefenders: https://jh.live/cyberdefense

This video teaches the basics of regular expressions and how to apply them for pattern matching and redaction in cybersecurity, with a focus on ethical hacking and bug bounty. The course covers topics such as regular expression basics, pattern matching, and data redaction, and provides hands-on examples using Python and Kali Linux.

Key Takeaways
  1. Open a terminal and navigate to a directory
  2. Use regular expressions to redact sensitive information
  3. Compile a regular expression object using re.compile
  4. Use the sub function to substitute patterns with redacted strings
  5. Run a script to redact phone numbers, credit card numbers, and MAC addresses
  6. Build a comprehensive regex pattern to capture IP addresses and email addresses
💡 Regular expressions can be used for pattern matching and redaction in cybersecurity, and are a valuable tool for ethical hackers and cybersecurity professionals.
