Regex is HARD!
Key Takeaways
ArjanCodes discusses why Regular Expressions (Regex) should be used carefully in code, highlighting three reasons: the potential for ReDoS attacks, poor readability and maintainability, and the possibility of false positives and negatives. He recommends using pre-existing, validated Regex patterns and adding input validation, such as a maximum string length, as an extra line of defense.
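To make the two recommendations concrete, here is a minimal Python sketch. It assumes a hypothetical `validate_email` function (not the one from the video's repository), an assumed limit of 254 characters in `MAX_EMAIL_LENGTH`, and an illustrative pattern that is deliberately simple rather than a vetted one. The length check runs before the regex, and `re.VERBOSE` lets every part of the pattern carry a comment, which addresses the readability problem.

```python
import re

# Assumption: 254 characters is a commonly used practical upper bound for an
# email address; adjust MAX_EMAIL_LENGTH to your own requirements.
MAX_EMAIL_LENGTH = 254

# Illustrative pattern only, not a vetted one. re.VERBOSE lets every part
# carry a comment, and there are no nested quantifiers that could make a
# failed match backtrack exponentially.
EMAIL_PATTERN = re.compile(
    r"""
    [A-Za-z0-9._%+-]+      # local part: letters, digits and a few symbols
    @                      # a single @ separator
    [A-Za-z0-9-]+          # first domain label, e.g. "example"
    (?:\.[A-Za-z0-9-]+)*   # further labels, e.g. ".co"
    \.[A-Za-z]{2,}         # top-level domain of two or more letters
    """,
    re.VERBOSE,
)


def validate_email(address: str) -> bool:
    """Hypothetical validator: length guard first, then a full-string match."""
    # Extra line of defense: oversized input never reaches the regex engine.
    if len(address) > MAX_EMAIL_LENGTH:
        return False
    # fullmatch requires the pattern to cover the entire string.
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in (
        "user@example.com",
        "first.last@example.co.uk",
        "user@@example.com",
        "a" * 300 + "@example.com",
    ):
        print(f"{candidate[:30]:<30} {validate_email(candidate)}")
```

Even a careful pattern like this rejects some addresses the standards allow, such as quoted local parts, which is the false-negative side of the third reason; a well-tested pattern or library remains the safer default.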
Full Transcript
I'm going to cover three reasons why you should be careful using regexes in your code. Regular expressions, or regexes, are a really powerful tool to help you check whether a string matches a certain pattern. They're widely used for form validation: for example, if you enter your email address somewhere, a regex can check that it's actually a valid email address, and lots of websites and applications do that. But if you use regexes wrong, they can actually do way more harm than you might think.
By the way, this is a new series of shorter, more focused videos that I'm going to publish on Tuesdays. We haven't really thought of a proper name yet, so for the time being I'm just going to call these Tuesday Tips, with the risk that we decide to no longer run them on Tuesdays and then have to rename them. But let's stick with Tuesday Tips for now. If you have any other suggestions for how to name these videos, feel free to post them in the comments.
The first reason why you should be careful with regexes is that you can trigger a so-called ReDoS attack. Here I have an example of a simple main function that has an email address and calls the function validate_email. What does validate_email do? Well, it's a simple function that matches a string against a regular expression pattern, in this case one for email addresses. When we run the main file, you see it's actually really fast: 0.0022 seconds to perform the match. But there's a huge problem with the regex we're using in this particular case. If I change the email address to something like this, with lots of A's, and save that, you see that when I run the code, it doesn't stop at all, so I have to cancel the program like this.
The reason this happens is called backtracking. If the input fails to match, the regex engine goes back to a previous position to try again, and it will do this over and over again to explore all the possible paths. The particular regex I'm using here is inefficient, so it creates a really long loop to go through all of the possibilities. And because not everybody on the internet is nice, we can't assume that everybody will put nicely formatted email addresses into your form, so you have to be careful.
That also brings us to reason number two: regexes are hard to read and maintain. Here you see a couple of different regexes for email addresses. Before I continue, pause the video and ask yourself which one of these is actually a bad regex: is it one, two, or three? The answer is that the first one is bad, and regexes two and three are good. If you didn't catch that, don't feel bad about it, because regexes are really hard to read, and it's really hard to guess which ones are good and which ones are bad.
Here you can pretty clearly see the difference in performance between these regexes. The evil pattern is the blue line, and then we have good pattern zero and good pattern one. You don't see one of them because it's overlaid by the green line; they're both really fast. But you can see a clear difference between a good pattern and the bad one: as the length of the string increases, the time the evil pattern needs to compute the match increases a lot. So if you do a lot of pattern matching, that can make a big difference in the performance of your application.
The third reason you need to be careful with regexes is that they can lead to false positives and false negatives. For example, if your email validation regex is not perfect, it might let through invalid email addresses that are then stored in the database, leading to all sorts of problems, such as people not being able to log in. Especially if you use the email address later on in your application, you're going to assume it's correct, so that can cause all sorts of problems. And when your application crashes because of that, it's going to be really hard to debug, because it's not a problem in the code; it's a problem in the data. Of course, you should take this into account when you send an email, and an email service will probably raise an error if for some reason the address doesn't work, but it can still be hard to figure out.
So in summary, you have to be careful when using regexes: one small modification to a pattern can have a huge effect on performance. I have two recommendations. The first is to use a pre-existing, validated regex. Don't just randomly use something you found on the internet or on Stack Overflow; make sure you're actually using high-quality regexes. The second, specifically to avoid these ReDoS (regular expression denial-of-service) attacks, is to add some additional input validation. For example, you could put a maximum on the string length. That way you have an extra line of defense that helps you avoid performance issues with regex validation.
I hope you enjoyed this short video. For the little dashboard I made here to compare different regexes, I used Dash, which is a really nice tool for creating these sorts of dashboards. If you want to learn more about building an app with Dash, I did a video about that a while ago, and you can watch it right here. Thanks for watching, and see you soon.
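The backtracking behavior described in the transcript can be reproduced with a tiny Python sketch. It does not use the email regex from the video; it assumes the textbook evil pattern `^(a+)+$`, whose nested quantifier lets a backtracking engine such as Python's `re` split a run of a's in exponentially many ways once the match fails. `SAFE_PATTERN` accepts exactly the same strings without nesting, mirroring the evil-versus-good comparison from the dashboard.

```python
import re
import time

# Textbook catastrophic-backtracking example (not the video's email pattern):
# the nested quantifier (a+)+ can split a run of a's in exponentially many
# ways, and the engine tries them all before reporting a failed match.
EVIL_PATTERN = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, but without nesting, so it fails fast.
SAFE_PATTERN = re.compile(r"^a+$")


def time_match(pattern: re.Pattern, text: str) -> float:
    """Return how long pattern.match(text) takes, in seconds."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


def main() -> None:
    # The trailing "!" guarantees the match fails, which forces backtracking.
    # Kept small on purpose: each extra character roughly doubles the work.
    for n in range(12, 21, 2):
        text = "a" * n + "!"
        evil = time_match(EVIL_PATTERN, text)
        safe = time_match(SAFE_PATTERN, text)
        print(f"n={n:2d}  evil={evil:.5f}s  safe={safe:.7f}s")


if __name__ == "__main__":
    main()
```

On a typical machine the evil column should grow by roughly a factor of four for every two extra characters while the safe column stays flat; exact numbers depend on hardware and Python version. A maximum input length caps that growth, which is why the second recommendation helps.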
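The third reason, false positives and negatives, is easy to see with two hypothetical hand-rolled patterns (neither comes from the video): one too loose and one too strict. This sketch labels each pattern's verdict against a few addresses whose validity is not in doubt.

```python
import re

# Two hypothetical hand-rolled email patterns (neither comes from the video).
TOO_LOOSE = re.compile(r".+@.+")          # anything, an @, then anything
TOO_STRICT = re.compile(r"\w+@\w+\.\w+")  # word chars only, one dot in the domain

# Each case pairs an address with whether it should be accepted.
CASES = [
    ("user@example.com", True),
    ("first.last@example.co.uk", True),
    ("user@@example.com", False),      # two @ signs
    ("user name@example.com", False),  # unquoted space in the local part
]


def classify(pattern: re.Pattern, address: str, should_match: bool) -> str:
    """Label one verdict as ok, false positive, or false negative."""
    matched = pattern.fullmatch(address) is not None
    if matched and not should_match:
        return "false positive"
    if not matched and should_match:
        return "false negative"
    return "ok"


for address, expected in CASES:
    print(
        f"{address:<26} loose: {classify(TOO_LOOSE, address, expected):<15}"
        f" strict: {classify(TOO_STRICT, address, expected)}"
    )
```

The loose pattern lets both invalid addresses through, while the strict one rejects the perfectly valid `first.last@example.co.uk`. Either mistake ends up as the kind of data problem the transcript warns about.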
Original Description
💡 Learn how to design great software in 7 steps: https://arjan.codes/designguide.
Welcome to the first Tuesday Tips video! Ever wondered how to check if a string follows a specific pattern? That's where Regular Expressions, or Regex, come in. In this video, I'll dive into Regex and reveal some surprising ways it can be a double-edged sword.
🔥 GitHub repository: https://git.arjan.codes/2024/tuesday_tips/regex
🎓 ArjanCodes Courses: https://www.arjancodes.com/courses/
🔖 Chapters:
0:00 Intro
0:59 Reason 1: Risk of ReDoS attack
2:16 Reason 2: Readability and Maintainability is hard
3:26 Reason 3: False Positives and Negatives
4:10 Summary
#arjancodes #softwaredesign #python