Regex is HARD!

ArjanCodes · Beginner ·🏗️ Systems Design & Architecture ·2y ago

Key Takeaways

ArjanCodes discusses the importance of using Regular Expressions (Regex) carefully in code, highlighting three reasons: potential for ReDoS attacks, difficulty in readability and maintainability, and possibility of false positives and negatives. He provides recommendations for safe Regex usage, including using pre-existing and validated Regex patterns and implementing additional input validation.
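To make the point about false positives and negatives concrete, here is a minimal sketch. The pattern `NAIVE_EMAIL` and the helper `looks_like_email` are hypothetical and deliberately naive; they are not taken from the video. The sketch shows one malformed address that slips through and one valid address that gets rejected.

```python
import re

# Hypothetical, deliberately naive pattern (an assumption for illustration,
# not the pattern used in the video).
NAIVE_EMAIL = re.compile(r"[\w.]+@[\w.]+")


def looks_like_email(value: str) -> bool:
    """Return True if the whole string matches the naive pattern."""
    return NAIVE_EMAIL.fullmatch(value) is not None


# A well-formed address is accepted.
print(looks_like_email("alice@example.com"))

# False positive: consecutive dots in the domain are accepted,
# because the character class [\w.] allows any run of dots.
print(looks_like_email("alice@example..com"))

# False negative: "+" is legal in the local part, but the class
# [\w.] does not include it, so this valid address is rejected.
print(looks_like_email("first+tag@example.com"))
```

A small table of accept and reject cases like this is a cheap way to notice when a pattern is too loose or too strict before bad data reaches the database.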

Full Transcript

I'm going to cover three reasons why you should be careful using regexes in your code. Regular expressions, or regexes, are a really powerful tool to help you see if a string matches a certain pattern. They're widely used for validation of forms. For example, if you enter your email address somewhere, you can use a regex to check that it's actually a valid email address, and lots of websites and applications do that. But if you use regexes wrong, they can actually do way more harm than you might think.

By the way, this is a new series that I'm starting of shorter, more focused videos that I'm going to publish on Tuesdays. We haven't really thought of a proper name yet. For the time being, I'm just going to call these Tuesday Tips, with the risk that we decide to no longer run these on Tuesdays and then have to rename them. But Tuesday Tips, let's stick with that. If you have any other suggestion for how to name these videos, feel free to post it in the comments.

The first reason why you should be careful with regexes is that you can trigger a so-called ReDoS attack. Here I have an example of a simple main function that has an email address and calls the function validate email. What does that do? Well, that's a simple function that matches the input against a regular expression pattern, in this case for email addresses. When we run the main file, you see it's actually really fast: 0.0022 seconds to perform the match. But there's a huge problem with the regex that we're using in this particular case. If I change the email address to something like this, lots of A's, and save that, you see that when I run this code it doesn't actually stop at all, so I have to cancel the program like this.

The reason this happens is called backtracking. That means that if the input fails to match, the regex engine goes back to a previous position to try again, and it will do this over and over again to explore all the possible paths. The particular regex that I'm using here is inefficient, so it creates this really long loop to go through all of the possibilities. And because not everybody is nice on the internet, we can't assume that everybody will put nicely formatted email addresses in your form, so you have to be careful.

That also brings us to reason number two, which is that readability and maintainability of regexes is hard. Here you see a couple of different regexes for email addresses. Before I continue, pause the video and ask yourself which one of these regexes is actually a bad regex: is it one, two, or three? The answer is that the first one is bad, and regexes two and three are good. If you didn't catch this, don't feel bad about it, because regexes are really hard to read and it's really hard to guess which ones are good and which ones are bad.

Here you can pretty clearly see the difference in performance between these regexes. I have the evil pattern, which is the blue line, and we have good pattern zero and good pattern one. You don't see one of those because it's overlaid by the green one, since they're both really fast. But you can see a clear difference between a good pattern like this one and the bad pattern: as the length of the string increases, the time the evil pattern needs to compute the match increases a lot. That can make a big difference in the performance of your application if you're doing a lot of pattern matching.

The third reason that you need to be careful with regexes is that they can lead to false positives and negatives. For example, if your email address validation regex is not perfect, it might let through invalid email addresses that are then stored in the database, leading to all sorts of problems with people not being able to log in. And especially if you use the email address later on in your application, you're going to assume that it's correct, so that can lead to all sorts of problems. When your application crashes because of that, it's going to be really hard to debug, because it's not a problem in the code, it's a problem in the data. Of course, this is something that you should take into account when you send an email, and an email service will probably raise an error if for some reason the email address doesn't work, but it can still be hard to figure out.

So in summary, you have to be careful when using regexes. One small additional modification to the pattern can have a huge effect on performance. So, two recommendations. The first is to use a pre-existing and validated regex. Don't just randomly use something that you found on the internet or on Stack Overflow; make sure you're actually using high-quality regexes. The second thing you should do, specifically to avoid these ReDoS (Regular expression Denial of Service) attacks, is to make sure you have some additional input validation. For example, you could put a maximum on the string length to avoid these kinds of attacks. That way you have an extra line of defense that helps you avoid performance issues with regex validation.

I hope you enjoyed this short video. For the little dashboard that I made here to compare different regexes, I actually used Dash, and that's a really nice tool to create these sorts of dashboards. If you want to learn more about how to build an app using Dash, I did a video about that a while ago, and you can watch that right here. Thanks for watching and see you soon.
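The second recommendation, capping the input length before the regex runs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the video: `MAX_EMAIL_LENGTH`, `EMAIL_PATTERN`, and this version of `validate_email` are hypothetical, the cap of 254 characters is an assumed value, and the pattern is a simplified one chosen because it has no nested quantifiers. It has not been validated against the full email address specification.

```python
import re

# Assumed cap for illustration; pick a limit that suits your application.
MAX_EMAIL_LENGTH = 254

# Hypothetical simplified pattern. Each repeated group matches characters
# that cannot also be matched by the separator that follows it, so the
# engine has only one way to split the input and cannot backtrack wildly.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def validate_email(email: str) -> bool:
    """Reject over-long input first, then apply the regex to the rest."""
    if len(email) > MAX_EMAIL_LENGTH:
        # Extra line of defense: the regex never sees huge inputs.
        return False
    return EMAIL_PATTERN.fullmatch(email) is not None


if __name__ == "__main__":
    print(validate_email("user@example.com"))
    print(validate_email("a" * 10_000))
```

The length check runs in constant time, so even if a slow pattern is introduced later, the amount of work an attacker can force per request stays bounded.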

Original Description

💡 Learn how to design great software in 7 steps: https://arjan.codes/designguide.

Welcome to the first Tuesday Tips video! Ever wondered how to check if a string follows a specific pattern? That's where Regular Expressions, or Regex, come in. In this video, I'll dive into Regex and reveal some surprising ways it can be a double-edged sword.

🔥 GitHub repository: https://git.arjan.codes/2024/tuesday_tips/regex
🎓 ArjanCodes Courses: https://www.arjancodes.com/courses/

🔖 Chapters:
0:00 Intro
0:59 Reason 1: Risk of ReDoS attack
2:16 Reason 2: Readability and Maintainability is hard
3:26 Reason 3: False Positives and Negatives
4:10 Summary

#arjancodes #softwaredesign #python

This video teaches the importance of careful Regex usage in code, highlighting potential security threats and performance issues. Viewers learn how to use Regex safely and effectively.

Key Takeaways
  1. Use pre-existing and validated Regex patterns
  2. Implement additional input validation
  3. Test Regex patterns for performance and security
  4. Use tools like Dash to compare and visualize Regex patterns
💡 Small modifications to Regex patterns can have significant effects on performance and security, making it crucial to test and validate patterns carefully.
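One way to act on the advice to test patterns for performance is a small timing harness. The sketch below is not the Dash dashboard from the video, and the two patterns are hypothetical stand-ins rather than the email regexes shown there: `EVIL` uses a nested quantifier, `SAFE` accepts the same strings with a single quantifier. Input lengths are kept small on purpose, because the slow pattern roughly doubles its work with each extra character.

```python
import re
import time

# Hypothetical patterns for illustration (not the ones from the video).
EVIL = re.compile(r"^(a+)+$")  # nested quantifier: many ways to split the input
SAFE = re.compile(r"^a+$")     # same accepted strings, single quantifier


def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds one match attempt takes."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


def compare(lengths: list[int]) -> list[tuple[int, float, float]]:
    """Time both patterns on near-miss inputs of the given lengths."""
    rows = []
    for n in lengths:
        # The trailing "!" makes the match fail, which forces the engine
        # to try every alternative before giving up.
        text = "a" * n + "!"
        rows.append((n, time_match(EVIL, text), time_match(SAFE, text)))
    return rows


if __name__ == "__main__":
    for n, evil, safe in compare([10, 14, 18]):
        print(f"n={n:2d}  evil={evil:.6f}s  safe={safe:.6f}s")
```

Keep the lengths modest when experimenting: increasing `n` much beyond the values used here can make the slow pattern run for a very long time.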

Chapters (5)

0:00 Intro
0:59 Reason 1: Risk of ReDoS attack
2:16 Reason 2: Readability and Maintainability is hard
3:26 Reason 3: False Positives and Negatives
4:10 Summary