Suffix array finding unique substrings

WilliamFiset · Beginner ·⚡ Algorithms & Data Structures ·8y ago

Key Takeaways

The video discusses using suffix arrays and LCP arrays to find and count unique substrings in a string, providing a space-efficient solution with improved time complexity.

Full Transcript

in this video I want to discuss a neat application of suffix arrays and LCP arrays and that is finding and counting unique sub strings there are a variety of interesting problems in computer science especially bioinformatics that require you to find all the unique substrings of a string the naive algorithm has a terrible time complexity of N squared which requires a lot of space the idea is to generate all the sub strings of the string and dump them inside a set a superior approach is to use the information stored inside the LC pra this provides not only a quick but also a space efficient solution I'm not saying that this is the canonical way of finding all unique sub strings because there exists other notable algorithms such as rabbin carp in combination with bloom filters let's now look at an example of how to find unique sub strings let's now look at an example of how to find all the uni substrings of the string a Z a Z a for every string there are exactly n times n plus 1 over 2 sub strings the proof of this I will leave as an exercise so listener but it's really not that hard to derive now notice that all the sub strings here there are a few duplicate ones I have highlighted the repeated sub strings there are exactly 6 of them and 9 unique ones now let's use the information inside the LCP array to figure out which of those sub strings really were the duplicate once in the table on the right I generated the LCP array for the string a a za za remember what the LCP array represents it represents that two adjacent suffixes of the original string shares certain Americ ters with each other so if the LCP value at a search index is say five and there are five characters in common between those two suffixes in other words there are five repeated sub strings between those two suffixes since they come from the same larger string so if we inspect the first LCP position at index one we see that has a value of one we know that the repeated string is the first character in a so we know that a is a repeated substring the next LCP value is three so there are three repeated substring between a z ay and a Z a Z a namely a AZ and ay z a the next interesting LCP values - for the last two suffixes so there are two repeated substrings here we can eliminate z and z a we can then come up with an interesting way of counting all unique substrings we know how many sub strings there are and that is n times n plus 1 over 2 and we also know the number of duplicate strings that is the sum of all the LCP values if this doesn't make immediate sense try out a few examples and play around with it if we go back to the example we just did with the string a Z a Z a and we set N equal to 5 which is the length of the string then we can get the correct answer of 9 by punching N equals 5 and then removing all the repeated substring values summed up in the LCP array alright so if you're looking for a programming challenge concerning counting substrings like I mentioned here you have the link to the cat is online judge for some practice I also included some source code for the suffix array and LCP are a available on github I get dot-com / William Fiza / - structures thank you for watching and in the next video we'll be talking about other applications of suffix arrays and LCP arrays stay tuned

Original Description

Related Videos: Suffix array intro: https://www.youtube.com/watch?v=zqKlL3ZpTqs Longest common prefix (LCP) array: https://www.youtube.com/watch?v=53VIWj8ksyI Counting unique substrings: https://www.youtube.com/watch?v=m2lZRmMjebw Longest common substring 1/2: https://www.youtube.com/watch?v=Ic80xQFWevc Longest common substring 2/2: https://www.youtube.com/watch?v=DTLjHSToxmo Longest repeated substring: https://www.youtube.com/watch?v=OptoHwC3D-Y Github link: https://github.com/williamfiset/algorithms Kattis link: https://open.kattis.com/problems/substrings My website: http://www.williamfiset.com =================================== Practicing for interviews? I have used, and recommend `Cracking the Coding Interview` which got me a job at Google. Link on Amazon: https://amzn.to/3cvMof5 A lot of the content on this channel is inspired by the book `Competitive Programming` by Steven Halim which I frequently use as a resource and reference. Link on Amazon: https://amzn.to/3wC2nix
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from WilliamFiset · WilliamFiset · 0 of 60

← Previous Next →
1 JES Image Manipulation - 2 - Installation
JES Image Manipulation - 2 - Installation
WilliamFiset
2 JES Image Manipulation - 3 - User Interface
JES Image Manipulation - 3 - User Interface
WilliamFiset
3 JES Image Manipulation - 5 - Negative
JES Image Manipulation - 5 - Negative
WilliamFiset
4 JES Image Manipulation - 6 - Black & White
JES Image Manipulation - 6 - Black & White
WilliamFiset
5 JES Image Manipulation - 4 - Grayscale
JES Image Manipulation - 4 - Grayscale
WilliamFiset
6 JES Image Manipulation - 8 - Blur
JES Image Manipulation - 8 - Blur
WilliamFiset
7 JES Image Manipulation - 7 - Edge Detection
JES Image Manipulation - 7 - Edge Detection
WilliamFiset
8 JES Image Manipulation - 9 - Blend
JES Image Manipulation - 9 - Blend
WilliamFiset
9 JES Image Manipulation - 10 - Matte
JES Image Manipulation - 10 - Matte
WilliamFiset
10 JES Image Manipulation - 13 - Rotate90
JES Image Manipulation - 13 - Rotate90
WilliamFiset
11 JES Image Manipulation - 12 - Mirroring Picture
JES Image Manipulation - 12 - Mirroring Picture
WilliamFiset
12 JES Image Manipulation - 11  - Crop Image
JES Image Manipulation - 11 - Crop Image
WilliamFiset
13 JES Image Manipulation - 14 - Stretch picture
JES Image Manipulation - 14 - Stretch picture
WilliamFiset
14 Java Fractal Explorer [6/8]
Java Fractal Explorer [6/8]
WilliamFiset
15 Java Fractal Explorer [4/8]
Java Fractal Explorer [4/8]
WilliamFiset
16 Java Fractal Explorer [8/8]
Java Fractal Explorer [8/8]
WilliamFiset
17 Java Fractal Explorer [5/8]
Java Fractal Explorer [5/8]
WilliamFiset
18 Java Fractal Explorer [2/8]
Java Fractal Explorer [2/8]
WilliamFiset
19 Java Fractal Explorer [7/8]
Java Fractal Explorer [7/8]
WilliamFiset
20 Java Fractal Explorer [1/8]
Java Fractal Explorer [1/8]
WilliamFiset
21 Java Fractal Explorer [3/8]
Java Fractal Explorer [3/8]
WilliamFiset
22 Introduction [Programming Competition Problems]
Introduction [Programming Competition Problems]
WilliamFiset
23 String Manipulation 1 [Programming Competition Problems]
String Manipulation 1 [Programming Competition Problems]
WilliamFiset
24 String Manipulation 2 [Programming Competition Problems]
String Manipulation 2 [Programming Competition Problems]
WilliamFiset
25 Graph Theory 1 [Programming Competition Problems]
Graph Theory 1 [Programming Competition Problems]
WilliamFiset
26 Logic 1 [Programming Competition Problems]
Logic 1 [Programming Competition Problems]
WilliamFiset
27 Grid Problems 1 [Programming Competition Problems]
Grid Problems 1 [Programming Competition Problems]
WilliamFiset
28 Dynamic Programming 1 [Programming Competition Problems]
Dynamic Programming 1 [Programming Competition Problems]
WilliamFiset
29 Introduction to Big-O
Introduction to Big-O
WilliamFiset
30 Dynamic and Static Arrays
Dynamic and Static Arrays
WilliamFiset
31 Dynamic Array Code
Dynamic Array Code
WilliamFiset
32 Linked Lists Introduction
Linked Lists Introduction
WilliamFiset
33 Doubly Linked List Code
Doubly Linked List Code
WilliamFiset
34 Stack Introduction
Stack Introduction
WilliamFiset
35 Stack Implementation
Stack Implementation
WilliamFiset
36 Stack Code
Stack Code
WilliamFiset
37 Queue Introduction
Queue Introduction
WilliamFiset
38 Queue Implementation
Queue Implementation
WilliamFiset
39 Queue Code
Queue Code
WilliamFiset
40 Priority Queue Introduction
Priority Queue Introduction
WilliamFiset
41 Priority Queue Min Heaps and Max Heaps
Priority Queue Min Heaps and Max Heaps
WilliamFiset
42 Priority Queue Inserting Elements
Priority Queue Inserting Elements
WilliamFiset
43 Priority Queue Removing Elements
Priority Queue Removing Elements
WilliamFiset
44 Priority Queue Code
Priority Queue Code
WilliamFiset
45 Union Find Introduction
Union Find Introduction
WilliamFiset
46 Union Find Kruskal's Algorithm
Union Find Kruskal's Algorithm
WilliamFiset
47 Union Find - Union and Find Operations
Union Find - Union and Find Operations
WilliamFiset
48 Union Find Path Compression
Union Find Path Compression
WilliamFiset
49 Union Find Code
Union Find Code
WilliamFiset
50 Binary Search Tree Introduction
Binary Search Tree Introduction
WilliamFiset
51 Binary Search Tree Insertion
Binary Search Tree Insertion
WilliamFiset
52 Binary Search Tree Removal
Binary Search Tree Removal
WilliamFiset
53 Binary Search Tree Traversals
Binary Search Tree Traversals
WilliamFiset
54 Binary Search Tree Code
Binary Search Tree Code
WilliamFiset
55 Fenwick Tree range queries
Fenwick Tree range queries
WilliamFiset
56 Fenwick Tree point updates
Fenwick Tree point updates
WilliamFiset
57 Fenwick Tree construction
Fenwick Tree construction
WilliamFiset
58 Fenwick tree source code
Fenwick tree source code
WilliamFiset
59 Hash table hash function
Hash table hash function
WilliamFiset
60 Hash table separate chaining
Hash table separate chaining
WilliamFiset

This video teaches how to use suffix arrays and LCP arrays to efficiently find and count unique substrings in a string, which is useful in various applications such as bioinformatics. The approach provides a significant improvement over the naive algorithm in terms of time and space complexity.

Key Takeaways
  1. Generate all suffixes of the input string
  2. Create an LCP array to store the lengths of common prefixes between adjacent suffixes
  3. Use the LCP array to identify and count duplicate substrings
  4. Calculate the total number of unique substrings by subtracting the number of duplicates from the total number of substrings
💡 The LCP array provides a space-efficient way to store information about duplicate substrings, allowing for efficient counting of unique substrings.

Related AI Lessons

Bloom Filters, Explained Properly
Learn how Bloom filters work and their benefits, including tiny memory and blazing speed, in exchange for potential false positives.
Dev.to · Daksh Gargas
Prefix Sums: The Preprocessing Trick That Makes Range Queries Instant
Learn how prefix sums enable instant range queries in arrays, boosting performance in various applications
Medium · Programming
I Thought I Was Ready for the Interview — Then One Simple Math Question Destroyed Me
A simple math question can destroy a developer's interview, highlighting the importance of being prepared for unexpected questions
Medium · Programming
Week 2(Day 10): LeetCode Two Pointers(slow & fast): Remove Duplicates from Sorted Array (Brute…
Learn to remove duplicates from a sorted array using the two pointers technique, improving from brute force to optimized solutions
Medium · Python
Up next
Stump Grinder Carbide Wheel Grinds Hardwood To Chips
Innoforge Studio
Watch →