String Compression II - Leetcode 1531 - Python
Skills:
Systems Design Basics60%
Key Takeaways
The video covers the String Compression II problem on Leetcode 1531, using a greedy algorithm and recursive function with memorization to compress strings by deleting at most K characters.
Full Transcript
hey everyone welcome back and let's write some more neat code today so today let's solve the problem string compression 2 I'm not going to lie this is definitely a really hard problem easily the hardest of the month so don't feel bad if you are not able to solve it the premise of the problem is that we can compress strings by taking multiple consecutive characters that are the same and condensing them into the character and an integer so this tells us that we have three consecutive A's if we had four consecutive A's we would do this so you can see that we can take a long string like this one like six characters maybe and then condense it into something that looks like this so just two characters now we could also condense two characters down to A2 but that's not really an improvement and when it comes to single characters we're not going to condense that like into A1 because that would be doing the opposite we're actually making it longer there are a couple things that are worth considering for for example if we had 10 a characters that would then be condensed to this A10 so it is condensed down like to less characters but this is three characters and suppose we even had like a 100 a characters then we would make it a 100 so that's four digits this is as large as we have to worry about in this problem because I think we're guaranteed that the input string is going to be less than or equal to 100 characters but this is part of what makes this problem hard there's a lot of little things like this that we're going to have to consider so now that we kind of know the background on compression what exactly are we supposed to do in this problem consider this input string and consider the integer k equals 2 we are allowed to delete at most K characters from the string but we don't have to eliminate K characters we could eliminate one if we wanted to but in this example we can eliminate at most two characters so let's say we did this and this which just happens to be the answer but why is this the answer because now if we take this string and then compress it we end up with A3 C3 so the length of the string now after it's compressed is four this is the deletions that we do to minimize the compressed string if we deleted a couple other characters if we deleted C and A we would end up with A2 B C2 and then D so this time the length of the compressed string is six it's not minimized so what we want to do is basically do the deletions such that we minimize the string after it's been compressed and then return the length of that compressed string my mind initially went to a greedy solution because you probably want to delete like at least from this example it's kind of of telling us you probably want to delete the Standalone characters that don't have like multiple consecutive characters that are the same because we know that like this Chunk we could delete from it but it's not going to reduce the size we're just taking it from being A3 down to A2 so maybe we should always just delete characters like this that might be a greedy way to do this unfortunately that won't always work and the reason is cuz doing it that way we could maybe use a heap we could maybe sort these based on the counts it wouldn't work because what if this block wasn't all C's what if it was A's that type of greedy solution does not consider that when we delete a character like this we're actually creating a new long block of consecutive characters so for us to be able to solve this problem as we like go over this string we are going to need to kind of compare consecutive characters and also we're going to need to know the length of the string after it's compressed so that makes our job pretty hard so now let's kind of think about this from a Brute Force perspective even that is not easy I'm going to admit so that would kind of mean making decisions and the decision as we let's say iterate over this string from left to right that's usually a good first approach and if that doesn't work we can kind of think about it later but I'll tell you that that probably is going to work in this case now let's just go iterate over character by character and consider let's delete this character or let's not delete this character so that's a very naive way to think about it but it technically does work but if we do do that and I'm not even going to try to draw the decision Tree in that case because it's going to get really complicated but if we do do that one thing we have to worry about is still maintaining the lengths of previous consecutive characters so that we can determine the total length and for us to know about those consecutive characters we should as we like move our pointer ey which is going to tell us what the current character we're at we should probably compare it with the previous character just to know if like our streak of characters has increased has it gone from two up to three and the edge case that I mentioned earlier where it goes from let's say three consecutive characters all the way up to 10 that's special because now the length of the compressed string has increased cuz we went from like A9 up to a a 10 which is another digit in the string now another observation that I kind of want to make look at this streak over here we consider the case where we delete the first a then we consider the case where we delete the second a and it's fine when we're deleting both of these but why would we delete the first a that's one possibility and another possibility is save the first a and then delete the second a these aren't different in any way even if this a didn't exist these are not different in any way so what does that tell us well I think we'd probably pass the solution either way but that tells me that when our pointer is here we're at the first a we can delete it fine okay now when our pointer gets here and we're always going to compare we're going to look back and check okay it's the same as the previous character so we know we already considered the case where we deleted an A and while we do have a deletion left we can consider the case where we delete both A's but we don't need to consider the case where we only delete one a and the way we will be able to detect that is we're going to actually be keeping track of four parameters one of course is K the number of deletions we have left the second one is going to be I the current index that we're at another is going to be the previous character of course for us to compare it and second is going to be the previous count so the number of consecutive times that this previous character has shown up before so with these four variables the four parameters when we get here with I at this character we will know whether we deleted the previous one or not like in that Branch if the previous character is actually now set to an empty string because if if we delete this then when we shift our ey pointer here we can't say oh the previous character was actually a no we deleted a so we're not going to tell the next recursive function called that there was a previous character we're going to tell it there was nothing there there was an empty string and the count of that is zero I know this is really really abstract but it's going to make a lot more sense when we probably look at the code solution but for now I think the last thing I'm going to mention is that we're going to have two main cases in our recursive function and eventually we're going to add memorization to it using those four variables I just showed you but most of that will make more sense in the code but the two cases we're going to have is when the adjacent characters are the same and when they're not the same if they are the same that means we are now counting a streak like we're at this character we looked at these two they're the same or maybe we're at this character these two are the same and in this case we are actually not going to delete because believe it or not in this case there's no need for us to consider the deletion in the other case where we find that the two adjacent characters are not the same that is when we do consider the deletion so for example when I is at the first character the previous character we will say is an empty string in this case they are different so we consider both possibilities one where we keep this character and one where we delete this character next when we get to this guy we will reach him in two different ways one where we have deleted this in which case the previous character is an empty string and the other case where we did not delete it and these two characters are the same in the first one here we will do two possibilities we will delete the character and we will keep the character so it does give us all the possibilities now also in the case where the two characters are not the same if if we were to keep this then we call the recursive function on the rest of this string and of course they will know that this character does exist but we will then say one plus the recursive call which I think I'm just going to call count or maybe DFS so in this case when we keep the character we will do that plus one you might be thinking well if we do that plus one every single time we keep a character for example here when we keep this we do the plus one that'll work because this compressed is going to be A2 but when we get to the third character this compressed is going to be A3 if we do a plus one three times that's probably not going to work but that's why we're only going to do the plus one in the case where two adjacent characters are not the same when they are the same you're going to see how I'm going to handle that in just a second but now let's get into the code okay so I'm going to start out with just the Brute Force solution without any caching so I'm going to have a helper function I'm going to call it count and it's going to take four parameters I of course the index that we're at K the number of deletions we have left previous the previous character and previous count the number of times that previous character showed up consecutively so before we implement this let us actually call it so we know how it's going to be called it's going to start at index zero K is going to initially be just what K was as the input parameter previous is initially just going to be an empty string and the count of that is just going to be zero okay now before I even get to the base case I'm actually just going to implement the recursive case so I talked about how there are two main things that we're looking for are the two consecutive characters the same or are they different the case where they're different I think probably makes more sense so I'm going to start with that there are two possibilities one where we delete and the one where we don't delete so let's Implement both of those how would the recursive function be called in that case if we're deleting well we're definitely going to be at the index I + one the number of deletions we have left is going to be decremented by one the previous character is going to actually stay the same because when we're deleting we're actually deleting the character at s of I so whatever the previous character was even if it was the empty string is going to stay the same if we deleted the current character and the previous count of that is also going to stay the same if we were to delete the current character what about the other case where we do not delete well we're still going to be at I + 1 K is going to stay the same in that case and now the previous character is actually going to be S at index I because we didn't delete it so now it became the previous character and the count of it is going to be one because we know that the current character is not the same as the previous character so the streak is just starting and also if we were not to delete that character we would do a plus one like I talked about earlier because this is a new character it's the start of a new streak so whatever the result of like the length of the optimal compression is is going to have one more character now in it plus whatever the remaining compression ends up being so among these two we want to get the minimum of them so I'm going to do that just like this passing them both in um not forgetting to put a comma here cuz they're comma separated and I'm going to assign this to to a variable result this is the entire case where the two adjacent values are not the same now let's consider this case first of all what would the recursive call even look like let's just start with that well we're going to have I + 1 as per usual we're not doing any deletion so K is staying the same and the current character and previous character were the same so here we can either put previous or S ofi I guess I'll do s at index I and previous count is going to be incremented by one cuz the streak just increased I guess I can rename this to previous if that makes it easy to understand but is this enough do we just now set the result equal to this well doesn't the length of the streak tell us anything it kind of tells us how much we should increment this by like here we had the plus one because we knew that was the start of a new sequence but now the sequence is going to be longer than just one like that's guaranteed it could be two it could be three it could be 10 we don't know so a naive way would be to do something like this to convert previous count + one into a string and then get the length of it for example if we had 10 for example is previous count plus one if we convert it to a string and get the length of it we'll say okay there's two digits and then we could add that length here but notice how this is recursive like if we had add 100 or even just 10 consecutive characters in a row we would end up adding that addition here multiple times so that's not a good way to go about it a better way would be just to detect every time we have increased the number of digits and that happens when we go from one consecutive character to two because we get a changes to be A2 or when we go from 9 to 10 because then we go from like A9 to A10 we increased by one lastly when we go from 99 up until 100 cuz then we go from A99 to a 100 we're increasing the digit by one once again best way to detect that is check if a previous count is in any of these values in this array it's either 1 or 9 or 99 and in that case we can set our incrementer equal to one otherwise we set the increment equal to zero the reason we're setting it to one even if it were to be 99 is because we know we're incrementing it by one every time it's any of these values so once we get to 99 we already know that we already incremented it for these two other cases so that's why here the increment we're going to add it down here this is not easy stuff I know for sure it's not easy to understand at all that's why this is a hard problem it's not really expected for most people to come up with these on their own so this is how we're handling Computing the lengths of streaks and also again the reason why we don't delete here is because we already considered the case where we delete at the beginning of a streak which would be in the else case we don't need to delete in the middle of a streak and so down here now that we've done that we can return the result but last thing what about the base cases well one obvious base case is if I is equal to the length which means it's gone out of bounds we have an empty string what is the length of that of course it's going to be zero so we can return zero in that case another base case might be if K is equal to Zer but actually it's okay if K is equal to zero because just because we have no more deletions left doesn't mean we can't count the length of the remaining portion of the string so this does not qualify as a base case also remember that it's not a requirement that k equals z that's why we don't have to put like an additional guard here that K is equal to Z before we return but there is one base case we do have to consider if k equals z then down here we're allowed to execute this we're also allowed to execute this one but we're not allowed to execute this one so how do we do that we can add some if statements and stuff that make it complicated or we can say if K ever becomes less than zero then return a value so big that this possibility is probably already negated so if we ever were to call this here and K went negative then we'd end up returning Infinity so it would basically disqualify that call anyway and it's important to put this one first actually cuz what if both of these were the case we just went out of bounds and K became negative well we don't want to end up accidentally returning zero so let's make sure to put this one first there we go and last thing I want to do is now add caching so I usually like to do that with a hashmap so here we're basically going to consider these four values as the key of our hashmap so here I'm going to check another base case if this is in the cach let's just return that from the cach lastly let's not forget to add it to the cache so down here right before we return the result I'm going to make sure to cach the result so we can then return it from up here now this is the entire code it's not easy at all to come up with it's not easy to understand but I hope you learned something from it let's run this to make sure that it works and as you can see on the left yes it does though the percentage isn't great and I think that's just because most people implemented this with the bottom up approach though I think the time complexity is the same let's actually talk about that now here you can see within here there's no like Loops or anything so technically a function call is going to be constant but how many times are we going to make a unique function call well we can count that by taking I which is going to be the length of s so that's going to give us an N term here K which is also like the parameter that's how many possible values this might end up having and previous which is going to be 26 unique characters well I guess 27 if we consider like the empty string here so that doesn't really do anything we could put a 26 or 27 here but that's not going to change the complexity so I won't do that lastly previous count this could also end up being as large as the input string itself so we can add another n term here so basically the time complexity is n^2 * K same for the memory complexity thanks for watching if you found this helpful please like And subscribe if you're preparing for coding interviews check out NE code. thanks for watching and I'll see you soon
Original Description
🚀 https://neetcode.io/ - A better way to prepare for Coding Interviews
🧑💼 LinkedIn: https://www.linkedin.com/in/navdeep-singh-3aaa14161/
🥷 Discord: https://discord.gg/ddjKRXPqtk
🐦 Twitter: https://twitter.com/neetcode1
🐮 Support the channel: https://www.patreon.com/NEETcode
⭐ BLIND-75 PLAYLIST: https://www.youtube.com/watch?v=KLlXCFG5TnA&list=PLot-Xpze53ldVwtstag2TL4HQhAnC8ATf
💡 DYNAMIC PROGRAMMING PLAYLIST: https://www.youtube.com/watch?v=73r3KWiEvyk&list=PLot-Xpze53lcvx_tjrr_m2lgD2NsRHlNO&index=1
Problem Link: https://leetcode.com/problems/string-compression-ii/
0:00 - Read the problem
0:12 - Drawing Explanation
10:19 - Coding Explanation
leetcode 1531
#neetcode #leetcode #python
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from NeetCodeIO · NeetCodeIO · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Leetcode 149 - Maximum Points on a Line - Python
NeetCodeIO
Design Linked List - Leetcode 707 - Python
NeetCodeIO
Minimum Time to Collect All Apples in a Tree - Leetcode 1443 - Python
NeetCodeIO
Design Browser History - Leetcode 1472 - Python
NeetCodeIO
Number of Good Paths - Leetcode 2421 - Python
NeetCodeIO
Flip String to Monotone Increasing - Leetcode 926 - Python
NeetCodeIO
Maximum Sum Circular Subarray - Leetcode 918 - Python
NeetCodeIO
Find Closest Node to Given Two Nodes - Leetcode 2359 - Python
NeetCodeIO
Concatenated Words - Leetcode 472 - Python
NeetCodeIO
Data Stream as Disjoint Intervals - Leetcode 352 - Python
NeetCodeIO
LFU Cache - Leetcode 460 - Python
NeetCodeIO
N-th Tribonacci Number - Leetcode 1137
NeetCodeIO
Best Team with no Conflicts - Leetcode 1626 - Python
NeetCodeIO
Greatest Common Divisor of Strings - Leetcode 1071 - Python
NeetCodeIO
Shortest Path in a Binary Matrix - Leetcode 1091 - Python
NeetCodeIO
Insert into a Binary Search Tree - Leetcode 701 - Python
NeetCodeIO
Delete Node in a BST - Leetcode 450 - Python
NeetCodeIO
Shuffle the Array (Constant Space) - Leetcode 1470 - Python
NeetCodeIO
Fruits into Basket - Leetcode 904 - Python
NeetCodeIO
Number of Subarrays of size K and Average Greater than or Equal to Threshold - Leetcode 1343 Python
NeetCodeIO
Naming a Company - Leetcode 2306 - Python
NeetCodeIO
As Far from Land as Possible - Leetcode 1162 - Python
NeetCodeIO
Shortest Path with Alternating Colors - Leetcode 1129 - Python
NeetCodeIO
Minimum Fuel Cost to Report to the Capital - Leetcode 2477 - Python
NeetCodeIO
Count Odd Numbers in an Interval Range - Leetcode 1523 - Python
NeetCodeIO
Contains Duplicate II - Leetcode 219 - Python
NeetCodeIO
Path with Maximum Probability - Leetcode 1514 - Python
NeetCodeIO
Add to Array-Form of Integer - Leetcode 989 - Python
NeetCodeIO
Unique Paths II - Leetcode 63 - Python
NeetCodeIO
Minimum Distance between BST Nodes - Leetcode 783 - Python
NeetCodeIO
Design Hashmap - Leetcode 706 - Python
NeetCodeIO
Range Sum Query Immutable - Leetcode 303 - Python
NeetCodeIO
Binary Tree Zigzag Level Order Traversal - Leetcode 103 - Python
NeetCodeIO
Middle of the Linked List - Leetcode 876 - Python
NeetCodeIO
Course Schedule IV - Leetcode 1462 - Python
NeetCodeIO
Single Element in a Sorted Array - Leetcode 540 - Python
NeetCodeIO
Capacity to Ship Packages - Leetcode 1011 - Python
NeetCodeIO
IPO - Leetcode 502 - Python
NeetCodeIO
Minimize Deviation in Array - Leetcode 1675 - Python
NeetCodeIO
Longest Turbulent Array - Leetcode 978 - Python
NeetCodeIO
Last Stone Weight II - Leetcode 1049 - Python
NeetCodeIO
Construct Quad Tree - Leetcode 427 - Python
NeetCodeIO
Find Duplicate Subtrees - Leetcode 652 - Python
NeetCodeIO
Sort an Array - Leetcode 912 - Python
NeetCodeIO
Ones and Zeroes - Leetcode 474 - Python
NeetCodeIO
Remove Duplicates from Sorted Array II - Leetcode 80 - Python
NeetCodeIO
Maximum Twin Sum of a Linked List - Leetcode 2130 - Python
NeetCodeIO
Concatenation of Array - Leetcode 1929 - Python
NeetCodeIO
Symmetric Tree - Leetcode 101 - Python
NeetCodeIO
Check Completeness of a Binary Tree - Leetcode 958 - Python
NeetCodeIO
Construct Binary Tree from Inorder and Postorder Traversal - Leetcode 106 - Python
NeetCodeIO
Find Peak Element - Leetcode 162 - Python
NeetCodeIO
Accounts Merge - Leetcode 721 - Python
NeetCodeIO
Binary Tree Preorder Traversal (Iterative) - Leetcode 144 - Python
NeetCodeIO
Binary Tree Postorder Traversal (Iterative) - Leetcode 145 - Python
NeetCodeIO
Number of Zero-Filled Subarrays - Leetcode 2348 - Python
NeetCodeIO
Minimum Score of a Path Between Two Cities - Leetcode 2492 - Python
NeetCodeIO
Sqrt(x) - Leetcode 69 - Python
NeetCodeIO
Successful Pairs of Spells and Potions - Leetcode 2300 - Python
NeetCodeIO
Optimal Partition of String - Leetcode 2405 - Python
NeetCodeIO
More on: Systems Design Basics
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
Bloom Filters, Explained Properly
Dev.to · Daksh Gargas
Prefix Sums: The Preprocessing Trick That Makes Range Queries Instant
Medium · Programming
I Thought I Was Ready for the Interview — Then One Simple Math Question Destroyed Me
Medium · Programming
Week 2(Day 10): LeetCode Two Pointers(slow & fast): Remove Duplicates from Sorted Array (Brute…
Medium · Python
Chapters (3)
Read the problem
0:12
Drawing Explanation
10:19
Coding Explanation
🎓
Tutor Explanation
DeepCamp AI