CS50x 2026 - Lecture 5 - Data Structures

CS50 · Beginner ·⚡ Algorithms & Data Structures ·6mo ago

Key Takeaways

CS50x 2026 Lecture 5 covers data structures such as stacks, queues, dictionaries, and linked lists, with a focus on implementation using arrays, dynamic memory allocation, and pointer arithmetic.

Full Transcript

[Music] Heat. [Music] Heat. [Music] [Music] All right, this is CS50 and this is week five already uh wherein we will focus today on data structures which is a topic we've touched on a little bit in simp in simple form but today we'll dive all the more deeply and for better or for worse this is our last week on C uh next week Of course, we transition to Python, which is a so-called higher level programming language, which is really, frankly, just going to make our lives a lot easier. We're going to be able to solve a lot of the same problems, but so much more quickly as humans, but not necessarily, as we'll see, as fast when we run the code as the computer might have if we were still using a lower level language like C. So, indeed, thematic over this week and next is going to be the theme we've seen before of tradeoffs. But before we get there, why don't we focus on a couple of data structures that you might encounter in the real world. Uh, namely stacks and cues. Let's learn some facts about both of these. If we could dim the lights dramatically. [Music] Once upon a time, there was a guy named Jack. When it came to making friends, Jack did not have the knack. So, Jack went to talk to the most popular guy he knew. He went up to Lou and asked, "What do I do?" Lou saw that his friend was really distressed. "Well," Lou began, "just how you're dressed. "Don't you have any clothes with a different look?" "Yes," said Jack. "I sure do. Come to my house, and I'll show them to you." So, they went off to Jacks, and Jack showed Lou the box where he kept all his shirts and his pants and his socks. L said, "I see you have all your clothes in a pile. Why don't you wear some others once in a while?" Jack said, "Well, when I remove clothes and socks, I wash them and put them away in the box. Then comes the next morning and up I hop. I go to the box and get my clothes off the top." Lou quickly realized the problem with Jack. He kept clothes, CDs, and books in a stack. When he reached for something to read or to wear, he chose a top book or underwear. Then, when he was done, he would put it right back. Back it would go on top of the stack. I know the solution, said a triumphant Lou, you need to learn to start using a queue. Lou took Jack's clothes and hung them in a closet. And when he had emptied the box, he just tossed it. Then he said, "Now Jack, at the end of the day, put your clothes on the left when you put them away. Then tomorrow morning when you see the sunshine, get your clothes from the right from the end of the line. Don't you see?" said Lou, "It will be so nice. You'll wear everything once before you wear something twice." And with everything in cues in his closet and shelf, Jang started to feel quite sure of himself. All thanks to Lou and his wonderful queue. All right, our thanks to Professor Shannon Duval at Elon University who kindly put together that animation. And it's meant to paint a picture of a couple of things that we've all encountered in the real world. But more technically, what we just saw were what are known as abstract data types whereby they're data structures in some sense, but it's really about the design thereof. what characteristics or features or functionality these structures offer irrespective of how they are implemented in terms of lower level implementation details which is to say you can implement as we'll see Q's and stacks in any number of ways which are going to have real world implications for how you can actually use them and what kinds of problems you can solve with them. So let's consider for instance Q's in the first place. So, a queue is something you sort of experience all the time. Anytime you go to a store, uh, go to, uh, some event in for which you have to line up in a so-called queue, you'd ideally like there to be some fairness property about that queue such that if you got in line first, you get into the store first, you get to check out first or some other such goal. Meanwhile, the person who got there last actually is at the end of the line and stays at the end of the line and therefore gets served or enters in at the end. So, Q's have what a computer scientist would say is a FIFO property. first in, first out. That is, if you're the first person in line, you're the first person to get out of line. And for many problems, that is a good solution. Certainly, if you're concerned with fairness. Um, but more technically, AQ has what we'll call two operations. NQ, which is a fancy way of saying getting in line, and DQ, a fancy way of saying getting out of the line from the front of it. But those two operations, if you think about it in code, could it be implemented with different actual details? And by that, I mean this. Here is one way that we could go about implementing in CC code a queue for a bunch of people or persons who want to line up for something. So for instance, we'll decree that this queue can hold no more than 50 people like that's the physical capacity and then we define a structure which we've done a couple of times in the past whereby this structure has not only an array of persons that we'll call people and that will be as big as is the capacity. So this is an array of size 50 for 50 such persons. And then we're going to propose that we also keep track in this implementation of a queue of the current size of the queue. So we're going to make a distinction between the capacity like how many total people can be there and the size like actually how many people are in line at that moment in time so that you know which of the spots in the array are effectively empty. And we're going to call that whole structure a Q. Now the catch with this particular implementation in code of a Q is what there is inherent in it a a limitation something you just kind of have to deal with. And I see you nodding. What What's your instinct for this? >> For example, like 50 students. >> Okay. Well, I think you hit the nail on the head in that it's only for 50 students or 50 people, which means if a 51st person wants to get into line, you literally have no means of remembering them in this data structure. So, how do you solve that? Well, we could just recompile our code after changing the 50 to like 51 or maybe 500 or 5,000. But there there's this trade-off because you could still be undershooting the total number of people trying to get into maybe a big concert in the case of an extreme. But at at the same time, if you overallocate memory using 5,000 locations in memory, what if only a few people show up? Now you're just wasting memory. And certainly at the end of the day, you only have a finite amount of memory in the computer. So you kind of have to decide a priority like before compiling your code, how big is this structure going to be? How much space are you going to waste? And in the end, it's all sort of stupid. It would be ideal if instead we could just grow the queue as needed and shrink it. Essentially asking the operating system as we started doing last week for more memory and then giving it back if we don't actually need that memory. Which is to say can't really do an array in this static sense. And by static I mean we're literally deciding in advance at compilation time how big this thing is going to be. As an aside, this is also a bit annoying for implementing a queue because you have to somehow keep track of who is at the head of the queue, the front of the queue, because as you start plucking people off, you need to remember who's the next person effectively. But there are ways in code that we could solve this. So, let's consider an alternative to a queue, which gives us very different properties, namely a stack. And we saw that in the animation whereby uh Jack used a stack to put his clothes into a box so that every time he got dressed he sort of took the sweater from the top from the top from the top and might never wear anything other than black as a result. If he does a wash before he actually reaches the blue and the red sweater there. So a stack as we've just seen has a LIFO property to it. Last in first out. So, if I do a load of laundry and I plop some more sweaters on this stack, well, I'm presumably going to use the last sweater that went in first as opposed to trying to create a mess and like, you know, pull the bottommost sweater out, which is just going to be a little more effort than uh than it would be otherwise from just taking it from the top. So, sometimes last in first out doesn't give you maybe this fairness property you might want for other problems, but it does give you an efficiency, a convenience certainly. So, maybe that might be compelling. And stacks are actually everywhere, too. If you've checked your Gmail recently, odds are you've opened up gmail.com or outlook.com and you've looked at your inbox. And where does the new mail by default end up? At the top. At the top. At the top. And I dare say all of us are guilty of sort of neglecting emails that fall below the break or onto the next page and sort of focusing only on the last in and therefore replying to it first out, which isn't great maybe for the senders of those emails, but it's just how those user interfaces are implemented quite often unless you override those default settings. So how might we implement a stack? Well, we need to implement more technically two fundamental operations. The analogs of NQ and DQ in the world of stacks are called push, which means push something onto the top of the stack and pop, which means remove something from the top of the stack also. And the the team in the cafeterias and dining halls on campus do this all day long. Any of the cafeterias or dining halls that have stacks of trays, of course, you put the first tray at the bottom and then the next tray and the next tray and the next tray. And which tray do all of you pick up? Well, presumably the one on the very top because it's even harder to grab the bottommost tray than it would be for something like a sweater. As a result, there's maybe undesirable properties like maybe no one ever gets to the nasty tray at the very bottom of the stack because we're constantly replenishing the top ones. But thanks to gravity, like that just happens to be the most appropriate data structure in the real world for distributing things like trays in a cafeteria. So, how might we implement that idea in code? Well, funny enough, we can pretty much use the exact same structure. We could just rename Q to stack because at the end of the day we need to keep track of some number of people and maybe people's is a weird sort of analog here but we kept everything else the same so why not that but the size is also something we still need to remember and it turns out it's a little easier to implement a stack in this way because you could always remove it from the end of the array end of the array and the first thing that went into the stack the first in can always stay at location zero for instance but ultimately we could implement it in this way but we have the same darn limitation You can still only put 50 sweaters, 50 trays, 50 people into that stack data structure. So this is just one implementation approach. But that doesn't mean that's necessarily a limitation of stacks and cues. They're abstract in the sense that we could do better. We could maybe start to manage our own memory, move away from statically defining the total size of this array and just start allocating and deallocating, that is growing and shrinking the data structure instead. which is to say we can make these abstract data types much less abstract with actual implementations. Let's consider a data structure that we saw an abstract data type that we saw early on that we didn't necessarily give this name. A dictionary is yet another abstract data type that's sort of everywhere in the world literally in the world of dictionaries containing words and their definitions. And you can think of a dictionary really in the abstract if you were to draw this on the chalkboard as really just a two column table whereby on the left is the word and on the right is the definition. If it's a physical book, it's essentially the same thing with lots of columns of words on the left, often bold-faced, and then the definitions right next to them. You can also see this in the context of like a phone book, which is where we began the course in week zero, where it's essentially a dictionary of names and numbers instead of words and definitions. And a computer scientist would generalize the notion of a dictionary further and just call the thing on the left a key and the thing on the right a value. And these things are omnipresent in computing. And you're going to start to see them all the more today. next week and beyond in that if you just want to associate some piece of data with another piece of data, a so-called key value pair, a dictionary is going to be your go-to data type. But even these two we can implement in different ways for reasons that we've already seen. Like maybe there's only a finite size to this dictionary if we're using an array. Maybe we can do better than that. And maybe a dictionary if implemented one way is going to be fast. Maybe if implemented another way is going to be slow. So we'll consider these other design possibilities today too in the context of phone books and other data structures as well. After all, if you have an iPhone or an Android phone and Apple or Google only decided that you can have 50 friends because they implemented the contacts app in an array. I mean that would be an annoying limitation. So presumably they've done things a little more dynamically as we'll do today. So let's focus on the first of the data structures we saw back in week 2. That is an array which recall was just a chunk of memory where you can store values in it back to back to back and that was the fundamental definition. The values are back to back to back or contiguous in memory and as we've seen we generally have to decide in advance the size of an array. So for instance if we want to store three values like 1 2 and three it might look pictorially like this or in code let's go ahead and implement this same idea and take a moment to whip up our very first program here and we'll call it say list C. And in this program, let's just do something demonstrative of how you could use arrays to store three things in memory. It's quite simply the numbers 1 2 3, but you can imagine it being three people's names, three sweaters, three people, or any other piece of data as well. So, I'm going to go ahead and at the top of list C include standard io.h. I'm going to then do int main void. So, no command line arguments. Then, I'm going to go ahead and give myself an array of integers of size three called list. And that's how we've done that uh from week two onward. Then just for the sake of discussion, I'm going to hardcode some representative values. So the first value will be at location zero because arrays are zero indexed. Then I'm going to do the second value which will be two. And then the third value which will be at location two, but the value will be three. Now just to prove that we've stored this correctly in memory, let's just do a quick for loop for int i equals uh equals z. uh I is less than three I ++ and then inside of this for loop I'm just going to do a quick print f of percent i back slashn printing out the value of list at location i. So it's not a useful program per se but it gives us an array to play with. It prints out that what's in it. So hopefully we will see 1 2 and three on the screen. So let me make this list program dot /list enter and voila we're on our way going. All right, but what if now we actually want to uh change that design and be like, "Oh, shoot. I now have a fourth number that I want to store or just bought a fourth sweater or a fourth person wants to get in line or I want to add a fourth friend to my contacts." Whatever the scenario might be, it stands to reason that ideally you would plop that fourth value right here in memory so that everything remains contiguous. You're still using an array. Your code doesn't really have to change except for the length. All for for all intents and purposes, it's the same implementation using a just a bit more memory. But recall that when you declare an array of a fixed size, you only are getting promised that chunk of memory, not necessarily more memory to the right, to the left, above or below conceptually because recall in the context of your whole computer, you've got this canvas of memory, all of which represent here bytes. And there could be a whole bunch of actual values or garbage values in memory. So in a more complicated program that 1 2 3 sure might end up here. But if I also had created a string in this program h e l o comma world might have also ended up right next to it in memory which means I can't just plop the four here because then if I'm still using that string elsewhere in my program now it's going to say hello world instead of hello world because you're just claiming the h that bite as your own which does not in fact belong to your array. Of course, there looks like there's plenty of other memory I could use here because these garbage values represented by Oscar are not being used. They've been used in the past, but we treat garbage values as memory we could reuse certainly. So, wouldn't it be nice to maybe just plop the 1 2 3 and four in this chunk of memory over here. And I can totally do that. But, of course, if I want to do that, I got to copy the first three values over and then put the fourth one there and then presumably give back to the operating system the memory I no longer need. So that in fact when using arrays is a perfectly valid solution and I think we can go ahead and do this in our same program. So let me go back to VS code here and instead of statically allocating memory for this array and by static I mean literally hard hard coding the number three here in a way that is permanent uh effectively. Let me go ahead and do this instead. at the top of my code. Let me delete the static allocation of that in uh that array before. And now let me leverage my understanding if still preliminary of pointers and memory management from this past week four to just dynamically allocate a guess at how much memory I need initially. So I'm going to go ahead and use maloc and allocate space for three integers. But integers take up a few bytes and it's usually is four. But just for good measure, I'm going to say times whatever the size of an int is is the total number of bytes I want. So presumably it's going to be 3 * 4= 12, but I'm generalizing it. But then recall that maloc returns the address of that chunk of memory, the address of the first bite. So if I want to create an array effectively called list, I can't just do int list like this yet. But what I could say is that all right now my list variable is actually going to be the address of an integer and set maloc's return value equal to that. So in code here what I've done is I'm asking on the right hand side the operating system please give me 12 contiguous bytes in memory. All of those bytes of course can be numerically addressed like ox12324125. We've had that story before. Maloc by definition returns the address of the first such bite and it's on me to remember that I allocated 12 if need be. So I'm just storing the address of that first bite in a pointer called list. But recall from last week there's this functional equivalence we saw between treating a pointer as an array and sometimes even treating an array like a pointer. The C uh language sort of lets us do this this conversion if you will. So what I could do here now is quite the same syntax as before. I could say list bracket 0 gets one. List bracket 1 gets two. List bracket 2 gets three. And even though I have this fancy new line inspired by week four, the syntax thereafter can be exactly the same. Why? Well, recall that these three lines here using square bracket notation is just syntactic sugar for the stuff we learned last week. Specifically, I could instead of doing list bracket zero, I could much more arcanely say go to that address in list and put the number one there, please. I can say go to the address list + one and put the value two there. I could then say finally go to the address at list + two and put the number three there. But this looks ridiculous. And even uh sort of an experienced programmer might not be inclined to do this if with using fewer keystrokes and more readable code. They could just do instead what I did the first time around, which is functionally the same, and just treat that chunk of memory as though it's an array. And the computer will essentially do the requisite pointer arithmetic to figure out where to put one, two, and three. So even though this is still kind of fresh, hot off the press from last week, it's exactly the same as we tinkered with last week. So suppose now that some time passes and I realize for the sake of the story that oh shoot I need more than three integers. I need space for four so as to achieve this picture in memory. Well I could of course just like delete all that code, change the three to a four, redo the whole thing, recompile the code, rerun it. But let me propose that we write our code in a way that allows us to change our mind while the program is running how much memory we actually need. And case in point, if you meet someone new, you want to add them to your phone. Well, you obviously don't want to have to wait for Apple to recompile the contacts app, reboot your phone just to add one more person. You want the program just to ask the operating system for more memory for that new person. So in this case, let's just pretend that some time passes and now I want to go ahead and actually change my mind and instead allocate space for four integers instead. Well, I could do something like this. I could just say literally list equals maloc of 4* size of int semicolon. I don't need to redeclare list on line 13 because it already exists from line five. But this is bad because what have I done wrong here in line 13? I've made a poor decision. Yeah, in front. um you like waste all the memory that >> Yeah, I'm wasting all of the memory I had from line five because I'm essentially forgetting where it is. If the list pointer is literally a pointer, like a foam finger pointing somewhere in memory, what I'm really doing is saying point it over here now, but I've completely lost track of those other three integers in memory. And that's what we described last week as a memory leak, which you could find with Valgrren. And if you didn't find it or fix it in code, eventually the computer and the program would slow down over time. So this is probably bad. It's not good to just unilaterally change your mind and say, "No, no, no. Forget about that memory. Give me a new chunk of memory." Especially if you want to copy the old memory into the new, just like I did a bit ago when trying to get the 1 2 3 into the bigger chunk of memory that can fit 1 2 3 4. So, how might I do this? Well, a temporary variable is kind of our go-to solution anytime we need to remember something in addition to uh something we already have in mind. So, let me just give myself a temporary variable called tmp by convention for short and set the return value of this mala call to that. And then what I could do is something like this. Much like my print statement earlier, I could do another for loop and say for int i equals z i is less than 3, i ++. And then in this for loop I could say treat that new chunk of memory as an array like we can set the e location equal to the e location in list. So these lines here copy old list into new list. It copies those first three values. And then what I bet I could do at the bottom here is then just manually I can say go to the fourth location which when you zero index is technically bracket three and set that equal to the number four. So these lines here copy the one the two and the three using a loop and then line 20 here at the moment just adds the fourth value. And again this is a stupid sort of way to write code and that if you want to put the four there you should have just done it earlier. I'm just pretending that some time has indeed passed in the program and I've changed my mind along the way and I want to let the user add some value to memory. Okay, but before we proceed further, I dare say that there are some other mistakes we should clean up. One of the lessons I preached last week was that anytime you use Maloc, what should you do or check for is you should always what? You should always free. So here I'm clearly not freeing any memory. So I should definitely do that. And there was one other rule of thumb with memory. What should you always do when using Malik? Yeah. >> Check to see if null came back, which just means something is wrong, like it's out of memory or something else went wrong. And if you don't do that, your program may very well crash with one of those segmentation faults that we saw uh briefly in the past. So it makes the code a lot more bloated, but it is good practice. So let's just check if the list pointer I get back contains null. There's no point continuing on. Let's just go ahead and immediately return one because something has indeed gone wrong. And then down here under maloc again let's do the same. If the temporary pointer also contains null. Now let's go ahead and similarly return one or any other nonzero value. But here's a subtlety and let me combine your two ideas. If I immediately return one on line 20 after the second maloc call fails, what should I still go back and do first? Yeah. You want to elaborate on your first instinct? >> Yeah, I want to still free the first chunk of memory because if we execute line five and all is well, which means that line 6, 7, 8, and 9 don't apply. Like it's not in fact null. We got back a legitimate value. That means we have a chunk of memory given to us for three integers, which means it still exists down here at line 19 and 20. So if I'm ready now to abort this program and return one to signify error, I first want to free that original list and say to the operating system, here's your memory back. Now, as an aside, strictly speaking, this is not necessary because the moment the program itself quits, the computer is just going to give back the memory to the operating system. So when programs quit, the memory leaks sort of go away. But your code is still buggy and generally we're running software that doesn't run for a split second but for minutes, hours, days, uh continually in which case it's best practice to squash these memory related bugs now. Check for null, free any memory so that you never indeed encounter these kinds of leaks. All right, so let's forge ahead a little bit more and let me propose that after we have done the copy, we now want to similarly free the original list. However, what I think we're going to want to do first is after freeing the original list is remember that the new list is effectively that which we allocated the second time around. So even though this program is getting a little long, notice that what I've just done is I've said, okay, store in the list variable the address of this new chunk of memory. So that list now with a foam finger is effectively pointing here instead of up here. But before that, I made sure to free what my finger was pointing at originally, the list pointer. All right. Lastly, let's just scroll down to the bottom of the code here. I can manually change the three to a four just to demonstrate that I've stored all four values in here. And then at the very end of the program, I think I have to free the list again because now list is pointing all the foam finger to the bigger chunk of memory, the 1 2 3 4. And then I can go ahead and return zero at the very end because all is hopefully well at this point. Let me go ahead and open my terminal window again and make this version of list. I made a lot of mistakes here it seems. Let's scroll up to the very first call to undeclared library function maloc dot dot dot. What have I apparently done wrong or forgotten? What have I done wrong? Yeah, I'm back. Yep. Yeah. So in standard lib.h H is where maloc is actually declared. So let's just add that quickly. Let's go ahead and include standard lib.h in addition to standard io.h. Let me clear my terminal window. Rerun make list. Enter. Now we're good. Dot /list. And ph we see 1 2 3 4. Okay. So at this point in the story, all we've done is write a dopey little program that allocates memory for three integers. 1 2 and three. then changes our mind and allocates more memory for four integers, freeing the original chunk of memory after copying the first three integers into the new memory and adding that fourth value. But this is kind of a lot of hoops to jump through. And let me propose one refinement here. So if back in VS Code, we go back into list.c here. It turns out that at least this loop isn't strictly necessary, not to mention the fact that we already have another loop for just printing the list. If I want to more cleverly reallocate memory, it turns out that there's another function that we didn't talk about last week, but is in standard lib.h2 called realloclock, which as the name kind of suggests, it reallocates memory, but a little smarter in that it will try to grow your existing chunk of memory if it can, which is going to be super efficient because then you can just plop the four at the very end. or if there just isn't room there because maybe someone else put hello world right there in memory elsewhere in your program. It's going to do all of the copying for you. So what you get back ultimately is a pointer to the new chunk of memory containing all of the original data as well. However, we're still going to have to check for null. We're still going to want to free the original list if something goes wrong and then return one. We're still going to want to add the fourth value cuz realo has no idea what more we want to put in the list. But I can in fact delete my other for loop whose purpose in life was just to copy all of those integers from old into new. All right, that was a lot. Let me pause for any questions. >> How does real know that it should reallocate the memory in list? Shouldn't you tell like if you have a lot of malog before specifically? >> Very good question. That's because I wrote a bug uh that we didn't trip over because I didn't compile this version of the code. So the question is how does realloc know what to realloc? Well, according to the documentation, which I forgot to read, you need to tell realloclock what the address is of the chunk of memory that you do want to realloclock. So the first argument to realloc which I did admittedly forget until a moment ago is to put the address of the chunk of memory that you already maloclocked earlier so that it knows to go there see if there's indeed some garbage values it can reclaim at the end of that chunk of memory or if it has to wholesale move things elsewhere in memory to give you four times the size of the int this time instead of just three. But still things can go wrong like you still want to check for this null value because real might not be able to give you enough memory or your memory could just be so fragmented that even though you want four bytes maybe there's three bytes over here two bytes over here one bite over here if there aren't four contiguous bytes realloclock 2 could fail and it will return null to signify as much other questions on any of this. >> Why do we still need the temp? >> Why do we still need the temp variable for the same reasons as before? Because if we just say list equals reallock and something does go wrong, realo by definition will return null but not touch the original memory which case we have now lost track of where that original chunk of memory is. So we can never go back to it to print it to change it to free it. So we have to use this temporary variable here. Good question. Other questions? Yeah. >> Is there a reason? Is there a reason that we free list instead of temp? Uh, so let me So down here or further down? Okay, so further down, let me scroll down to where we came from. So here after we've added this fourth value to temp, I've gone ahead and freed list, which at this point in the story is still pointing to the original chunk of memory, the 1 2 3. Then I am updating list as a variable to point to the new chunk of memory. Then I'm doing my thing by printing out all of the integers therein. Then I am freeing what list is then pointing to. So I'm not technically freeing the same address in memory multiple times because I'm in the intervening time moving what list is pointing to. >> Absolutely yes. it would be correct to go ahead down here and just say temp because temp is still in scope. It's still pointing at the same thing. I would just argue that that's semantically wrong because at this point in the code really list is the variable you care about. Temp was really meant to be a throwaway temporary variable and you're asking for trouble if you use a temporary variable later than you the programmer intended. And if a colleague did that too, who knows what you've done with the temporary bolt in the meantime? Good questions. Yeah, in front >> correct. Realloc will try to give you more memory in the same location as before if there's room at the end. >> So realloc will two potential things for you. So if the computer's memory looks like this, you're sort of out of luck because realloc can't give you this bite. However, if it finds like four bytes down here, for instance, realloc will not only allocate those four bytes for you, it will then copy the data for you over to it, which is wonderful because it just means we don't need an extra for loop all the time. We do this. Yeah. In front. >> How does it know how much data? >> How does it know how much data to >> copy? >> Uh because how much how does the how does Realloc know how much data to copy? Because the operating system, and you can think of it as the standard library, stdlib.h, keeps track of what memory has been allocated for you in the past. So when you pass in that same address, it knows, it has essentially a lookup table, a dictionary if you will, that tells it what memory has been allocated already. So you don't have to worry about that. >> Yeah. In front. >> Good question. In other programming languages, you don't always have to declare the length of an array. Case in point, Python, coming next week. That is because someone else who invented that programming language wrote all of this kind of code for you. And indeed, that's one of the goals with our transition between weeks five and six is to demonstrate that all of these problems are still being solved, just not by you and not by me anymore. We're standing on the shoulders of other smart people who have invented not just new code, but like a new language and a new compiler, or as we'll see, an interpreter for it so that we can hide all of these lower level details. Cuz honestly, as you can see already, like this is an annoying number of lines of code just to have a conversation about the numbers 1 2 3 4. In Python, we could reduce this code to like two lines of code, one line of code. It's going to be fine. All right. So, with that said, the among the goals here was to demonstrate that there are a bunch of ways in which we can implement these data types. But let's talk more concretely about what we'll call data structures, which are concrete definitions of how you use the computer's memory to lay stuff out in memory. And using data structures, you can implement stacks and cues and dictionaries and all of these other things. So, we're going to put into your toolkit today a whole bunch of canonical data structures that like every computer scientist does and should know that you nec won't necessarily implement all of the time yourself, but when you use some feature of Python or Java or C++ or some other language, you are choosing among typically implementations of these data structures that someone else has written the code for so that you can just benefit from the functionality and the features thereof like that FIFO property we talked about or LIFO without having to get into the weeds too much yourself. So when it comes to data structures, let's consider that we have at our disposal now a few new pieces of syntax in C and we're going to add just one more today. We saw last week that we have the strruct keyword and we've seen that for a few weeks now. Whenever we want to invent our own data structure, we can use literally strruct. We saw in the past that you can use the dot operator to actually go inside of a structure to get at someone a person's name or their number. And we saw last week the star operator for dreferencing a pointer, dreferencing an address to actually go somewhere like inside of a structure. Wonderfully. Today we're going to see that you can actually in some cases combine the dot and the asterisk into a single operator with two characters that literally looks like an arrow and that will help reflect the yellow and black drawings that we've done over the past couple of weeks where we have an arrow on the screen pointing somewhere. This literal arrow in code is going to line up with that same concept. So let's introduce the first of our alternatives to arrays. An array again is a contiguous chunk of memory where the values are back to back to back. Among the upsides so fast because like all the data is right there. We've seen since week zero you can do binary search and just jump around randomly by just doing simple arithmetic to go to the middle the middle of the middle by just dividing by two a couple of times and rounding as needed. But the problem with arrays to be clear is that they are statically uh they are statically all allocated to be a specific size maybe three maybe four but it is a finite value which is problematic because look at all the code we had to write just to resize these things again and again. Well, what if we sort of try to preempt that kind of pain and try to just build up a list by linking it together no matter where the values actually are in memory and move away from this constraint that everything has to be contiguous. After all, as I said a moment ago, if the computer has plenty of memory here, here, here, here that to collectively is more than enough memory, but none of those individual chunks is quite as big as you need for an array. Well, heck, let's at least try to leverage all of the available memory and stitch together the data structure as opposed to really holding firm this constraint that the array be back to back to back and contiguous. So a linked list is something you can now build using that syntax from last week and a bit more today in your same canvas of memory. So that for the sake of discussion, suppose that we want to store first in our list the number one. Well, we all know already that it might very well exist at an address like ox123 for the sake of discussion, but it's somewhere there. Suppose that you want to store a second value in memory, but you didn't think about it initially and so you weren't smart enough to put it like right next to the one and then the next value next to that, but you know somehow from maloc or similar functions that you could put the number two over here at address ox456 for the sake of discussion and similarly there's room for the number three over here at say address ox789. So already we have a list of values in memory. But because they're not continuous, you can't just do some trivial plus+ trick to go from one to the other because they're differing numbers of bytes apart. They're not just back to back one bite. So what if we try to solve that problem in the following way? Instead of just using one bite for each of these values, let me waste a little bit of memory or spend a little bit of memory and have some metadata associated with our data. So data is value or values you care about. Metadata is data that helps you maintain the data you care about. So let me propose that we use two chunks of memory for every value such that the top of each of those chunks represents the actual varrow you we care about 1, two, and three respectively. And you can perhaps see where this is going. The second chunk of memory that I've allocated to each of these values could perhaps be a pointer to the next one. A pointer to the next one. And if this is the end, we can put our old friend o x0 aka null and just treat that as the end of the list implicitly. So even though these things could be anywhere in memory, by just storing with each value the address of the next value in memory, creating effectively a treasure map or breadcrumbs, however you want to think of it metaphorically, we can get from one node to the other. And indeed, that's going to be a term of art we start using. A node is just a generic structure that contains data and metadata. usually like the number you care about and a pointer to the next such node. Um, these are not to scale as an aside. This is typically four bytes. A pointer, as we've discussed, is technically eight bytes, but it just looks prettier to draw them as simple squares on the screen. So, what does this really mean? Well, who really cares about ox 1 2 3 4 5 6 7 8 9. We can really think of this actually as being more of a picture with arrows. But to keep track of this list of three values, I do propose that we're going to need one additional value over here. And it's deliberately just a single square because to keep track of this list of three values, I'm going to use just one variable called say list and store in that variable a pointer as we defined it last week the address of the first node. Why? Because the first node can then get me to the second. The second node can then get me to the third and so forth. So what's the upside? And now if I want a fourth value somewhere on the screen, I could put it here, here, here, here, wherever there's enough room and just make sure that I update the arrow to point to that next chunk. Update the arrow to point to the next chunk. There's no copying of data. 1, two, and three can stay there now forever until the program quits and we do actually free it. But we can just keep adding, adding, adding or growing this data structure in memory. So that is what the world knows as a linked list in Python to which you were essentially alluding. Um a list in Python is indeed a linked list. Other languages call these vectors but they are essentially arrays that can be grown and shrunken automatically effectively without you having to worry quite as much about it. So how does the code for implementing something like this work? Well, let me propose that we have this familiar friend of a person which we claimed in past weeks has a name and a number associated with them. We know from last week that strings are not technically a thing in C as a keyword. So that's technically just char star name and number but same idea otherwise. And this is what we defined in the past as a person. So this is a structure we've seen before. I now need to implement the code equivalent of these rectangles, each of which has an integer and then a pointer to the next such value. So let me propose that we delete what's inside this structure, change the name from person to node, which again is a generic term for a container of values, and let me propose that inside of this new node structure, we put literally an int for the number we care about. There's going to be my 1 2 3 or four. And then, and this is a little bit new, let's include in this structure a pointer to the next such node. It's a pointer in the sense that it's an arrow. It's the address of the next node. So that's why we say node star. I could call it anything I want, but semantically calling it next makes perfect sense because it's the next such node. But this isn't quite right. For annoying technical reasons, I need to do one other thing here. I need to technically and we've not done this before put the name give the a temporary name to this structure if you will. So literally say strruct node here even though I've already said node here. Why? Because I technically need to change this line to say strruct node star. Long story short, why is this necessary? Well, recall in the past C and the compiler reads your code top to bottom, left to right. Well, if in a previous version of this code, we use the word node here, but the compiler never sees the word node until down here. like it's just not going to compile because the word literally doesn't exist. We saw this with functions in the past. So we the solution to that was to put the prototype higher up in the file and then it would compile. Okay, you can think of this as somewhat analogous whereby if I give this structure a name on this first line even if it's redundant to this one then I can say struck node inside of these curly braces because the compiler has already seen the word node there. So just you have to do it this way. So now that we have this in code, we can kind of start playing around with actually storing these things in memory. So let me propose that we go ahead and do this by transitioning back to VS Code here. And let's instead of using our array based implementation, let's implement the first of our linked lists. And I'm going to be a bit extreme and delete pretty much everything inside of main. I am for convenience now going to include the CS50 library. Not so much for the char star thing, but because as we discussed last week, it's still useful for getting ints and getting strings and other things which instead unless you use scanf are much harder and more annoying to get in C. So, let's go ahead and do this. Um, outside of main, let's go ahead and invent this node called strruct node here. Then, inside of my curly braces, we'll give every such node a number and every such node a pointer to the next such node. And we'll call this whole thing node by convention. Then inside of main, let's go ahead and do this one step at a time. Let me propose that to create a linked list initially, it's empty. So how do I represent an empty linked list? Well, I could call the variable list and set it equal to null. But what is the data type for a linked list? Well, per the picture that we had up earlier, in so far as all we need is a single pointer at far left here to represent the address of the first node in the list. I dare say all we need to say is that our list is of type node star. That is to say, what is the link list? Well, it's by definition the address of the first node in the list. So that's the first subtlety here. So that gives me a picture with no other nodes. It just gives me a single pointer initialize to null. Now let's go ahead and for par with the previous example just do something three times. So in this for loop structured exactly as before, let's go ahead and allocate a new node, ask the user for a number to put inside of it and then start stitching things together so as to achieve a picture in memory quite like this. So how am I going to do this? Well, first I need to allocate a new node. How do I do that? Well, I can use our new friend Maloc and allocate the size of a node. I want to store the address of this chunk of memory somewhere. And what I'm going to propose is that we have a temporary variable and I'll call this n which whose type is that of a node star. So what am I doing here? I'm trying to build up this list in memory so that I first have a pointer to the list. I I first have a pointer that is null pointing nowhere. no list exists. I then want to go ahead and create one new node, store value in it, and then point my list at that node. Then I want to do it again and again a total of three times. So how do we do this? We allocate space for the size of a node. However many bytes that's going to be, it's probably going to be 12 cuz it's four for the int and eight for the pointer, but who cares? Size of will answer that question for me. I'm going to store the address of this chunk of memory inside of a temporary variable called n for node and that's why it has to be node star because it's going to be pointing to an actual node. I'm going to do my quick sanity check. So if n equals equals null, we can't proceed further. I'm going to go ahead and just return one right now. So that's just sort of boilerplate code you should be in the habit of doing anytime you're using maloc. But if all goes well, let's do this. Let's go to the address in N and then go inside of that node and change its number to be whatever the human wants it to be by using get int and just prompt the human for their favorite number. Then let's go to that same node and update the next field to equal for now null because all I want to do is allocate one new node with that number. That's it. Then I'm going to need to stitch this together further. So I'll propose that all we need do and let's clean this up first is now make sure that we string these nodes together. This syntax isn't quite right because technically because of precedence I need to drefer oops I need to uh dreference n and then go inside of it. I need to dreference n and then go inside of it. However this syntax if it's looking a little overwhelming and you have no idea now what's going on. Thankfully in C there's much simpler syntax which is this. Go to the node and go inside it to get the number. Go to the node and go inside it to get next. So the arrow notation that I promised we would now have is the same thing as using the star operator the deep reference operator parenthesizing it. Then the dot operator which is just a p

Original Description

*** This is CS50, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming. *** TABLE OF CONTENTS 00:00:00 - Introduction 00:00:43 – Jack Learns the Facts 00:03:35 – Stacks and Queues 00:10:44 – Dictionaries 00:12:26 – Resizing Arrays 00:27:16 – realloc 00:33:43 – Linked Lists 01:20:24 – Trees 01:36:12 – Hashing and Hash Tables 01:56:09 – Tries *** HOW TO SUBSCRIBE http://www.youtube.com/subscription_center?add_user=cs50tv HOW TO TAKE CS50 edX: https://cs50.edx.org/ Harvard Extension School: https://cs50.harvard.edu/extension Harvard Summer School: https://cs50.harvard.edu/summer OpenCourseWare: https://cs50.harvard.edu/x HOW TO JOIN CS50 COMMUNITIES Bluesky: https://bsky.app/profile/cs50.harvard.edu Discord: https://discord.gg/cs50 Ed: https://cs50.edx.org/ed Facebook Group: https://www.facebook.com/groups/cs50/ Faceboook Page: https://www.facebook.com/cs50/ GitHub: https://github.com/cs50 Gitter: https://gitter.im/cs50/x Instagram: https://instagram.com/cs50 LinkedIn Group: https://www.linkedin.com/groups/7437240/ LinkedIn Page: https://www.linkedin.com/school/cs50/ Medium: https://cs50.medium.com/ Quora: https://www.quora.com/topic/CS50 Reddit: https://www.reddit.com/r/cs50/ Slack: https://cs50.edx.org/slack Snapchat: https://www.snapchat.com/add/cs50 SoundCloud: https://soundcloud.com/cs50 Stack Exchange: https://cs50.stackexchange.com/ Telegram: https://t.me/cs50x Threads: https://www.threads.net/@cs50 TikTok: https://www.tiktok.com/@cs50 Twitter: https://twitter.com/cs50 Twitter Community: https://twitter.com/i/communities/1722308663522594923 YouTube: http://www.youtube.com/cs50 HOW TO FOLLOW DAVID J. MALAN Facebook: https://www.facebook.com/dmalan GitHub: https://github.com/dmalan Instagram: https://www.instagram.com/davidjmalan/ LinkedIn: https://www.linkedin.com/in/malan/ Quora: https://www.quora.com/profile/David-J-Malan Threads: https://www.threads.net/@davidjmalan TikTok: https://www.ti
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from CS50 · CS50 · 0 of 60

← Previous Next →
1 Hello, World: Hadi Partovi
Hello, World: Hadi Partovi
CS50
2 Content Distribution and Archival in a Digital Age
Content Distribution and Archival in a Digital Age
CS50
3 CS50 2014 - Week 1
CS50 2014 - Week 1
CS50
4 CS50 2014 - Week 3
CS50 2014 - Week 3
CS50
5 CS50 2014 - Week 0, continued
CS50 2014 - Week 0, continued
CS50
6 CS50 2014 - Week 4
CS50 2014 - Week 4
CS50
7 Week 3, continued
Week 3, continued
CS50
8 Quiz 0 Review
Quiz 0 Review
CS50
9 CS50 2014 - Week 3, continued
CS50 2014 - Week 3, continued
CS50
10 CS50 2014 - Week 7
CS50 2014 - Week 7
CS50
11 CS50 2014 - Week 7, continued
CS50 2014 - Week 7, continued
CS50
12 Breaking Through The (Google) Glass Ceiling by Christopher Bartholomew
Breaking Through The (Google) Glass Ceiling by Christopher Bartholomew
CS50
13 Introduction to Amazon Web Services by Leo Zhadanovsky
Introduction to Amazon Web Services by Leo Zhadanovsky
CS50
14 CS50 2014 - Week 9
CS50 2014 - Week 9
CS50
15 How to Build Innovative Technologies by Abby Fichtner
How to Build Innovative Technologies by Abby Fichtner
CS50
16 Light Your World (with Hue Bulbs) by Dan Bradley
Light Your World (with Hue Bulbs) by Dan Bradley
CS50
17 Building Dynamic Web Apps with Laravel by Eric Ouyang
Building Dynamic Web Apps with Laravel by Eric Ouyang
CS50
18 CS50 2014 - CS50 Lecture by Steve Ballmer
CS50 2014 - CS50 Lecture by Steve Ballmer
CS50
19 CS50 2014 - Week 10
CS50 2014 - Week 10
CS50
20 This is CS50 with Steve Ballmer?
This is CS50 with Steve Ballmer?
CS50
21 Meteor: a better way to build apps by Roger Zurawicki
Meteor: a better way to build apps by Roger Zurawicki
CS50
22 Data Analysis in R by Dustin Tran
Data Analysis in R by Dustin Tran
CS50
23 Data Visualization and D3 by David Chouinard
Data Visualization and D3 by David Chouinard
CS50
24 CS50 2014 - Week 6
CS50 2014 - Week 6
CS50
25 Build Tomorrow's Library by Jeffrey Licht
Build Tomorrow's Library by Jeffrey Licht
CS50
26 CS50 2014 - Week 9, continued
CS50 2014 - Week 9, continued
CS50
27 Essential Scale-Out Computing by James Cuff
Essential Scale-Out Computing by James Cuff
CS50
28 iOS App Development with Swift by Dan Armendariz
iOS App Development with Swift by Dan Armendariz
CS50
29 Sam Clark Leads Yale Students on Tour to CS50 at Harvard
Sam Clark Leads Yale Students on Tour to CS50 at Harvard
CS50
30 3D Modeling and Manufacture by Ansel Duff
3D Modeling and Manufacture by Ansel Duff
CS50
31 CS50 2014 - Week 5, continued
CS50 2014 - Week 5, continued
CS50
32 hello, world
hello, world
CS50
33 CS50 2014 - Deep Thoughts - Hash Table
CS50 2014 - Deep Thoughts - Hash Table
CS50
34 CS50 2014 - Deep Thoughts - Binary Tree
CS50 2014 - Deep Thoughts - Binary Tree
CS50
35 CS50 2014 - Deep Thoughts - Scratch
CS50 2014 - Deep Thoughts - Scratch
CS50
36 CS50 2014 - Deep Thoughts - MySQL
CS50 2014 - Deep Thoughts - MySQL
CS50
37 LaunchCode Visits CS50
LaunchCode Visits CS50
CS50
38 CS50 Live, Episode 100
CS50 Live, Episode 100
CS50
39 CS50 Field Trip to Google
CS50 Field Trip to Google
CS50
40 This is CS50 AP
This is CS50 AP
CS50
41 Week 4: Monday - CS50 2011 - Harvard University
Week 4: Monday - CS50 2011 - Harvard University
CS50
42 Week 2: Wednesday - CS50 2011 - Harvard University
Week 2: Wednesday - CS50 2011 - Harvard University
CS50
43 Week 1: Wednesday - CS50 2011 - Harvard University
Week 1: Wednesday - CS50 2011 - Harvard University
CS50
44 Week 11: Monday - CS50 2011 - Harvard University
Week 11: Monday - CS50 2011 - Harvard University
CS50
45 Week 3: Wednesday - CS50 2011 - Harvard University
Week 3: Wednesday - CS50 2011 - Harvard University
CS50
46 Week 12: Monday - CS50 2011 - Harvard University
Week 12: Monday - CS50 2011 - Harvard University
CS50
47 Week 1: Friday - CS50 2011 - Harvard University
Week 1: Friday - CS50 2011 - Harvard University
CS50
48 Week 3: Monday - CS50 2011 - Harvard University
Week 3: Monday - CS50 2011 - Harvard University
CS50
49 Week 10: Wednesday - CS50 2011 - Harvard University
Week 10: Wednesday - CS50 2011 - Harvard University
CS50
50 Week 2: Monday - CS50 2011 - Harvard University
Week 2: Monday - CS50 2011 - Harvard University
CS50
51 Week 9: Monday - CS50 2011 - Harvard University
Week 9: Monday - CS50 2011 - Harvard University
CS50
52 Week 7: Monday - CS50 2011 - Harvard University
Week 7: Monday - CS50 2011 - Harvard University
CS50
53 Week 5: Monday - CS50 2011 - Harvard University
Week 5: Monday - CS50 2011 - Harvard University
CS50
54 Week 5: Wednesday - CS50 2011 - Harvard University
Week 5: Wednesday - CS50 2011 - Harvard University
CS50
55 Week 7: Wednesday - CS50 2011 - Harvard University
Week 7: Wednesday - CS50 2011 - Harvard University
CS50
56 Week 8: Monday - CS50 2011 - Harvard University
Week 8: Monday - CS50 2011 - Harvard University
CS50
57 Week 9: Wednesday - CS50 2011 - Harvard University
Week 9: Wednesday - CS50 2011 - Harvard University
CS50
58 Week 8: Wednesday - CS50 2011 - Harvard University
Week 8: Wednesday - CS50 2011 - Harvard University
CS50
59 Week 10: Monday - CS50 2011 - Harvard University
Week 10: Monday - CS50 2011 - Harvard University
CS50
60 Week 2: Wednesday - CS50 2010 - Harvard University
Week 2: Wednesday - CS50 2010 - Harvard University
CS50

This lecture covers the basics of data structures, including stacks, queues, dictionaries, and linked lists, and how to implement them using arrays, dynamic memory allocation, and pointer arithmetic. The lecture also covers memory management and how to use tools like malloc and realloc to allocate and reallocate memory.

Key Takeaways
  1. Implement a stack using an array
  2. Implement a queue using a linked list
  3. Use dynamic memory allocation to allocate memory for an array
  4. Use pointer arithmetic to manipulate memory
  5. Implement a linked list using a node structure
  6. Use malloc to allocate memory for a node
  7. Use realloc to reallocate memory for a node
💡 Data structures can be implemented using arrays, linked lists, and dynamic memory allocation, and understanding how to manage memory is crucial for efficient and effective programming.

Related Reads

📰
Bloom Filters, Explained Properly
Learn how Bloom filters work and their benefits, including tiny memory and blazing speed, in exchange for potential false positives.
Dev.to · Daksh Gargas
📰
Prefix Sums: The Preprocessing Trick That Makes Range Queries Instant
Learn how prefix sums enable instant range queries in arrays, boosting performance in various applications
Medium · Programming
📰
I Thought I Was Ready for the Interview — Then One Simple Math Question Destroyed Me
A simple math question can destroy a developer's interview, highlighting the importance of being prepared for unexpected questions
Medium · Programming
📰
Week 2(Day 10): LeetCode Two Pointers(slow & fast): Remove Duplicates from Sorted Array (Brute…
Learn to remove duplicates from a sorted array using the two pointers technique, improving from brute force to optimized solutions
Medium · Python

Chapters (10)

Introduction
0:43 Jack Learns the Facts
3:35 Stacks and Queues
10:44 Dictionaries
12:26 Resizing Arrays
27:16 realloc
33:43 Linked Lists
1:20:24 Trees
1:36:12 Hashing and Hash Tables
1:56:09 Tries
Up next
Stump Grinder Carbide Wheel Grinds Hardwood To Chips
Innoforge Studio
Watch →