Week 5: Wednesday - CS50 2011 - Harvard University
Key Takeaways
Covers the CS50 Library, heap, pointers, and forensics in the context of CS50
Full Transcript
all right welcome back to cs50 this is the end of week five and this is one of my favorite problem sets that's on the horizon here so if some of you have kind of been staring at this trying to figure out what it is letting your eyes zone in and zone out odds are you didn't quite decipher it but if you had say from childhood one of these pieces of plastic that came for instance with one of these cereal top boxes and you held it up to this thing what you would see as did last year's students when they implemented this idea in code was that it was professor plum in the lounge with the candlestick with this particular murder mystery and so on the horizon for you this year is going to be a little somewhat different in a little someplace different with a little something different but this is a teaser for problems at five whose focus will be forensics and the recovery of information that's either been accidentally or deliberately lost uh i went through my uh inbox to retrieve an email from one of your predecessors i'll keep it anonymous but this was really quite fun to read since he sent this to us a few months after cs50 ended a year ago and he wrote to us yesterday my sister accidentally formatted her camera's digital media card and lost a year's worth of memorable photos parenthetically she unfortunately isn't the best at backing up her data this reminded me of problem set five so i thought i would try to run her memory card through the recover.c program that i wrote all the way back in october so after four hours of figuring out how to create a raw forensic image from the formatted memory card i uh came across this page and was able to do the following after tinkering around with some of the command line arguments i managed to create the forensic image i installed and configured the cs50 virtual box i managed to run the forensic image through my program and run all recover all 1027 of my sister's photos i find it absolutely amazing that i was able to go from being one of those less comfortable with no programming knowledge whatsoever to having the ability to recover data off of a formatted memory card in an actual real-life situation so it does in fact happen so this will be problem set five so a quick invitation uh cs50 lunch at the usual place cs50.net rsvp this friday at 1 15 p.m um we're going to do a dinner in future weeks for those who consistently cannot make fridays fyi and now let's turn our attention back to this library so very briefly last time we looked at this function get string we've been taking this for advantage uh taking it uh taking this for granted for some time now and underneath the hood recall that it's starting to it's actually been using some of these new fundamentals we looked at this notion of memory management dynamically allocating memory and just as a quick review so the function we introduced with which you can ask the operating system for memory is now called okay malloc for memory allocation and that memory ends up on stack or heap so it ends up on the heap and the intuitive explanation for that is that the stack memory constantly is coming and going it's disappearing every time a function finishes executing so it stands to reason that a function like getstring if you want whatever memory it's allocating to actually survive its return which you do otherwise what's the point in getting a string that memory needs to go onto the heap so henceforth and we'll see this more in future psets when you actually want memory dynamically in other words you don't know how much you need in advance and that's absolutely the case when you have no idea what the human is going to type until he or she does or if you want to dynamically grow or shrink something you're not just going to declare local variables anymore we're going to start using a function called malloc and it comes with an opposite function called free which is a function we'll start to need to call when you want to tell the operating system i am done with this memory in fact a little dirty secret of the cs50 library is that currently it leaks memory whereby every time you call getstring we call that function malloc asking the os for memory asking the os for memory but never once probably for your problem sets and even in lecture have we said to the operating system i'm done with this string you can have these bytes back and so that's actually a bug it's a bug that's been in the cs50 library from the beginning of the semester but it's meant to simplify the process so that we can just assume that we're getting a string but now as we begin to dismantle these training wheels anytime we actually ask the os for memory we're going to need to give it back and so we're going to stop using the getstring function since now we can appreciate perhaps that it's not quite doing us a huge number of favors now just as an aside what do we mean by what do we care about memory leaks for well if you've ever been running your mac or pc or ubuntu computer or whatever you have and you find that over time it's getting slower and slower and you're opening things like gchat or your browser or photoshop or whatever programs you typically run and you've been doing this a lot loading things into memory quitting loading well if any of those programs has one of these things called a memory leak where some programmer just screwed up and asked the os for memory but didn't necessarily hand it back what can happen in real terms is that your computer can slow down because your os is going to think that it's out of ram and so future programs are going to use something called virtual memory which for today's purposes is slower so if you're ever finding that your computer is inexplicably getting slower and slower and slower it might just be frankly that you need to reboot because there's some buggy programmer programs that haven't quite been playing nicely inside of your laptop so let's take a quick look at the cs50 library this is cs50.c we started looking at this last time and it's okay if you don't quite understand all of the intricacies of getstring but a function that is a little easier is something called get int and it turns out that get long get long long they're all pretty much equivalently implemented just for different data types and they all conveniently use getstring so let's assume for the moment that getstring works and it allocates memory for us and it ultimately hands us back a line of text that the user has typed in so here is the implementation of get ins that we've used as far back as week one in problem set one in c so i first have kind of a curious feature here i am deliberately inducing an infinite loop while true means do this forever so hopefully somewhere in this loop i have a break statement or a return statement something that's going to deliberately break me out of this infinite loop so let's see what i'm trying to do potentially infinitely so i have in the inside of this loop i'm going to try to get a line of text so this is a familiar call i'm now checking recall for line equals equals null so as an example when might getstring return this special keyword null yeah so if there was nothing typed in and we can simulate that as we've discussed with control d which normal humans aren't likely to type but there's another scenario that could very well happen and we'd get back null if too long a string right if they paste in a huge essay that would fully exhaust the computer's ram getstring in as much as it calls malloc might end up returning this special pointer value called null which essentially means something bad happened i don't know what necessarily but i can't give you back a string so we have to check for this otherwise we risk later in our program those things called seg faults and core dumps now why am i returning into max well this is just a convention in c there's kind of this problem fundamentally in c any time you're implementing a function that's supposed to return a number like an integer because you can return you know upwards of two billion positive numbers maybe as big as negative two billion but if you want to return an error you kind of have to arbitrarily say okay zero represents an error but then that's problematic because if you want your function like get int to be able to legitimately return a zero if the user typed in zero well you can't use zero as your special error value so you can say all right let's use negative one if the user uh if the if something goes wrong we'll return negative one by default but that two is problematic what if the user wants to type in negative one so the convention that most functions in c adopt is they don't choose these popular values like negative one zero and one they'll choose like two billion in one something that's crazy large and just shy of the maximum possible value and we define that typically with a constant so if we actually poked around a bunch of dot h files in the appliance we would see that someone had done sharp define int max and it's something like two billion and so we're kind of deciding uh yes we're sacrificing that number but it's less likely to be useful than zero or negative one or two so that's why we're returning in max just as a matter of convention now here's where things get interesting and here's where the training wheels now come off so this final chunk of code this big branch here is how getint ultimately works and it uses a function that you may have seen in readings or other examples called s-scanf which means string scan formatted so it's kind of like the opposite of printf printf obviously prints scanf reads in from the keyboard so if we didn't have the cs50 library you're seeing the hoops you would have needed to jump through in week one just to do something stupid like ask the user for a number so how does this actually work well notice first just for temporary variables i'm declaring an int n and a character c and i ultimately want to put the number that the user types in into n c i just need to check if the user screws up i need a character for the following reason here is the magical function s scanf takes a few arguments the first argument is the line of text that the user typed in so look at a string that the user typed in and recall from above we called getstring and we called the return value line so that's all this is this is literally the array of characters aka string that the user typed in so we're saying to scanf scan this string and try to extract an integer for us but how do we do that well much like printf which is used for output scanf does this for input we say in parent if we say in quote marks percent d which is a placeholder saying try to fill this placeholder with an actual integer that the user typed in that's inside of this string called line we have a white space character to the left and a white space character in the right to the right just to be friendly so that if the user accidentally hits the space bar then a number that's okay or they hit a number and then the space bar that's okay we're going to forgive that but we're not going to forgive typing any character other than a space if the user types in 42 space f or g or any letter of the alphabet or symbol on the keyboard what's going to happen is this scanf is also being told all right put the first number you see into this placeholder but put the first character you see alphabetical character punctuation character that you see in what variable apparently in c so this is just here as a bit of error detection we only want to fill in but just in case the user messes with us and we also happen to fill c we want to be able to detect that the user typed not just a number but also a number and a letter of some sort so we have to tell scanf where to put those variables and it is not correct certainly to do this because any time you pass a variable into a function how are you passing it what's the jargon so you're passing it by so-called value you're passing a copy of it in but if you want scanf scanf to be able to fill those variables with the some new values how do you do that well you have to write down that little scrap of paper the visual we keep doing where you have to say to a scanf put your answer on these two sheets of paper so here is ampersand d here is rather ampersand n here's ampersand c put your int here put your character here because then i the person who wrote this code is gonna check if scanf returns one that's perfect one the return value of scanf signifies how many pieces of paper that function filled with values so if it only filled one the ins called n great we're good to go we can immediately free the line the string that the user typed in and that's new today and then we immediately return that integer but if the user somehow did not cooperate did not just type a number like 42 and instead we get back from scanf the number two which signifies that they typed in some number plus some garbage character or if scanf returns zero because they didn't type in anything somehow the else condition is going to apply we're still going to free that line that string that we got from the user and then we're going to nag them retry retry retry so if you've ever kind of wondered how that retry is being spit out automatically for you if the user doesn't cooperate it's right there every one of these functions get int get float get double has that retry line and if you don't get there but instead the user cooperates you instead return the int or the float or the double or the like so those are the training wheels that are henceforth off questions if it uh so no it has a good question so if the user just typed a character would it not still return one no then it would return zero because when you have the format string percent d percent c that's telling scanf you have to try to put a number then a letter so if the user only puts a letter doesn't match that pattern so scanf returns zero so good good corner case yeah until it was true uh so no good qu good question too so while true is kind of a sort of vacuous truth it true is always true so while true just runs forever no matter what you are there's no way to break out of this loop unless you manually return from the inside of it now we don't have to do it this way in fact it's usually wrong to induce an infinite loop it usually means you messed up and you made some mistake but in this case we could have used a do while we could have used a for loop but in this case we decided as the staff we don't want to say we're going to let the user try a hundred times in which case we'd have a for loop with the number 100 hard-coded and we also noticed did not want to prompt the user initially and say saying something silly like give me an integer we wanted to leave that to you guys the ability to actually put that first printf the only printf we wanted to spit out was the retry so we just decided that really what we want really the construct that gets the job done is just do the following forever but we'll stop ourselves when we're ready so it's just a design decision so what if you type in a number good question so what if you type in a number and a character so scanf actually treats white space inside of that second argument special if you actually did that it would ignore it would still put number and then letter you don't need to have the white space there so that is a it's sort of um sort of a secret feature whereby this space is optional it would return two in that case if you typed in 4 2 f for instance you get the 42 and the f but that would still be wrong then why didn't all our programs crash why do your programs crash what do you mean why do they not crash ah good question so why do your programs not so much crash but why don't they hang and just kind of sit there waiting in perpetuity well remember getstring is actually the function that's first getting called and it's getstring that sits there with that blinking prompt waiting for input and so if the user doesn't ever type anything or even hit control d absolutely your program's just going to sit there and run infinitely long but getstring is actually what pauses this loop and waits with a blinking prompt for the user's input good question all right so scanf let's actually see this is how we might use this manually so this is an example it's among today's printouts and online this is an example it's called scan f1 and it demonstrates how you can sort of old school style get an integer from the user if you don't have the cs50 library and it actually is relatively simple if all you want to get is an int from the user of these exceptions of the user not cooperating aside but it gets a little dangerous if we're not trying to get ins but we're trying to get characters or strings because recall we've revealed that strings involve pointers pointers involve memory memory involves the risk at least of screwing up and inducing core dumps and seg faults so we're about to see that so here is a program whose purpose in life is to say to the user give me a number please then i have already declared a local variable called x and i'm passing it in by reference so to speak by pointer by address these are all synonymous phrases to a function called scanf and scanf reads directly from the keyboard scanf string scanf reads from a string that you already have in a variable as we did earlier scanf reads directly from the user's keyboard which is the goal right now so right now the user is being prompted for an integer and if the user provides an integer it's going to be stored inside of the variable x and it's going to be printed back out on the screen so let's try this let me go ahead and do make scanf1 alright it seemed to compile okay i'm going to go ahead and run scanf1 and i'm going to type in the number 42 thanks for the 42. it seems to work so let's try it again let's uh try uh the number let's say zero always try the interesting cases that seems to work let's try the number negative one that seems to work let's try the arbitrary word i latched on to this term monkey something went wrong here so what happened can we infer from this short program what the bug now is yeah okay good good thought so perhaps it's converting the int to the chars all right so other so maybe so actually no but let's see some other ideas but that's a good thought yeah yeah so it's actually it's a good thought but it is indeed this case whereby because as scanf did not detect an integer it instead detected a word a character m in particular it could not populate the placeholder and so nothing was put in x so its original value is unchanged and what is x's original value apparently well it's just who knows it's some garbage value and that garbage value at that moment in time happened to be 2719732 who knows what it might have been so this is another takeaway here too if there's ever a risk in a program where your variables might not be assigned some value it's actually a good habit just to detect such things to for instance initialize your variables to some known value maybe it's zero maybe it's negative one maybe it's in max but so that you yourself can check afterward what actually whether or not the value in there is legitimate so realize this is absolutely an option and so if there's ever a risk again of your variables not getting initialized you might want to pre-initialize it yourself to what we keep calling a sentinel value just some special number or special constant well this is relatively straightforward for ins but let's look at version two here at trying to get a string let's just continue this logic so we've taken the training wheels off i have removed the cs50 library from my appliance and so i cannot anymore say include cs50.h and so i instead if i want to declare a string i have to go back to the old-fashioned way of saying char star so char star buffer is going to represent my string and i'm calling this a buffer deliberately in my mind a string really is an array and an array is just a chunk of memory and computer scientists would typically call a chunk of memory a buffer something into which you can read values characters numbers whatever so buffer typically means an array of memory but notice here that really what i've declared is char star buffer so just to be proactive here how much space have i actually allocated with this first line of code here many bytes or how many bits have just been allocated by char star buffer so it's pretty small right how big is a char star is what the question reduces to it's probably it's generally 32 bits right so any time we have a pointer we've claimed at least for the appliance and for slightly older computers a pointer is always 32 bits so what does that mean char star buffer all i've allocated is 32 bits and what's supposed to go inside of those bits not a string that's not really a buffer that's kind of a misnomer at this point in the story because really i only have a pointer and i'm not supposed to put characters in pointers i'm supposed to put memory addresses in pointers so even if this still feels a little abstract at least take away from this that something here is wrong and as an aside just to tie this together if you're pretty computer savvy and you've known it for some time that for instance your pc can only have two gigabytes maximally of ram you might be generally aware of these restrictions well two gigabytes is actually a result of having a 32 bit cpu intel inside if it's 32 bits the biggest possible number you can express is 2 to the 32 which is 4 billion but for technical reasons that's actually half so you have 31 bits that you can use to address your ram so long story short you can't have more than 2 gigabytes of memory in some computer because you don't have numbers big enough to actually say put this here put this here you kind of can't count that high even if you had 10 gigabytes of memory so another reason that having a 64-bit computer these days as most of you now probably do is actually a very good thing so the um c doesn't have a native stream class but does it have so the printf has a native uh like format code correct so pr c does not have a native string class like something like java does but it does know about strings in the sense of printf but it really treats them as arrays of characters and it stops when it sees backslash zero so it's very primitive in that sense so what do i say i say to the user string please and then i use scanf percent s buffer so i'm saying to scanf take whatever the user just typed in at his or her keyboard and then put it where at that memory address now what is that memory address well what is the value of buffer at this point in the story it's it's a garbage value right the answer to this question is always going to be it's just some unknown garbage value if i and none of my code has not initialized it to anything explicit so what you're really saying is you're allocating only 32 bits for a a pointer to a character and that's got the number like ox1234 you are then handing this slip of paper to scanif and saying put whatever the user types here well what's there well you have no idea it could be zero it could be one two three four five it could be some part of memory that you don't even have access to so scanf is going to erroneously put the string there which typically induces one of those seg faults so let's see if we can confirm this so let me go ahead and make scan f2 and always keep in mind that some of these errors are not detectable because you get lucky and you actually touch memory that is yours but you really shouldn't be using it now we've configured make in such a way that you can't even compile this code because we are proactively checking wait a minute buffer is uninitialized in this function and so there's an error won't let me even compile but let me see if i can manually override this rather than type make i'm instead going to do gcc dash standards equals c99 this is just the version of c we're using and i'm going to skip the dash w error which is the guy that's making make be so picky scan f2 dot c dash o scan f2 enter so now i'm going to run scan f2 so it compiled even though i know there's a bug in this program let's go ahead and run it string please monkey enter segmentation fault so what's a possible solution here well i can at least initialize this to some null known value when the convention is typically null let me go ahead and recompile this but notice even if i do this this is no better it's still buggy but at least now printf is detecting as much in this case so still a bug but at least now it's obvious to you the human the programmer wait a minute this is clearly not what i expected i'm doing something wrong so what's the solution perhaps to this what do how do we fix this problem of buffer being some unknown value you know can i do something a little crazy like well null is obviously bad i know that much well let's put it at this address i keep quoting as popular well we could do that but what you're really saying now you're just arbitrarily saying put what the user types over there and you have no idea where there is so what function can we call that gives me a memory address of a legitimate chunk of memory all right so the solution here is malloc so we could try this so give me a string that's of size oh the user's not going to type in a word that's more than like 10 characters so i'm going to hard code 10. now this should actually work because malloc assuming there's ram left in the computer is going to give me a pointer and it's going to say put this word that the user types here in memory and that address is now stored in buffer so scanf will put it there now let me go ahead and try compiling this let me recompile it with gcc oh implicit declaration of function malloc so we've not had to do this manually yet because the library the cs50 library is usually doing this for us but there's another header that's popular standard lib standard library.h and that should make gcc happy now not knowing about malloc and indeed it did so scanf2 and go ahead and type monkey nice thanks for the monkey all right but wait a minute uh monkey monkey monkey enter interesting so that kind of worked but monkey monkey monkey monkey with no spaces interesting but what's happening here actually is that we are getting lucky let me see if i can make us unlucky by doing this ad nauseum let's see but that doesn't work so let's do paste let's do paste let's do paste let's do zoom out let's do paste now obviously a user is not typically going to do something like this but imagine it's actually you know a form field on a web page where it's still working so i'm getting a little bored copying and pasting but take my word for it today that if we did this long enough we would traverse one of those segmentation barriers where right now we're within it and we're just getting lucky but we're gonna cross over it at some point and in fact it's going to crash on us again segmentation fall so how do you fix this well this is actually harder to fix and now the is the cs50 libraries get string function motivated recall we looked at it briefly on monday and anytime you call getstring how many characters does it get at a time well recall used a new function get character get char and it just got one at a time one at a time it's incredibly paranoid the get string function that we wrote so that it only slowly looks at what the user's typing in and only if it realizes wait a minute you just typed in 11 characters but i've only allocated 10 bytes what is it going to do well recall we saw briefly the re-alloc function which is like a cousin of malloc and reallock as its name suggests takes the 10 bytes that you might have already allocated with malloc and doubles it or triples it and we repeat this process so why was getstring relatively complex compared to this for this simple reason there are so many programs to this day written out there where the you the programmer has made a arbitrary and ultimately dangerous decision to say no one's going to have a name longer than 16 characters or a thousand characters but these are precisely the opportunities that bad guys look for trying to crash programs because again we saw on monday this opportunity for a buffer overflow exploit which essentially means typing something a little more sophisticated than monkey monkey monkey but rather code that you want to execute and you can trick the computer into overflowing this buffer and executing your adversarial code yeah the word string does not exist it's in cs50.h good good question coincidence so percent s is part of c it's part of printf and percent so the word string exists in programmer's vocabulary but the data type string does not exist in c so percent s denotes char star really good question so i'm contradicting myself here right in the previous example with s with scan f1 recall that i did this i put it passed in ampersand x to get the address of x but we can kind of answer this just with our own jargon what is buffer already it's a memory address so i don't need to use ampersand because i already have the answer to the question the question is going to be where do you want me to put the user's input well put it at that address and the fundamental difference here is that in the previous example we allocated an int it was on the uh stack as a local variable there was no malloc involved but as soon as you involve malloc what you're literally getting is that address so we don't need to use the ampersand in this case and realize there's one other way we can create a buffer that's just as dangerous as hard coding a length a very common approach in a program is to say something like char buffer bracket 16. so char buffer 16 doesn't feel like a memory address really but it is in fact an array of characters and what though is an array well an array really is just the address of the first chunk of memory so what is this really doing this also is allocating not 10 but 16 bytes this time it's then passing the word of the name of the array buffer to scanf and think back now to pset 3 when you implemented sort or search remember that you could pass in an array as an argument to a function and you didn't use the double brackets you instead just wrote the array's name that's because you can pass an array by its name by its address and so scanf here would use that 16 byte buffer to put the user's input but what's going to happen if the user types in 17 characters just by nature by definition you're going to go beyond the boundaries of that array and notice too in c scanf and your own code has no idea how big the original array is it has no idea how many bytes you asked malloc for it is entirely up to you the programmer to remember how many bytes you asked for or how many bytes you hard coded in the array and so again this is the pri one of the primary reasons that so much code written in c and c plus plus and even in some modern languages is in fact exploitable because of these kinds of dangers and if you don't believe that too realize that these languages that you might already know a little bit we certainly will by semester zen like javascript and php and python and ruby a lot of the times the programs called interpreters that you use to use those language they're written in c themselves so you might be writing php code but it's being executed by a c program so if that c program is buggy your php code can be vulnerable as well a good question so if the buffer is an address why do we you not say star buffer as we did in our swap function on monday so the reason again boils down to the fundamental question we're trying to answer here the question at hand for scanf is where do you want me to put the user's input the answer to that question must be an address but we already have an address malloc gave us an address so the simple answer to why we need no star and no ampersand here is because buffer is already an address because in this case we called malloc and as we're as i'm disclosing today uh the name of an array can be treated as though it is a pointer as well an address the only time we use the star is when we want to go there scanf will do star buffer we do not uh 16 characters 16 of whatever the data type is in green uh no it's still going to be 16 bytes a char is one byte it's eight bits so in this case we would get literally 16 bytes or 16 chars if it instead were int buffer 16 then we would get 64 because it'd be four bytes per integer all right all right so recall then with the danger that this leads to right we saw this picture and we kind of lambasted this design because you have this dangerous pointer called the return address in red and that was simply the address of what what was this red return address used for it tells a function what this is the return address it's literally return address it tells a function where it should return control of the computer to once it's done doing its thing so if i'm the main function and i call the foo function what i'm essentially doing conceptually is i main i'm going to say i am address one two three four five you hand this piece of paper to foo foo keeps it around in this red slice of memory and as soon as foo is done executing it checks where did maine tell me to go back to one two three four let me hand control to the cpu back to the address that was on this piece of paper but the problem is that if foo has on a buffer say 16 bytes or in this case 12 bytes and the user types in not hello but something much longer than that where does the space go it goes from top left to bottom and so you run the risk ultimately of overwriting this with the address of some bad guys code and unfortunately even though the simple solution might be to just say all right well don't write hello from top down write it from bottom up turns out that only makes the problem a little harder for the bad guys but the problem is that you can end up tricking future functions that gets called into exploiting code so there's actually not a simple fix for this and again this remains one of the most common ways of exploiting a program let's just peel back the layer of one other thing with regard to pointers so this here is a program called pointers.c it's among our print our source code from today already notice that i'm using a few header files up here using a few libraries just because i wanted to resort to the cs50 library for this and now notice the one new habit i'm getting into is any time i call getstring i now need to say if the return value equals equals null something went wrong i should yell at the user i should return i should exit so i'm now checking that value now why can getstring return null because it uses malloc and malloc can return null all right so notice this trick though we have previously printed strings and previously the syntax had been this right this should probably remind you a little bit of week one week two if you want to print a string that the user's typed in should remind you of pset two the caesar cipher and vigenere well i can print each character percent c one at a time and then i can print that character by way of s the name of the string bracket i so comfort with this hopefully so turns out that all this time these square brackets or what we would generally call syntactic sugar it's just a nicer prettier way of doing something that at the end of the day is actually more sophisticated this code here that i just wrote is equivalent to s bracket one so let's go back to the fundamentals first of all what is s well s we call string but really as of this week what is s in more technical terms it's an address right it is the address in ram at which that string characters live from left to right so star s recall means go there and if you go to that address you're going to see the letter m and then o and then n and then k if the word is monkey right if you go to that address you're going to see those characters but you don't want to go to the same address every time you want to go to the start of the string which is identified by the name s but each time you iterate in this loop how many steps to the right in memory do you want to look well i right one more one more one more so if inside of your loop you take this address s and you add i to it well the first time this loop goes through i is initialized to what apparently zero so s plus zero is s so what are you gonna print first you're gonna print star s which means go to that address and print out if the word is monkey the letter m if you then take that same address and do plus one on your second iteration that's not address one two three four that's like one two three five and what letter presumably is at that location so o so star of that summation means print the o print the n print the k and so forth and because we already checked in advance the length of s with this helpful function string length we're not going to crash we're only going to step over the characters one at a time and then we're going to stop but just realize all this time even as far back as problem set 2 and caesar we've been using pointers we've been using memory addresses we've been walking through your computer's ram but we did it in a more user-friendly way with s bracket i but really you've been using a feature called pointer arithmetic taking an address and doing some mathematical arithmetic on it plus one minus one so realize all we've been doing is the same topic all this time yeah really good question so if we were instead iterating not over characters which are one byte typically but instead over ins would we have to do plus four times i so that we go four bytes four bytes four bytes short answer no the reason this feature has its own name pointer arithmetic is because the compiler will figure out that when you say s plus i if s is actually a char star it's gonna do literally plus one if instead though s is an int star well an n star points to an int by definition ants on this machine are four bytes so plus one is actually going to be implicitly converted to plus four then plus eight plus twelve plus sixteen so that's what's really cool about pointer arithmetic you don't even have to think about those details so your code will work on old machines new machines because the compiler will figure this out for you really good catch all right yeah good question is this more computationally efficient than using square brackets no the compiler will actually effectively turn your square brackets into this so when your code's running you will notice no difference back in the day you might notice a compilation difference but these days on a two gigahertz computer compiling caesar is instantaneous anyway so it's a non-issue in modern times all right so that was a lot let's go ahead and take our five-minute musical break here all right so we are back really good news no problem set next week i know there it goes um so yeah so quiz 0 is next wednesday there's no lecture on monday because it's a holiday for the universities quiz zero is on wednesday we will announce via email on the course's website where to go next wednesday we're going to try to book enough classrooms so that we have writing services for everyone so we most likely will not be here so again don't show up before checking your email there will be a review session this sunday at 7 p.m in northwest science same time same place as the walk-throughs usually are this will be a course-wide review it will be filmed put online by the next day uh to cover uh really the past six weeks of material and particularly field questions from you um we'll also have office hours next week on monday and tuesday night they most likely will not be in the dining halls they'll instead be in a classroom where we can use a whiteboard and they'll be totally casual and an opportunity to get some last minute questions uh answered no two that there's four years worth of old quizzes on the course's website so the best guidance for to get a sense of what past quizzes have been like is to go there and you'll see not just the questions but also the answers do just realize that the course evolves over time for instance in o7 we had three quizzes thereafter it was just two but the material changes so realize that if you have no idea how to answer some question on the quiz that's either because you zoned out at some point this semester or we just never talked about it this semester so look ultimately to the syllabus and to the lecture slides and scribe notes in particular for guidance and so if you've not realized this it's always fascinating at the end of the semester in the queue guide to read that people are unaware to these scribe notes so we have a wonderful teaching fellow who actually trans uh summarizes what goes on in class each day typically with snarky uh little footnotes which you might enjoy and so this is meant to be an authoritative set of notes in lieu of your own potentially if you'd rather not so much scribble things down oh perfect very apropos so realize these two are meant to be a very good guidance through the course and i would strongly urge too when you do show up at office hours and or the course review session honestly you'll be doing yourself a service if you spend at least a little bit of time this weekend taking a past quiz so that you're not being hit with material for the first time you can rather make better use of your time and ask questions only about the stuff you are forgetting or struggling with um also we put online these things we have been transcribing all of the course's lectures as promised so that now when you visit past lectures videos you will see not just the course's video on the left you will also see every word that came out of my mouth for better or for worse and you will be able to hit play and just to give you a sense of what you two will be able to do by semester's end with a little javascript you can even read what i'm saying in real time as it highlights as the words come out of my mouth and even subtitle it if you would prefer to watch in that fashion so this also means two more compellingly that you can search control f and so forth looking for topics like pointers looking for arrays things that you might have struggled with you can actually find that point in the lecture scroll down to that spot click on a specific sentence and the lecture will immediately jump to that point in the class so realize that's there and we will finish by the weekends today's and monday's lecture so that you have access to those online before the quiz so i also got curious as to what words do actually come out of my mouth um and so i uploaded them to a nice free tool online that creates a visualization of the words that you've pasted in and the bigger the word the bigger the font the more times i said it the smaller the word or if it's not even there the fewer times i said it and this for instance was this year's very first lecture um it's kind of curious i say the word just a lot actually a lot course that makes sense cs50 is a little smaller there i was worried i was saying facebook too much the first week but that's actually pretty small at the top right so that was reassuring i then fast forwarded a few weeks and some themes definitely popped out uh just again in ghana so i didn't realize i sound so intellectual in class um and then i looked at yet another week and like just is the theme so now i'm never going to be able to say this word without kind of tripping over myself but apparently that is the most popular takeaway or word that i say in cs50 so um hi so one exciting initiative that the university across all of harvard schools has been working on you may have read about it at some point in the crimson is the harvard innovation lab this is a beautiful new space across the river right next to hbs that has a few floors to it the top two are being used by hvs classes the bottom floor is meant to be an innovation space for undergraduates gsas students hbs students students from all across the university who have entrepreneurial ideas who have projects they want to work on collaboratively with friends and they just need space to work and they want to be around other smart people doing interesting things technical people so as to ask and to answer questions and so just to give you a sense of the space it has literally just opened in the past few days when you walk across the river you'll see a building like this you go in it's very modern and high-tech concrete floors nice lighting funky seating and so forth and a lot a lot a lot of work space what we thought we'd do for fun even though it is across the river so it's a little farther than leverit and quincy and and foho and lowell is for just one week not next week but the week after for problem set five is we've been cordially invited as a class to spend office hours there monday tuesday wednesday thursday pizza and soda will be served throughout the evening the shuttles will run back and forth between campus and it's actually not that far a walk but just emotionally we'll make sure that you can hop on a shuttle and to actually get there but it should be a fun cs50 field trip to see a couple of hundred of us working on pset five in the harvard innovation lab as its inaugural class so more on that i'm sure over time so we set the stage on monday for forensics and focusing on a problem domain albeit using these same fundamentals of recovering information or covering your tracks when trying to get rid of information to begin to discuss how information is actually stored on something like a hard drive we actually need to be able to represent the data on a hard drive in actual programmatic terms so just as a flashback you might recall from week zero what is actually inside of a hard drive we'll just watch the first part of this for a few seconds recall that this was the story is where your pc stores most of its permanent data to do that the data travels from ram along with software signals that tell the hard drive how to store that data the hard drive circuits translate those signals into voltage fluctuations these in turn control the hard drives moving parts some of the few moving parts left in the modern computer some of the signals control a motor which spins metal coated platters your data is actually stored on these platters other signals move the read write heads to read or write data on the platters this machinery is so precise that a human hair couldn't even pass between the heads and spinning platters yet it all works at terrific speeds so the film recall goes on to just discuss um this is going to be a little messing with your mind if we could focus the camera on the board here for a moment so you'll recall that the video went on perhaps to show those little blue and red magnetic particles and to reduce the problem of storing information on a hard drive to the orientation of magnetic particles either north south or south north thereby representing one or zero respectively some system like that well what's really going on in a computer's hard drive is obviously got to be higher level than that there's got to be some notion of file name some notion of file sizes some notion of folders in which things are so hard drives are not just zeros and ones rather they actually have some metadata stored on them and metadata means data but it's useful for other data you really care about so let me just draw for instance one of those platters so this is a metal disk that's inside of a typical hard drive and it spins around generally it's thousands of times per minute and all along here are zeros and ones in some orientation now what's really going on there is that clusters of these zeros and ones represent more interesting things they represent your actual files like your mp3s and your movies but also things like the file name and where things are so it turns out even though this might represent some movie you've downloaded from itunes there's a special part of the hard drive reserved for a table sort of an excel spreadsheet of sorts that has to oversimplify two columns on the left is the name of the file and on the right is the address of the file so just like we can say that your ram can be addressed from byte 0 1 2 3 4 on up similarly can your hard drive even though it's a circle be described in the same way this is byte zero this is byte one two three four and some system along those lines so how does your computer remember where data on your hard drive is it uses this little cheat sheet so when you create or save or download a file there's this table in your operating system's memory that says okay you just downloaded uh movie.mov some file name like that and so the name gets written in the left column in the right column gets written the address of say the first byte of that movie now hard drives are actually pretty fancy and so you can get what's called disk fragmentation if you've got big files they might not necessarily all end up here you might get part of your movie here or here or here or here so if you've ever heard this term defragment your hard drive it's referring to the fact that your file might be spread out so it's not sufficient just to remember the starting address of a file turns out there's usually a list of addresses part one is here part two is here and so forth but now into the world of forensics what happens when you actually drag a file to the recycle bin or to the trashcan on on mac os or windows so you probably figured this much out right like what happens when you just drag it to that that special icon yeah and actually not even there in fact we can rewind further nothing happens right if you drag a file something sketchy or private that you want to
Original Description
CS50 Library. Heap. Pointers, continued. Forensics.
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from CS50 · CS50 · 54 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
▶
55
56
57
58
59
60
Hello, World: Hadi Partovi
CS50
Content Distribution and Archival in a Digital Age
CS50
CS50 2014 - Week 1
CS50
CS50 2014 - Week 3
CS50
CS50 2014 - Week 0, continued
CS50
CS50 2014 - Week 4
CS50
Week 3, continued
CS50
Quiz 0 Review
CS50
CS50 2014 - Week 3, continued
CS50
CS50 2014 - Week 7
CS50
CS50 2014 - Week 7, continued
CS50
Breaking Through The (Google) Glass Ceiling by Christopher Bartholomew
CS50
Introduction to Amazon Web Services by Leo Zhadanovsky
CS50
CS50 2014 - Week 9
CS50
How to Build Innovative Technologies by Abby Fichtner
CS50
Light Your World (with Hue Bulbs) by Dan Bradley
CS50
Building Dynamic Web Apps with Laravel by Eric Ouyang
CS50
CS50 2014 - CS50 Lecture by Steve Ballmer
CS50
CS50 2014 - Week 10
CS50
This is CS50 with Steve Ballmer?
CS50
Meteor: a better way to build apps by Roger Zurawicki
CS50
Data Analysis in R by Dustin Tran
CS50
Data Visualization and D3 by David Chouinard
CS50
CS50 2014 - Week 6
CS50
Build Tomorrow's Library by Jeffrey Licht
CS50
CS50 2014 - Week 9, continued
CS50
Essential Scale-Out Computing by James Cuff
CS50
iOS App Development with Swift by Dan Armendariz
CS50
Sam Clark Leads Yale Students on Tour to CS50 at Harvard
CS50
3D Modeling and Manufacture by Ansel Duff
CS50
CS50 2014 - Week 5, continued
CS50
hello, world
CS50
CS50 2014 - Deep Thoughts - Hash Table
CS50
CS50 2014 - Deep Thoughts - Binary Tree
CS50
CS50 2014 - Deep Thoughts - Scratch
CS50
CS50 2014 - Deep Thoughts - MySQL
CS50
LaunchCode Visits CS50
CS50
CS50 Live, Episode 100
CS50
CS50 Field Trip to Google
CS50
This is CS50 AP
CS50
Week 4: Monday - CS50 2011 - Harvard University
CS50
Week 2: Wednesday - CS50 2011 - Harvard University
CS50
Week 1: Wednesday - CS50 2011 - Harvard University
CS50
Week 11: Monday - CS50 2011 - Harvard University
CS50
Week 3: Wednesday - CS50 2011 - Harvard University
CS50
Week 12: Monday - CS50 2011 - Harvard University
CS50
Week 1: Friday - CS50 2011 - Harvard University
CS50
Week 3: Monday - CS50 2011 - Harvard University
CS50
Week 10: Wednesday - CS50 2011 - Harvard University
CS50
Week 2: Monday - CS50 2011 - Harvard University
CS50
Week 9: Monday - CS50 2011 - Harvard University
CS50
Week 7: Monday - CS50 2011 - Harvard University
CS50
Week 5: Monday - CS50 2011 - Harvard University
CS50
Week 5: Wednesday - CS50 2011 - Harvard University
CS50
Week 7: Wednesday - CS50 2011 - Harvard University
CS50
Week 8: Monday - CS50 2011 - Harvard University
CS50
Week 9: Wednesday - CS50 2011 - Harvard University
CS50
Week 8: Wednesday - CS50 2011 - Harvard University
CS50
Week 10: Monday - CS50 2011 - Harvard University
CS50
Week 2: Wednesday - CS50 2010 - Harvard University
CS50
🎓
Tutor Explanation
DeepCamp AI