CS50 SQL - Lecture 2 - Designing

CS50 · Beginner ·🔍 RAG & Vector Search ·2y ago

Key Takeaways

Introduces the basics of SQL and database design

Full Transcript

foreign [Music] well hello one and all and welcome back to csrt's introduction to databases with SQL my name is Carter zinke and last we left off we learned about relating that is how to have multiple tables in our database for people places and things and how to have them relate to one another and then we might do in the real world now today we'll take a step forward and we'll talk about how to put you in the driver's seat designing your very own database schemas to organize your data now we'll pick up where we last left off which is with this database of books so we had this database it was full of books that have been long listed for the international Booker prize to be long listed means to be nominated for some prize let's say so we had the past five years of books in this database and we worked on improving this database over time from week zero to week one now this week we'll actually take a look underneath the hood and see what commands we had used to create these varied databases so let's reveal now what we had done all along I'll go back to my terminal here and if you remember I could use a command to open up a database file which was this command here sqlite 3 and then the name of the file I want to open so let's try this I'll go back to week zero to my week zero folder like this and I'll open up my long list database week zero slash long list dot DB okay I'll hit enter now I'm in my sqlite prompt so I could work on typing some SQL commands SQL statements or queries inside of this terminal prompt here so if I want to get a feel for what's inside this database we saw I could use a command called select so I'll select now some rows from this table I'll say select let's say the title and also the author columns from my long list table semicolon and enter now I'll see all the titles and all the authors that were inside of this long list table but I only want to peek here I only want to see roughly what kind of data is inside so I could probably improve this command here I could instead say select the author and title columns from long list from long list but limit now to the first five rows we saw before semicolon and I'll hit enter on this query and now I'll see only the top five rows so I'm able now to see what what kind of data is inside my database but what I can't yet see is what command was used to create this table and what kind of data could be stored inside of it so let's reveal now what was going on underneath the hood all this time I'll say this new command a sqlite command not a SQL keyword but sqlite command called dot schema dot schema now if I hit enter I'll see the following the command the statement the query that was used to create this long list table and notice how I have many columns inside this table I have an ISBN column a title column an author column and so on and each column seems to have some kind of data that could be stored inside of it like text or integers or real for floating Point values or decimals if you're familiar so this is how we created the very first version of longlist.db but let's also see how we created the second swall type dot quit to leave this version of longlist.db and now let me open up the next version we had created which is more relational it had tables inside of it that could relate to one another so let me type sqlite3 then long lit or the sqlite3 and go to week one and then type long list dot DB I'll hit enter now in my next version of longlist.db well what could I do I could type select and look at some of the tables in here to see what kind of data is inside I could perhaps say select maybe the names from the authors table here from the authors table and hit semicolon now I'll see all the names of authors that are inside of the author's table I could do the same thing for books I could maybe look at the titles of books I could say Select Title from the books table semicolon and enter now I see all of the titles that are inside my books table but what I haven't seen yet is the schema of this database it is the way it is organized and the commands that were used to create these tables so let me work on that now I'll clear my terminal using Ctrl L and now let me type dot schema again I'll type dot schema to see what commands were used to create this database hit enter and I can see I think there are quite a lot of commands here now this goes overwhelming I mean I'd be right there with you right this is a lot of commands to parse through and read so there's probably a better way to do this and one way to try is to type dot schema and then give it some table name let's say I want to understand the SEMA for just the books table like just that for now so I'll say dot schema and then the table name books in this case then I can hit enter now I'll see the schema the organization the command that we use to create the books table in this case and notice again we have several columns ID ISBN title publisher ID and so on each one has their own kind of data they could support or take into this column like integers text and so on so again what we'll do today is have you all learn how to write your very own create table commands to build your very own databases that represent what you want to represent in the real world so let me exit this prompt here and let me propose that we'll have a bit of a design challenge today to actually try to represent a real world entity with some database here and if you're not already familiar Boston is perhaps famous for being among the first cities to have a subway system in the United States so here is a picture from the late 1800s of a Subway being built in Boston's city streets underneath the streets here there'll be trolley cars would go and transfer people across Boston here's another picture of a trolley actually working underneath the streets so people would go down underneath they would hop on a trolley they'd be able to go to different parts of Boston perhaps from Harvard to the MIT or downtown up to let's say Braintree or down to Braintree which is more south of Boston for example one of the famous stops is the Park Street stop which is right down in the middle of Boston one of the essential hubs of this subway system and now these photos are all from let's say the early 1900s late 1800s and so on but the Subway's gotten a lot more modern sense then and actually now we have several lines that span the entire city and Beyond so here we have the red line of which Harvard and MIT are apart we have the green line which brings you kind of west to east the Blue Line the orange line and so on so many more lines and stations have been added to this system it's a big design challenge to represent all of these stations all of these lines and all these people who might ride this Subway too so the question then becomes how can we create a schema for this data and again by schema we mean what kinds of tables should we have what kinds of columns might those tables have and what kind of data should we put in each of those columns for instance so let me propose this let's say we start just first with names and stations so Charlie here our very first writer on this system is going to be at the Kendall and MIT station so this is what this table represents now but what more could we have well we might also want to have maybe what Charlie is doing at that station maybe he's entering the station for instance and if you're familiar with subway system you often have to pay to get on to a train or get onto the station itself so let's say Charlie pays some fare to enter into the Kendall MIT station well back in the mid 1900s the fare was only about a dime it was 10 cents so we'll say Charlie paid 10 cents to enter the Kendall MIT station and now this seems pretty good but if I am the transit authority the person who runs the subway system I probably want to know does Charlie have enough money to get on the train and if so I want to make sure that okay well Charlie actually could get on this train so let's say not only does Charlie pay Affair he has some remaining balance afterwards so Charlie here has gotten onto the Kendall MIT stop he's paid the fare of 10 cents and has five cents left okay so here's a bit of a table we'd probably add more information to it let's say Charlie then leaves at the Jamaica Plain stop and the fair to leave is about a nickel five cents and now Charlie has no sense left over so again Charlie paid 10 cents to get on had five cents left paid five cents to get off and now has no remaining balance here anymore okay so that's Charlie's story let's look at Alice though let's say Alice gets on at the Harvard stop they too pay 10 cents to get on at the Harvard stop and they have a remaining balance of 20 cents Alice will go let's say to Park Street get off at Park Street pay the nickel to leave and now they'll have a balance of 15 cents at the end let's go to Bob Bob enters the ale wife station they pay 10 cents they have a remaining balance of 30 cents and let's say they leave at Park Street and have a fare of 10 cents to leave because it's a further distance now they'll have a raining balance of 20 cents overall so this table is okay I might admit I mean last time we learned about having what we called primary keys and foreign keys so seems like that's missing here let's go ahead and add that here I'll give each row a unique ID so I can know who entered who exited and so on give that a unique ID here but I might even say that this table could really be improved substantially and I want to ask you what redundancies or inefficiencies do you see here if we're trying to represent Riders and stations what could we improve about this design look so uh probably the redundancy will be the names and the stations too for example the Italian will go to the train building then it will become most of the names in the data yeah good point if I'm hearing what you're saying Lauren let me show you some examples that I highlighted here you know one example could be to your point about these names like these names seem to be telling us the name of a person but here we have only three names Charlie Alice and Bob well my name is Carter and what if somebody else named Carter also tried to get on and leave at some stop well I wouldn't be able to know which Carter was which or which Charlie was which which Alice was which and so on so you probably need a way to represent people and their names a little better here too what other ideas do we have for how to improve the design of this table yes I think we can have a singular ID for a singular person that way will be better able to track their activities nice so we probably have an ID for each person a bit of what we learned about last week right putting people in their own table and giving them their own unique ID a primary key let's show that here I'll go to some slides and I'll pick out one that shows us just Riders so to your point so Kanye we could try to have maybe a table for just Riders and maybe simplify this table has only two columns it has a column for ID and a column for name so here we have Charlie Alice and Bob all in their own table well we approach to we could do the same thing for stations like let's say we have a table of stations now and we give each one their very own ID as well our own primary key for this table we have Harvard Kendall and Park Street we can differentiate between them using their IDs here so a few improvements could be made and as we're making these improvements splitting one table into many and debating what kind of data to store in each table the process we're going through is one called normalizing We're normalizing our data to normalize means to reduce redundancies effectively to take a table take one table for instance split up into multiple and have each entity be part of its very own table some academics in the world have named different normal forms quote unquote there's like first normal form second normal form third normal form this progression of making your data more and more efficient you can use those as heuristics at the end of the day a few principles might apply first take your entities like in this case stations and Riders and put them each in their own table and if you add more data make sure that if I were to add a column let's say to writers it is only a data point about Riders not about stations not about fares only about Riders and that process can help us make a data set that is more Dynamic more easy to reproduce and more easy to write queries on so that is the process here of normalizing okay so if we have now Riders and stations we want to represent them in our table well we could use what we learned about relating last week to ask how could we actually represent these Riders and these stations so let's say here again have Riders and stations I want to make sure that I have the right relationship between them well if you're familiar with Subways we might say that a rider goes to one station in this big T here is the symbol for a station here in Boston for the t's that we call it for the subway so a writer might go to one station but of course that might not be the full picture a writer also gets off at some station later on so a rider could be associated with not just one station but multiple and if you're familiar at least with any subway system really the Boston one two it can often get pretty busy and so Riders might not just go to of course one station or two stations could also have multiple riders that are on a particular station here so to recap one Rider might be associated with more than one station they might get on at this first one and get off at this later one but each station could presumably have more than one Rider each station here could have Rider a or ride or B the rider up here or the writer down below and even many more than that as well so to put it in the language of our ER diagrams our entity relation diagrams from last week we could look at a bit like this where we have Riders and stations Riders visit stations and they're Associated like this a rider must have at least one station associated with them that's what this horizontal line means if they aren't attestation they're really a rider right a rider though could have many stations associated with them that's what this three prong down here means they could have one two three four they could have many stations they could on and get off of now a station could have anywhere between zero Riders if it's maybe out of commission or isn't very popular upwards to many it could have two three four five even hundreds of riders associated with this particular station so here is our entity relation diagram for these particular Riders and these stations here so let me ask what questions do we have on these relationships between Riders and stations how to design this table so far then I want to ask that you have used the save ID for stations and Riders so that maybe give us problem in coding yeah a good observation so you might have noticed that in the writer's table and in the stations table I gave the same kind of ID like I had one two three for each of them and let me just show you that again here I'll come back to some slides and I'll show you again the writers table where we had Charlie Alice and Bob ID one two three same for the stations we had stations Harvard Kendall Park Street ID one two three and to your question isn't that a problem well I would argue in this case it's not so long as we keep clear that these IDs are for stations and these IDs are for Riders and we'll see how to do that using our SQL keywords later on but again so long as we have an ID just for our Riders and an ID just for our stations we can keep these separate even if they might have the same values but a great question here let's take just one more uh regarding the entity relationship diagram how is it possible for stations to have a possibility of zero Riders but writers must compulsively have at least one station yeah a good question so this might be up to you and how you formulate it but for me let me show the diagram again I'll go back to over here uh in my mind to be a writer you have to visit a station if you aren't visiting a station you aren't really a writer right now presumably there are stations that were built but aren't really being used right now weren't really uh in service yet that could be a station that has no visitors so you could argue let's make sure every station has at least one Rider and every Rider may or may not have to visit a station for that I would say let's kind of we could probably reasonably disagree there and talk about how we could represent the diagram here too but a great observation and a good other critique of this this system here all right so let's now turn to representing this in our database I'll go back to my computer and we'll learn about this new SQL keyword SQL statement this one called create table create table allows us to as the name suggests create a brand new table so let's do just that in our new database to represent Riders and stations I'll go into my terminal and I want to make a brand new database I'll call this one mbta.db because MBTA stands for the Massachusetts Bay Transportation Authority the people who run the subway essentially so I'll do sqlite 3 mbta.db hit enter and I'll type y to say yes I want to create this brand new database okay now if I type dot schema I see nothing but that's expected I don't have any tables yet I have nothing inside this database it's up to me to create these tables myself so what I'll do is clear my terminal and let's start first with riders I might create a table for writers I'll say create table and now I have to give that table some name I might call it Riders here and then in parentheses like this I can specify what columns should be part of this table so let's start first I'll hit enter here and continue this query now all by convention just indent four spaces one two three four and I'll give an ID to each of these Riders as we saw before I'll say ID here is one of my columns Now to create a new column I'll follow this up with a comma and hit enter all again by convention for Style just in Dent four spaces and what's my next column perhaps a name for these writers I'll give this column the name name and I'll leave it at that once I'm done adding columns I no longer need to have a comma I could simply close out this query this statement I could hit enter here say close it parentheses to close the top parentheses here semicolon hit enter and now nothing seems to happen but that's actually a good sign so let me type dot schema hit enter and I'll see the result of that statement before great table if it doesn't already exist writers and writers has inside of it two columns ID and name okay let's keep going let's make one for stations too I'll clear my terminal and I'll say create me a table called stations and include actually not station uh if you ever want to let's say fix this kind of table here let me try closing this on my closing the parentheses hit semicolon enter I'll get a syntax error I can restart I'll do control L now I'll do create table stations plural open parentheses enter all indent by four spaces one two three four and now I'll similarly include an ID for each of these stations I'll say ID comma and then what else should I have well stations tend to have a name like the Kendall MIT station the Harvard station the Park Street Station so I'll set I'll give each of these the very own name comma what else do stations have well they might also have a line that they're on let's say it's the red line or the blue line or the green line and so on I'll have a space for them to write in their line that they're a part of okay and maybe I'll leave it at that to keep it simple I'll say stations have an ID a name and a line now I'll close this out I'll say end parentheses semicolon hit enter and nothing seems to happen but if I type dot schema I'll now see my Riders and my stations tables inside of my database now one more step here we have writers we have stations but we have to relate them we have to relate them using this many to many relationships we saw last week so let me try making a table to implement this many-to-many relationship and if you remember we might call this kind of table a junction table an associative entity a join table it has a lot of names but looks a bit like this I'll create this new table which represent let's say visits a writer visits a station so I'll call this table visits and inside I'll make sure it has two columns one for a writer ID to represent a writer and one let's say for a station ID to represent a station now when I see a rider ID next to a station ID in this table all know the rider with that certain ID visited the station with that certain ID so I'll close this out I'll say end parentheses here semicolon enter and finally clear my terminal type dot schema and I can see I have writers stations and visits between Riders and stations in this associative entity this Junction table or a join table up to you what you might want to call it in this case now what questions do we have why we have not used the primary key and secondary key in this table uh good question so we're going to get there in just a minute but if I look back at my uh terminal here my schema I'll see I really just have column names and we saw before and we typed dot schema on our long list.db we had more than just column names we had column names we had perhaps data types we had primary keys and foreign keys so we'll get to that in just a minute but suffice to say for now we're going to keep improving this over time let's take one more is it required to put spaces the four spaces the indents or that's just for the visual look yeah great question is it required to have these four spaces before each column name in fact no it's not but it makes the code more readable so if I I could put this all in one line I shouldn't but I could uh if I have instead this new line followed by four spaces I can make this more readable for myself and for my colleagues too good question okay so to our earlier point there are things that are missing from this schema like we have column names but as we saw before we should ideally specify what kind of data should be able to go into each of these columns and for that we'll need some new ideas to talk about here so let's focus now on this idea of data types and storage classes data types and storage classes are two very similar but distinct ideas and they're distinct in a way we'll talk about in just a minute now SQL Lite has five storage classes five kind of storage python types so to speak of values that can values that can hold so let's talk about the first one null for instance null in this case means nothing there's nothing that actually is inside of this value it's kind of a central value to me nothing is here integer means a whole number like one two three four five real talks about decimals like floating points like 1.2 or 2.4 and so on text is used for characters and blob kind of a funny one blob stands for binary large object and it represents the data exactly as I give it to this value if I tell it to store one zero one zero one zero it'll store exactly one zero one zero one zero in binary so useful for storing in this case like images and video files and audio files things that have some structure we don't want to mess around with now let's focus on this idea of a storage class these I'll say are storage classes and not data types in sqlite now a storage class like integer can comprise can hold several data types notice how there are seven different data types that could be stored under this integer storage class we have a six byte integer two by integer uh eight and zero and so on it could be any of these particular types but each of these types Under the Umbrella of this integer storage class and sqlite itself will take care of making sure it uses the appropriate data type like if I give a very large number like let's say 4 billion or 6 billion or even bigger than that it'll probably use a longer that is a bigger byte integer to store that kind of value if I give it a smaller one like one two three or four it'll probably use a one byte or a two byte integer for that but sqlite's idea is that I as a programmer shouldn't have to care if I use an eight byte or a one byte or a two byte integer I just care that I'm using integers whole numbers and they give me a storage class to use any of these up to their choice here as well now let's look at a few examples of values in SQL light that we could store well we have perhaps the red line as some text and because this is characters it's quoted we could use the text storage class to represent this particular value here we could have maybe an image into the earlier point we could say well this image might be best represented using a blob a binary large object to keep all of these pixels exactly as they are in this image but we do get some Choice some interesting design challenges we look at the idea of fares so let's say to our Point earlier fares are 10 cents back in the 1950s or so well 10 cents we could store as an integer which seems just fine but this could get confused I'm talking about dollars here or cents maybe it would be better let's say if I did this a dollar sign 0.10 and what might that be stored as well probably text right I could say this dollar sign isn't really a number but now I have to include it so I'll say this will be quoted essentially as dollar sign 0.10 now there's some downsides here too like let's say I have uh I want to add these up well I can't add up text like what does it mean to say dollar sign 0.10 plus dollar sign 0.20 I can't do math with tech so maybe it'll be better after all if I used a real or a decimal like this 0.10 but I mean even here you're in some problems if you are familiar with um how is represented in binary you might know that decimal values or floating Point values can't be perfectly precisely represented and if I talk about 0.10 the computer might store 0.1000056789 it could get very wonky out to the many many decimal digits down below here so trade-offs and challenges overall let's look at these three though I have the first one to store as an integer I'm trying to store fares here second one as text and the third one as a floating point or a real in this case let me ask for some opinions here which one would you use and why we're trying to represent fares in this case thank you people are using integers because of course I need to get the calculation very accurately that's my point of view well sometimes I can use float but you know like you said before it can get very wonky if I get you know if I really need that kind of position I don't really recommend using fluids yeah a good point so if they go back to some slides here you might argue for the integer because you know you can precisely represent integers and let's say I want to add up fares over a lot a lot of riders this might be useful for me because I know that each number will be perfectly precisely represented I can do lots of big kind of math with this number here to your point this decimal might kind of as you should get wonky later on towards later decimal points I might get some unexpected results if we add up these overall let me ask though are there any proponents of this floating point value or a real value all right so I think we uh I think so so is uh in the number of for example for each pair like we uh like the answers need that like uh truncation probably suggests one of the comments then yeah a good point so if we talk about using this float value I mean one thing we could say for it is that this decimal could be you know it's more accurate to say like were you working with dollars now and we could have maybe 10 cents which is only 0.1 of a dollar I totally hear that point as well and the point we're making here is that they're really just trade-offs among these data storage classes to use whether you're using integers or real values it just depends on your use case and what you're designing for so be sure to read up on trade-offs among these data types and determine for yourself which one should you best use okay so we have now these storage classes to store values and it turns out that columns can be made to store certain storage classes or prioritize certain classes and the key distinction here is that columns in sqlite don't always store one particular type instead they have type affinities meaning that they'll try to convert some value you insert into a given cell or given row to the type they have the affinity for now there are let's say five type affinities in sqlite text columns can be of the type Affinity text meaning they store just characters at the end of the day there's also numeric which stores either integer values or real values depending on which one seems best to convert to you have integer type of energy which means it'll store whole numbers real to store floating points or decimal values and we have blob here again our old friend to store binary exactly as we get it so whereas before we're talking about storage classes those are especially with individual values type affinities are now associated with individual columns so let's see some example of my how this might work in SQL Lite let's say I have this table of fairs and we've decided to store fares as integers well if I said that this column called amount has the affinity for the text storage class what would happen is if I insert this integer 10 it would look a bit like this later on it'll be converted to text be quoted in the sense represent is now been converted to some set of characters let's say I insert this value 25 well 25 has a storage class right now of an integer it is a whole number but if I insert this into a column that has the text Affinity it'll be converted into in this case text at the end of the day let's do the opposite let's say I have my fare as a string some text in this case I want to insert it into this column called amount but now amount has the integer type affinity or if I insert 10 quote unquote into the column amount I'll get back not 10 the text but 10 the integer because again amount this column here has an affinity for the integer storage class let's try this 25 some text again all inserted into this table now I'll have 25 as an integer so this is how SQL Lite allows us to give certain columns an affinity for certain types that they'll try to store values of that type so long as we insert values it could be fees will be converted to that type here so let's go back to our schema and try to improve it now to use not just column names but also type affinities to store a certain data of a certain type go back to my computer here let's improve this once more so I'll go over to my table and now I probably want to improve the design here and often if I want to improve this I might just need to erase what I've already done so let me introduce this new keyword this new statement called drop table to drop a table means to delete it to remove it effectively so let me try doing this for Riders stations and visits I'll type drop table writers semicolon enter nothing seems to happen but if I type dot schema writers is gone I'll try drop table stations then semicolon hit enter and type let's see dot schema again no more stations I'll try drop table visits semicolon enter and then dot schema our table our database is really gone there are no more tables inside of it so let me propose that instead of working inside of the sqlite prompt like typing out again and again and again create table Riders create table stations create table visits let me just be more efficient about this and create myself a schema file that I could reuse throughout this lesson and also waited on while I'm working on this database on my own to do that let me quit my um sqlite prompt here and let me type something like code schema dot SQL I'm just creating this file called schema dot SQL now a DOT SQL file allows me to type in SQL keywords and have them be syntax highlighted so I know what's going on inside of this file so let's just try this once more I'll type create table writers and inside I'll say it has the ID column of what type Affinity well IDs are whole numbers so perhaps integer in this case I could say ID has the integer type affinity now let me say that writers also have a name and how could names be best represented maybe text right characters here so I'll say name and text now I'll include a semicolon to say this is the end of my create table statement and before remember how I had to kind of error out or I had to like backspace and so on to improve the design here I can literally just point and click and edit this file to improve my schema and I'll later on apply this in my database using a command that we'll see a bit later so let's keep going here I'll say create table stations and inside of the station's table I'll make sure it has an ID column of type integer a name column that probably stores some text and a line column that also stores some text where name is the name of the station like Kendall MIT Harvard and line is blue line or green line what line is part of in our Subway let me try now visits I'll say create table visits and then I'll do writer ID which has what type Affinity probably integer and then I'll do station ID which has this same type Affinity it's replying on integers whole numbers here for IDs so my colon to finish this statement now this is my schema as a whole I have writers stations and visits but now I want to apply this schema to my database so what could I do I could reopen it let's say I'll do sqlite3 mbta.db in my terminal and now I want to read in this schema.sql file that is run the commands that are inside of this file so what can I do I could say dot read schema dot SQL where dot read is a command that says take whatever file you give me like schema.sql and read it into this database running any SQL keywords you come across there I'll hit enter and now nothing seems to happen but if I type dot schema I now see my schema improved and this is helpful for me because I what I could do now is I could just edit my schema.sql file and rerun it and rerun it to make sure I now have these tables being improved over time okay so this then is our new representation of our database we have Riders of course their own entity and stations they have an ID and a name and stations have an ID a name and a line we've also now included these type affinities integer text integer text to tell SQL what kinds of storage classes could be put inside of each of these columns now before we keep improving this let me ask what questions we have on these storage classes and type affinities sir you will use it when you were creating the table the table was not in the line so when we searched for the authors or books so it comes with a perfect table so how can we make a perfect table in the SQL yeah do you mind um clarifying what you mean by like the perfect table uh so I mean that it was arranged in something like inbox block boxes ah good question so um before we're able to see the results of our queries inside some boxes in our terminal and that is actually a mode of SQL light I think you type like dot mode table to see results in that version um here though we have no data inside of our tables so we can't really select anything from them like if I go to my terminal here and I try to select some data from this writer's table let me say select star from writers semicolon I won't get anything back next week though we'll see how to insert and add an update and delete data inside of these tables at which point you could write select star from writers and see some data you've inserted yourself great question let's take one more here yes I would like to know if you have a column data type Boolean yeah do we have a Boolean uh type of energy let's see so here we don't at least in sqlite some other dbms's database Management Systems might have Bool or Boolean true or false right let me show you this instead if I go to my terminal I can see if I type dot schema I haven't used Boolean there's no need for me in this case but I have used integer and integer for sqlite can kind of for uh serve the same purpose I could have the integer 0 or the integer one to be true or to be false or true respectively zero being false and true being one I believe in this case but good question okay so to the earlier question like we've improved our tables we now have type affinities for our columns but we don't yet have this idea we talked about last week which was primary keys and foreign keys this idea of trying to uniquely represent each item in our own table using primary keys and trying to reference that primary key from some other table using foreign keys so let's try to work on that now and for this we'll need this new idea called a table constraint in sqlite you can apply what's called a constraint to your entire table a constraint means that some values have to be a certain way like let's say for a primary key primary Keys must be unique they can't repeat and they must be at least in our case there's going to be integers to be able to First quickly like add on to them over time similarly for foreign keys well a constraint is that if you have a foreign key you better find that value in some other table otherwise you've violated this constraint to having a foreign key so we have two kinds of table constraints among others but two of these are primary key and foreign key and we can apply these to our table by applying them underneath the columns we tend to say will be inside of our table let's try these two here now so let me come back to my terminal here so we can Implement our very own primary key and foreign key constraints we'll go back to my sqlite terminal and clear my screen and let's then pull up our schema.sql file so we can keep modifying our schema I can now see I have the writers table stations and visits and I have some columns that could be primary keys or foreign keys but I need to declare them as such so here in the writers table what was our primary key well it was ID every Rider should have their own unique ID that should not be duplicated across any two Riders so to ensure that constraint is applied I could follow this up with a comma and then say primary key key ID just like this now ID is the primary key of this writer's table I can go down to stations and ask what was my primary key well similarly it was ID this ID column on line eight so I'll type a comma followed up with primary key ID and now that ID column is a primary key it has that constraint applied in stations but now if I get down to visits we'll have a few more options visits here actually doesn't have its own ID column that I created i instead have a writer ID and a station ID column so a few options here one option is actually to make a joint primary key I could have a primary key composed of two columns both Rider ID and station ID if I apply that constraint that would mean that if I were to ever insert a row that had the same writer ID and the same station ID as another row I would trigger a constraint violation every row that has a writer ID and a station ID has to be unique in their combination of those two values to write that kind of scenario I could follow this up and I could say similarly the primary key of this table is not just writer ID like this but it's also station ID now this is a joint primary key constraint but if we think about this logically I mean it kind of stands to reason that somebody would visit a station more than once and I don't want to make sure that every combination of rioter and station ID should be unique I want people to be able to visit a station more than once so maybe not the best design for this table but I could certainly use it in other cases or other contexts one other option would be to do before and to have maybe the ID column here of type integer and then down below make that our primary key a bit like this and now visit has its own ID column but actually sqlite by default will give me its a very own primary key one called row ID it's implicit I can't see it but I actually could query it I could query for row ID all one word and get back a unique primary key for this table sqlite is automatically created for me now we have seen the primary key options what are our foreign key options well it seems like Rider ID and station ID are the foreign keys of this table where Rider ID references the ID column in writers and station ID references the ID column in stations So to codify that to make that a reality in our schema I could follow this up with the foreign key the foreign key of this table is Rider ID and it references it references The Writer's table and the ID column inside of it using this syntax here now I could keep going I could say I have more than one foreign key I also have a foreign key foreign key called station ID and that references as we said before the stations table and the ID column inside of it so now here I have my completed schema I have a primary key for the tables I've declared an explicit column for a primary key and now I also have four on key constraints for those columns that should be foreign keys so now let me ask what questions do we have on the schema or on primary keys and foreign keys uh so yeah I just noticed that whenever before we hadn't added the affinities and the case we were not applying commas after each column name so what is the difference there yeah a good catch let me um kind of show you what this looks like in my terminals you can see it live um you I think had noticed that before we had let's say this primary key ID constraint in writers we had done something like this let me just copy paste that and we had removed this last column or comma from the name column is that right and if that's the case well it's um just convention just style here so if I want to keep adding some constraint or like a new line to my table I should include a comma here this name column was the last portion of my table I had specified I have this column called name that has Type Affinity text right but now if I add this new constraint we'll have to follow put up with followed up after a comma from this new column here notice now this constraint primary key ID is the last um let's see attribute of my table I specified I no longer need to include a comma at the end of it so whatever is the last portion I should have a I should not have a comma after but everything else I should let's take one more question here too would it be okay for the visits table to have an ID column as well yeah a good question would it be okay for the visits table to have an ID column as well it certainly would be we could Define our very own primary key for this table too so let me go back and show you how that could work I'll go to my visits table here and I could try to add my own primary key to this table I could say ID make a new column here make it up type Infinity integer like this let me scroll up and now let me add some new constraint I could say because I've made my very own primary key I'll say primary key ID now this table has a primary key that I've created called ID and this will be in place of SQL lights that it would have made called row ID itself but hidden from my own View okay let's keep going then we've seen table constraints we've seen type affinities but we could probably do more to improve the design of this table or this database so let's introduce one more kind of constraint this one called a column constraint so whereas a table constraint applies to the table as a whole a column constraint applies to a particular column let's say maybe I want a column to have certain data inside of it uh maybe I want to make sure it doesn't have null values and so on I could do that with column constraints there are four in sqlite check default not null and unique each one does something different check allows me to make my very own check like check to be sure that this amount is greater than zero or I could use default default means if I don't Supply a value when I add a new row it'll just use a default value instead not null means I can't insert null or empty values into this column in fact it's required unique means I want to make sure that every Row in this column is a unique value it doesn't appear twice in my data set so let's try applying a few of these to our schema here go back to my terminal and now let me check out this well I could try applying the not null constraint when I know I want a column to be required effectively now where could I best apply that maybe I could apply that to the name column in stations like stations must have a name so I'll say the name column cannot be null it cannot be empty in here line also should probably be not nullestation must be part of some line I can't have an empty value for line so I'll say 2 this should be not null now I could apply this up at name I could say writers must have a name too let me try that I'll say text not null or I could leave it optional I could say maybe text just on its own and let writers choose to supply a name or not now the question here is should I apply not null to my primary key columns like ID not null or ID not no here you might think that you should for thoroughness sake well it turns out that when you apply the primary key table constraint down below here this already ensures that there are several constraints applied to this particular column called ID among them being that ID cannot be null so no need to duplicate this and say that this ID cannot be null when I already have it testified down below that ID is a primary key let me check others here you might also think could I do it for Rider ID and station ID should I include not null here writer ID not null station ID not null well that would be a good thought but again we're taken care of by our table constraint using our foreign key here again this constraint will say if Rider ID doesn't already exist in the ID column of writers I can't insert that value and we could probably presume that if Rider's ID is a primary key well no will not be part of this column and therefore I already can't insert null for writer ID or station ID this would be in this case redundant so not null is good when you have columns that are neither primary Keys nor foreign keys and you want to make sure that they have all they always have a value that they are never uh null effectively okay so that is not null and we could keep going here we also had one called unique that makes sure every um every value every Row in this column is unique where could we apply this I could try to apply it let's say to the name of a station like station should have unique names so I'll say not null and unique now this column has two constraints the first not null it should always have a value a second unique the value shouldn't repeat throughout this column line I might leave without this constraint I could imagine two stations being on the same line like both on Blue I'll allow that in this case now again we could try to apply unique to our primary keys or our foreign Keys as I just did here but it's already taken care of for us using this primary key constraint a primary key again is always going to be unique and never null so we'll take advantage of that already using our primary key and foreign key constraints here Okay so we've seen unique and not null and I might argue we're at the point where this schema seems to be fairly optimized at least using our column constraints our table constraints our type affinities and so on so let's ask then what questions do we have on not null and unique if any so basically to recap if I understood correctly it's not precisely about not not unique but about the concept of the key labeling key immediately gives the attributes of not null unique and to be referenced right that is true so when you use a primary key or a foreign key constraint there are other constraints that go along with that constraint a primary key for instance must not be null it must be unique and so on so it'll be redundant to apply that again to say that this primary key should be unique or not null good clarification there okay so I think we're at the point where this schema is pretty well set for us and we're going to need to think about though how we use this in the real world like if this is our schema here we have Riders and stations well what could we do I mean Riders tend to they could register for the subway and Riders well they do visit stations but I think if we apply this to the real world we'd see that this isn't quite how it's actually done like Riders don't really register that often if a writer who's knew the city comes in they want to ride the subway they should be able to ride two so it turns out at least here in Boston the MBTA doesn't really attract Riders per se but they do track what we call Charlie cards if you want to ride the subway you get a Charlie Card Charlie Card allows you access to the subway keeps track of your fare your balance and so on and allows you to swipe in to certain stations so and when we come back from a break here we'll see how we can actually Implement these Charlie cards instead of riders to make our system more efficient and More in line what happens in the real world we'll see you all in just a few and we're back so we saw last time this challenge of Designing a system for the Massachusetts Bay Transportation Authority to represent Riders and stations people who get on a subway to go around Boston but as we also saw we learned that the MBTA doesn't keep track of riders the

Original Description

To follow along with this lecture, visit https://cs50.harvard.edu/sql. *** This is CS50, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming. *** TABLE OF CONTENTS 00:00:00 - Introduction 00:00:17 - Designing 00:06:48 - MBTA 00:14:04 - Normalization 00:20:24 - CREATE TABLE 00:27:25 - Data Types and Storage Classes 00:45:27 - Table Constraints 00:53:34 - Column Constraints 01:00:45 - Altering Tables 01:20:01 - Charlie *** HOW TO SUBSCRIBE http://www.youtube.com/subscription_center?add_user=cs50tv HOW TO TAKE CS50 edX: https://cs50.edx.org/ Harvard Extension School: https://cs50.harvard.edu/extension Harvard Summer School: https://cs50.harvard.edu/summer OpenCourseWare: https://cs50.harvard.edu/x HOW TO JOIN CS50 COMMUNITIES Discord: https://discord.gg/cs50 Ed: https://cs50.harvard.edu/x/ed Facebook Group: https://www.facebook.com/groups/cs50/ Faceboook Page: https://www.facebook.com/cs50/ GitHub: https://github.com/cs50 Gitter: https://gitter.im/cs50/x Instagram: https://instagram.com/cs50 LinkedIn Group: https://www.linkedin.com/groups/7437240/ LinkedIn Page: https://www.linkedin.com/school/cs50/ Medium: https://cs50.medium.com/ Quora: https://www.quora.com/topic/CS50 Reddit: https://www.reddit.com/r/cs50/ Slack: https://cs50.edx.org/slack Snapchat: https://www.snapchat.com/add/cs50 SoundCloud: https://soundcloud.com/cs50 Stack Exchange: https://cs50.stackexchange.com/ TikTok: https://www.tiktok.com/@cs50 Twitter: https://twitter.com/cs50 YouTube: http://www.youtube.com/cs50 HOW TO FOLLOW DAVID J. MALAN Facebook: https://www.facebook.com/dmalan GitHub: https://github.com/dmalan Instagram: https://www.instagram.com/davidjmalan/ LinkedIn: https://www.linkedin.com/in/malan/ Quora: https://www.quora.com/profile/David-J-Malan TikTok: https://www.tiktok.com/@davidjmalan Twitter: https://twitter.com/davidjmalan *** CS50 SHOP https://cs50.harvardshop.com/ *** LICENSE CC BY-NC-SA 4.0 Creative Co

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from CS50 · CS50 · 0 of 60

← Previous Next →

Hello, World: Hadi Partovi

Hello, World: Hadi Partovi

Content Distribution and Archival in a Digital Age

Content Distribution and Archival in a Digital Age

CS50 2014 - Week 1

CS50 2014 - Week 1

CS50 2014 - Week 3

CS50 2014 - Week 3

CS50 2014 - Week 0, continued

CS50 2014 - Week 0, continued

CS50 2014 - Week 4

CS50 2014 - Week 4

Week 3, continued

Week 3, continued

CS50 2014 - Week 3, continued

CS50 2014 - Week 3, continued

CS50 2014 - Week 7

CS50 2014 - Week 7

CS50 2014 - Week 7, continued

CS50 2014 - Week 7, continued

Breaking Through The (Google) Glass Ceiling by Christopher Bartholomew

Breaking Through The (Google) Glass Ceiling by Christopher Bartholomew

Introduction to Amazon Web Services by Leo Zhadanovsky

Introduction to Amazon Web Services by Leo Zhadanovsky

CS50 2014 - Week 9

CS50 2014 - Week 9

How to Build Innovative Technologies by Abby Fichtner

How to Build Innovative Technologies by Abby Fichtner

Light Your World (with Hue Bulbs) by Dan Bradley

Light Your World (with Hue Bulbs) by Dan Bradley

Building Dynamic Web Apps with Laravel by Eric Ouyang

Building Dynamic Web Apps with Laravel by Eric Ouyang

CS50 2014 - CS50 Lecture by Steve Ballmer

CS50 2014 - CS50 Lecture by Steve Ballmer

CS50 2014 - Week 10

CS50 2014 - Week 10

This is CS50 with Steve Ballmer?

This is CS50 with Steve Ballmer?

Meteor: a better way to build apps by Roger Zurawicki

Meteor: a better way to build apps by Roger Zurawicki

Data Analysis in R by Dustin Tran

Data Analysis in R by Dustin Tran

Data Visualization and D3 by David Chouinard

Data Visualization and D3 by David Chouinard

CS50 2014 - Week 6

CS50 2014 - Week 6

Build Tomorrow's Library by Jeffrey Licht

Build Tomorrow's Library by Jeffrey Licht

CS50 2014 - Week 9, continued

CS50 2014 - Week 9, continued

Essential Scale-Out Computing by James Cuff

Essential Scale-Out Computing by James Cuff

iOS App Development with Swift by Dan Armendariz

iOS App Development with Swift by Dan Armendariz

Sam Clark Leads Yale Students on Tour to CS50 at Harvard

Sam Clark Leads Yale Students on Tour to CS50 at Harvard

3D Modeling and Manufacture by Ansel Duff

3D Modeling and Manufacture by Ansel Duff

CS50 2014 - Week 5, continued

CS50 2014 - Week 5, continued

CS50 2014 - Deep Thoughts - Hash Table

CS50 2014 - Deep Thoughts - Hash Table

CS50 2014 - Deep Thoughts - Binary Tree

CS50 2014 - Deep Thoughts - Binary Tree

CS50 2014 - Deep Thoughts - Scratch

CS50 2014 - Deep Thoughts - Scratch

CS50 2014 - Deep Thoughts - MySQL

CS50 2014 - Deep Thoughts - MySQL

LaunchCode Visits CS50

LaunchCode Visits CS50

CS50 Live, Episode 100

CS50 Live, Episode 100

CS50 Field Trip to Google

CS50 Field Trip to Google

This is CS50 AP

This is CS50 AP

Week 4: Monday - CS50 2011 - Harvard University

Week 4: Monday - CS50 2011 - Harvard University

Week 2: Wednesday - CS50 2011 - Harvard University

Week 2: Wednesday - CS50 2011 - Harvard University

Week 1: Wednesday - CS50 2011 - Harvard University

Week 1: Wednesday - CS50 2011 - Harvard University

Week 11: Monday - CS50 2011 - Harvard University

Week 11: Monday - CS50 2011 - Harvard University

Week 3: Wednesday - CS50 2011 - Harvard University

Week 3: Wednesday - CS50 2011 - Harvard University

Week 12: Monday - CS50 2011 - Harvard University

Week 12: Monday - CS50 2011 - Harvard University

Week 1: Friday - CS50 2011 - Harvard University

Week 1: Friday - CS50 2011 - Harvard University

Week 3: Monday - CS50 2011 - Harvard University

Week 3: Monday - CS50 2011 - Harvard University

Week 10: Wednesday - CS50 2011 - Harvard University

Week 10: Wednesday - CS50 2011 - Harvard University

Week 2: Monday - CS50 2011 - Harvard University

Week 2: Monday - CS50 2011 - Harvard University

Week 9: Monday - CS50 2011 - Harvard University

Week 9: Monday - CS50 2011 - Harvard University

Week 7: Monday - CS50 2011 - Harvard University

Week 7: Monday - CS50 2011 - Harvard University

Week 5: Monday - CS50 2011 - Harvard University

Week 5: Monday - CS50 2011 - Harvard University

Week 5: Wednesday - CS50 2011 - Harvard University

Week 5: Wednesday - CS50 2011 - Harvard University

Week 7: Wednesday - CS50 2011 - Harvard University

Week 7: Wednesday - CS50 2011 - Harvard University

Week 8: Monday - CS50 2011 - Harvard University

Week 8: Monday - CS50 2011 - Harvard University

Week 9: Wednesday - CS50 2011 - Harvard University

Week 9: Wednesday - CS50 2011 - Harvard University

Week 8: Wednesday - CS50 2011 - Harvard University

Week 8: Wednesday - CS50 2011 - Harvard University

Week 10: Monday - CS50 2011 - Harvard University

Week 10: Monday - CS50 2011 - Harvard University

Week 2: Wednesday - CS50 2010 - Harvard University

Week 2: Wednesday - CS50 2010 - Harvard University

Related AI Lessons

Your AI Keeps Making Things Up. RAG Is How You Make It Use Real Facts Instead.

Learn how to use RAG to make your AI provide accurate answers based on real facts instead of making things up

Evaluation Metrics for RAG: Measure Retrieval, Generation, and End-to-End Quality With Numbers That…

Learn to evaluate RAG models using metrics that measure retrieval, generation, and end-to-end quality

Evaluation Metrics for RAG: Measure Retrieval, Generation, and End-to-End Quality With Numbers That…

Learn to evaluate RAG models using metrics that measure retrieval, generation, and end-to-end quality

Medium · Data Science

When Does HyDE Help RAG? I Tested 3 Query Types and It Failed on Two

Learn when HyDE retrieval helps or hinders RAG performance across different query types, and why it matters for improving search accuracy

Chapters (10)

Introduction

0:17 Designing

6:48 MBTA

14:04 Normalization

20:24 CREATE TABLE

27:25 Data Types and Storage Classes

45:27 Table Constraints

53:34 Column Constraints

1:00:45 Altering Tables

1:20:01 Charlie

RRF vs DBSF with Qdrant: Hybrid Retrieval Fusion for RAG in Python

Professor Py: AI Engineering