Lecture 11: Coded imaging

MIT OpenCourseWare

Key Takeaways

This video lecture covers coded imaging, a technique that captures images in a way that allows sharp detail to be recovered in software, using tools such as the Fourier transform, a ferroelectric LCD shutter, and coded apertures. The lecture also discusses applications of coded imaging in photography and computer vision, including motion blur, defocus blur, and extended depth of field.

Full Transcript

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

...kind of a summary of a lot of the things we talked about. If you remember, in the beginning he said you can just start with a pit, and then it develops a lens. But even from here you can go down two different paths: either compound eyes, where each sensor or set of sensors has its own optics, like a soda straw, or the same pixel might get the image from multiple lenses, like here. That's superposition. So this is apposition and this is superposition. And that concept of apposition or superposition applies to all three types: shadow-based, refraction-based, or reflection-based techniques. We saw this last time, and we already have some projects that are inspired by biological vision. Matt is trying the chicken, and I think it's going to be very popular, and I believe Santiago is trying the piston kind of thing. So there are some really great ideas, and I'm glad a lot of these concepts are coming together in the final projects.

So today we'll talk about coded imaging, and the concept here is very simple. I'll start with this one: you have a taxi zipping by very fast, and you want to take a photo in such a way that you can recover the sharp detail afterwards in software. So it's a form of co-design between how you capture the image and how you process the image. In a typical film camera, or even a digital camera, you take the picture and that's basically the end of the story. Here you're trying to do something clever about how the picture is taken. Of course, there are other ways of
capturing this. You can take a really short exposure photo, but that's going to be very dark. If you use a high ISO you can recover some information, but it's still quite dark. Or you can take a long exposure photo by keeping the shutter open, but then you get a blurry photo, which is well exposed but has lost a lot of the high-frequency detail. And if you try to apply some deblurring, you get a result that looks like this, which is kind of reasonable. You can see the number one on this, I guess, Thomas train, but you get a lot of banding artifacts, a lot of repetition and noise here.

What are those lines? When you try to recover this information you start getting these banding artifacts, and we'll see in the next slide why that happens.

What's going on here is that a blurred photo can be represented as the convolution of the sharp photo with some kind of convolution filter. If you look at my laser pointer, at the tip of the numeral one here, it's been blurred by a certain number of pixels in the horizontal direction, and if you keep the shutter open for even longer it will blur correspondingly longer. So you basically have a 1D convolution that's converting this image into this image. Of course, usually this is the photo that you capture, and you would like to invert it and get back this photo. So one would say: this convolved with this gives me that, so just deconvolve using the same filter and maybe you'll get that back. That doesn't work, because of something called division by zero. The way to think about that is in the Fourier domain, because convolution in the image domain, or primal domain, is multiplication in the Fourier domain; that's just a standard property of the Fourier transform. So if you take the Fourier transform of this and
multiply that by the Fourier transform of this, you will get the Fourier transform of this. So let's say we take this photo, find its Fourier transform here, and multiply that by the Fourier transform of a box function, which is a sinc. What that means is that I'm going to take the lowest frequency and multiply it by that value, take the next frequency and multiply it by this value, the next frequency by this value, and so on. We're just going to multiply each of the frequencies in the image by the amplitude of the Fourier transform of this. You can already see that lower frequencies will be preserved but higher frequencies will be highly attenuated.

But there's also something strange happening: even some of the lower frequencies are being set to zero, which means that in this photo those frequencies are missing altogether. They have been suppressed. So it's not a traditional low-pass filter; it's a low-pass filter where even some of the lower frequencies are nullified. That means that if I try to recover this photo from this photo, there's no chance, because I have already lost those frequencies. The moment you take the photo the damage is done, and there's nothing you can do to recover them. In the Fourier domain, all you would have to do is take the Fourier transform of this and divide by the Fourier transform of this, and it would give you the original photo. But the Fourier transform has some zeros, so you cannot divide those frequencies by zero and recover an image.

So the culprit here is really this box function, which is equivalent to pressing the shutter release button, opening the shutter, keeping it open for the exposure duration, and closing it. That's the most natural thing to do, but apparently it's not the most effective. So what if you change that? What if, instead of
keeping the shutter open for the entire duration, you open and close it in a carefully chosen binary sequence? For some time the shutter is open, then it's closed, then it's open for some time again, then closed; here it's closed for quite some time, open for a short time, and so on. At the end you still get just one photo, but now something magical has happened. First of all, if you look at this number one, you'll see that it's not the same as before; it seems to have these replicas. And the reason this is better is that if you take the Fourier transform of this, it's actually flat, which means it's preserving all the frequencies in the image. So we can be sure that in this photo all the spatial frequencies, low and high, are preserved. Of course they're attenuated; it's not 1.0, maybe it's 0.1 or so. But there's still some hope of recovering this photo from this one, because we will not have a zero in the denominator.

Of course, if you try to implement this mechanically, opening and closing the shutter, that will be problematic. So what we did was use an LCD, actually a ferroelectric LCD, that becomes opaque and transparent. In the old virtual reality displays, or even some games, you have these eyeglasses that flicker at 60 hertz, time-sequentially, so that you can show the left-eye view versus the right-eye view. These are the same glasses. A traditional LCD unfortunately doesn't have very high contrast, and Simon is discovering that one more time, but ferroelectric LCDs have a contrast of thousands to one: the ratio of the light that passes through when it's transparent to the light that passes through when it's opaque is in the thousands. So when you turn this ferroelectric LCD off, it's really, really
opaque.

Yeah? How about just doing it with high-speed video and cutting out your frames?

So the question was, why not just capture high-speed video, take all these frames, and then put them together? The problem is that each frame will be extremely dark, so you're basically adding up a lot of noise; every frame is dominated by noise.

So when the shutter is transparent light goes through, when the shutter is opaque light doesn't go through, and that's your one-zero-one-zero encoding. Again, the idea is very simple: instead of keeping the shutter open for the entire duration and getting a well-exposed photo, the shutter is open for only half of the time.

The domain of that function you described there is infinite, right? So you actually truncate it?

It's not infinite, because you still have some width.

But you have infinitely high frequencies there, from the sharp transitions, right?

Yeah, you can think of it this way: this one goes to infinity, but there's hardly any energy left. Although it goes to infinity, there's not much energy left.

But when you invert the process, that's why you still don't get the perfect image. In this case as well, you've still lost something. You haven't shown the results yet for this.

Yes, so this is what it looks like in this case. But it's a very controlled experiment in a laboratory: you take the toy and you move it in a very controlled way. This is what you get with a traditional camera, and this is what you get with the flutter shutter. These are real photos, and you're right, you still get some noise. If you compare this with the ground truth you'll see that it's okay, but it's not perfect.

So let's say that you took the zeros of the sinc function and just replaced them with something that is pretty close to zero but not
zero, and you rework the process from here. This is what you get: that's the deblurring of this. And the loss of these frequencies shows up as these artifacts at regular intervals.

Again, this graph doesn't go to infinity all the way; it cuts off at a point corresponding to the width. If the width of this pulse were very short, then yes, it would go very far. The filter depends on multiple factors. If your toy or your taxi is moving really slowly, then there is no need for [a long sequence]. In this case the sequence was about 52 entries, a vector 52 long. So let's say your exposure time is about 104 milliseconds: it's open for two milliseconds here, open for four milliseconds, off for two milliseconds, four milliseconds, then maybe it's off for 18 milliseconds, two, and so on, with a vector length of 52.

This filter is in time. Does it correspond automatically to a filter in space?

Yes. Your actual blur in the image may not be exactly 52 pixels; it might be 10 pixels, it could be 100 pixels. So your 52-long vector is going to stretch or shrink based on how fast the object is moving.

And you're saying that it also depends on how far away the object is in space?

You mostly have to think about image-space motion, because the speed in the real world gets normalized by the distance. So you only have to worry about the image-space motion.

Could you get a similar effect if, instead of a coded shutter, you had a flash?

Yeah, exactly. If you're in a dark room, you can just strobe the light rather than opening and closing the shutter. I think we might have a demo of that. Well, I don't know how fast you can [inaudible].

So what are some, uh,
let's look at some pictures. Here is a demo; I think I've shown it to you before. This is on Broadway. Try to figure out the car make and the license plate number. What's the license plate number? Four five three four? Yeah, so you get a reasonable result.

But going back, what are the limitations of this method? Yes, you need to know the point spread function, how the blur is created. If the car is moving from left to right versus right to left, you need to know that, because the way your point spread function is imposed on the scene will be different. You lose half the light; very important. So this image is about half as bright as this one. What else? There should be little or no acceleration: whatever is moving has to move at a constant speed. If within 100 milliseconds it picks up speed, then your assumption that the 52-length vector will map to some stretched or shrunk version of 52 is not valid; some parts will go faster and some slower.

What else? Sorry, what if the object is moving in space [inaudible]?

If it's moving in perspective, for example, it's not so bad, because you can rotate the image and it becomes [linear] again. That's not acceleration; that's still constant speed. It's acceleration in the image plane, but in the real world it's constant speed. So you can play with those tricks: you can go to object space or come back to image space to make sure there is no acceleration and it's all linear.

Does this technique work if things are moving in multiple directions at once over the duration?

If you have multiple cars, for example, and they're all independent, then it's fine, because I can say this car is going this way and that car is going that way. As long as each is moving in a straight line at a constant speed, you're okay. But if the two cars overlap, what happens? Our model fails again. So if two cars are
partially overlapping during the exposure, it's possible, but it's more challenging, because you need to know exactly how fast the two cars are moving.

Sorry, when do you need to know how fast? When you're setting up your shutter?

No, no. Okay, so when you're setting up your shutter: if the car is moving really slowly and you don't expect it to blur by 52 pixels, you expect it to blur by only 10 pixels, then using a 52-long sequence is overkill. Maybe you should use a new sequence that's only about 10 or 11 long.

Is that just so you can get more light?

No, that's so that it's optimal for that setting. It's like setting an exposure time. When I take a picture, the camera automatically decides what the exposure time should be. Similarly, it should look at how fast things are moving, maybe with ultrasound Doppler or whatever, and say: things are not moving at all, so I should not use the flutter shutter at all; or things are moving very slowly, so maybe I should use a 10-long sequence; or things are moving a lot, so maybe I should use the 52-long sequence. And to answer your other question, where you need to know it is when we solve the system: we need to know how long the blur is, which is true in other cases as well. You need to know how much the blur is.

Another major disadvantage: let's say I take this bottle and rotate it, and motion-blur that. It will not work. For any point in the front that you're looking at, it will work. But for a point that was in the back, out of the 52-long sequence maybe it was occluded for the first 10 and seen for the remaining 42, and you have to know exactly when that point became visible during the 52 window. So in general the technique works well when things are moving naturally, but if somebody wants to do this kind of experiment, or move things behind an occluder and back out, those are very challenging
scenarios.

Vertical or horizontal is fine. It doesn't matter; it could be moving vertically. Basically your point spread function, the blur function, will be vertical rather than horizontal.

But if you have combined [motion]?

As long as it's in any one direction, it's okay. Let me draw it. The basic assumption is that if you take any point in the scene, it's moving in a straight line. That's it. If you have an object and every point on that object is moving in a straight line, you're okay. It doesn't matter in which direction or at what speed.

So this doesn't help at all with image stabilization, with somebody holding the camera?

It helps there as well. Let's say you have camera shake, and I take a picture of an LED and it creates some curve like that. If I know that curve, maybe I can put in a gyro, then I can again figure that out.

So the point spread function, or blur function, is really critical here, and this is what we're going to study for about half of the class. The concept is very interesting, because light is linear, so imaging is linear: what happens to a point happens to the rest of the object. If I have a car that's moving and I tell you exactly how one point on that car is behaving in the image, I can tell you automatically how the rest of the car is behaving, because all of it is going to have the same spread in the image. So for experiments you can just put an LED on the car and see how that LED moves, and that tells you everything about it. I'm sure you've used this trick in other scenarios, where you look at a very small impulse and see how the system responds. It's like an impulse response. Those of you who work with audio might play a chirp and see how the room reverberates. And when you're trying to find the speed of a car with a radar, you send a chirp, a very small
impulse, and it bounces and comes back. That's the point spread function for your time of flight. It's the same concept here: we're just going to put an LED in the world, take a picture, and see how it looks. And this whole field of coded imaging is basically engineering of the point spread function. If you take an ordinary camera, a film camera, and take a picture, you have no control over how light spreads when something is moving, or is out of focus, or has a different color spectrum, and so on. Coded imaging basically means you want to control how something spreads on the image; we're going to actively engineer how that happens. In this particular case, a point that was moving created a blur that looks like this, and by engineering the point spread function in time, instead of looking like that it's going to look like that. It's going to look like dashes. And it just turns out that this one is easier to deal with than this one. That's the basic concept: engineering, or actively changing, the point spread function.

This is very counterintuitive, because you would say: let me just build the best lens and the best exposure time and so on, something that mimics the human eye, and once I have that I have the best possible picture. But when it comes to actually extracting information from the scene, it turns out you need to strategically modify how the camera works so that all the information is somehow preserved. Now, the problem is that even after you have been very careful, the image you capture is still going to be somewhat garbled; it's going to be mixed up. But that's where the co-design comes in. Once you have this image, there is some hope: there's some computational technique that will allow you to go from here to here. And this is what separates an animal eye from a computational eye, because in most scenarios an animal eye is just going to take the picture and try to make the best sense out of it, but a computational eye is going to
apply a lot of processing to this and be able to recover that. As far as I know, animals don't have deconvolution circuitry or deblurring circuitry. I can't look at a blurry image and figure it out; this was a challenge for you, right? We have pretty sophisticated eyes, but we're still not able to deblur what this is. If you have some prior knowledge of what the Volkswagen logo looks like, maybe you can say, okay, maybe that was this. On the other hand, if I give you this, you're immediately willing to believe that this photo is a blurred version of this photo. When you go from here to here, information is lost; when you go from here to here, we're trying to recover information. Going from a sharp photo to a blurred photo is easy for us, because we just have to imagine what it will look like if some of the information is removed from the image.

So the goal of coded imaging is to come up with clever mechanisms to capture light, not just by converting photons into electrons but by actually modulating those photons: blocking them, attenuating them, bending them, and so on. That's why a computational camera is doing the computation not just in silicon but also in optics.

So that was what we can do to preserve information in the case of motion blur. And the circuit is very simple: you just take the hot shoe of the flash, which triggers when you release the shutter; it triggers the circuit, and then you just cycle through the code that you care about.

That was for motion blur. What can you do for defocus blur? We again want to engineer the point spread function. Spatial coding. How do you apply spatial coding? Coded aperture. So this was coded exposure, and this is coded aperture. Very easy: all you're going to do is put some kind of code in the aperture of the lens. And this is
how it actually started. In scientific imaging, especially in astronomy, coded apertures are very well known, and for those of you who attended Professor Horn's lecture on Wednesday, that's what he talked about. I've been following this for a long, long time, and I thought it must be useful for something in photography. So I said, okay, let's try to put a coded aperture in the camera and see if we can deal with focus and so on. That was back in 2004. We tried it for about six months and it just didn't work. It was really, really frustrating. Then one fine day I said, if you can do this in space, I'm sure we can do this in time as well. So we did this, and it worked right away, within a couple of weeks. So we went ahead and built this whole system, and that was the SIGGRAPH paper. Then we said, okay, let's come back and think about this: what's going on, why don't we get good results? It took almost two years to realize that there are only a few places in a camera where you can put this coded aperture to get good results.

Out of that came this particular experiment. I have a colleague, Jim Coppler, at MGH, and this is his lens, by the way. He was telling me the story that he was fishing with his camera, and some creature came out of the water, some kind of alligator, and he lost his balance and the boat flipped upside down. Somehow he managed to flip back in and the alligator went away, but the camera that was with him was completely damaged and just wouldn't work. So he took out his lens, which is a standard Canon lens, and he said, let's open it all the way. So he ripped open all the damaged parts; it had all the mud in it and so on. And then he just showed me this thing as is. It was very fascinating, because this is a standard
film lens, which of course can also be used with a digital camera. It's a fixed focal length lens, a 100 millimeter lens, and when you focus with it, it works in very interesting ways. First of all, it doesn't have a single lens element; it has multiple lens elements. So when you change the focus it has to do some really interesting things. It has to deal with chromatic aberration, geometric aberrations such as radial distortion, and so on. So it has to move all these lenses with corresponding ratios. I'll pass this around, and you'll see that there are these notches on the lens that follow a parabolic curve. When I rotate this, the outermost lens and the innermost lens remain in the same place, but all the inner lenses move with some particular ratio. It's amazing the way it's structured. Multiple lenses are moving every time I turn this, and they're moving because they're guided by these grooves. But there's one particular location in this lens that does not change, and that's the aperture.

So we said, let's look at this aperture. Back then it was still a reasonable-looking lens, so we went into our lab and cut it open all the way, and you can start putting new apertures in this plane. Now, it turns out the center of projection of this lens is very carefully designed by camera makers to be in the same plane where you put your aperture. So when you change your f-stop, decrease it or increase it, it's all happening at the center of projection.

Does everybody know center of projection? When you think about a camera, you make this very simplistic assumption that there's a pinhole and there's a sensor. And when you put in a lens, we assume that the center of the lens is the center of projection, which means all rays can be assumed to go through
that point. When you have a bunch of lenses, like we have here, where is the center of projection? Is it here, or here, or here, or here? You can take a collection of these lenses and define one single center for normal cameras. For fisheye lenses that's not true, but for normal cameras you have a center of projection where you can conceptually assume that all the rays go through that point. Finding that plane is actually a tricky problem. In retrospect it's very easy: if the lens makers are putting everything there, we should put our coded aperture in the same plane too. Initially we said, let's put it in the front, let's put it in the back. We tried all those things, but that creates blur that's not constant over the image, and it has a lot of issues. By placing it over there, it turns out you get the same blur everywhere.

So what exactly happens? If you take a picture of a point light and everything is in sharp focus, nothing changes. If you have just an open aperture and take an out-of-focus picture of a point light, it looks like a disc. Now what's going to happen when you put in this code, like the 7x7 mask, and take an out-of-focus picture? What will happen to the LED? It's going to look like the code. And why is that happening? Let's think about a very simple case. We have our lens, we have a point light, and we have put some code here. When it's in sharp focus, it doesn't really matter what the code is; basically you're blocking about half the light, so the photo will be half as bright, but other than that it looks like the original. That's why, if you have some dust on your lens, usually it doesn't matter, unless the dust is all the way out on your front lens element. The center of projection is over here, so if the dust were over here nothing would happen; the image would be slightly different. But if the dust is all the way in the front, then you start seeing it. Anyway,
so when it's in sharp focus you just see the point. But let's say now you're out of focus, like this here. What will you see? You'll see the same mask. This ray comes in and goes through, this ray comes in and it's blocked, this ray goes through, and so on. So basically you see the same pattern. If you put the sensor all the way here, you see the whole code; as you start moving away, the code shrinks, and eventually when you put it here you get [a point]. That's exactly what's happening here: when it's out of focus, you just see the code.

The image through this code will still be blurred, right, because those are basically multiple apertures that you're seeing?

Yes. The photo that you see here is still blurred; it's just blurred in a slightly different way. Here it's blurred with that shape; every point is blurred with that spread function, and you cannot see anything on the resolution chart. If you look at this picture: this is the sharp photo, this one is blurred with a disc, and this one is blurred with that function. You can already see that it seems to preserve slightly more information, but with your naked eye you still won't be able to figure out what the underlying patterns are. It turns out that after deblurring, you can.

So then you can do these simple tricks where the person you are interested in is out of focus, but then you can refocus digitally. This is the input photo and this is the output. It's the same exact trick: in the case of motion we created a point spread function that was engineered in one dimension, and here we are engineering a point spread function in two dimensions. There we know that the Fourier transform of this 1D 52-length vector is broadband; it has energy at
all the frequencies. What can we say about the Fourier transform of this one? First of all, it's seven by seven, so its Fourier transform is also seven by seven, just as when it's 52 long its Fourier transform is 52 long.

It's more distributed? It's still mostly near the center?

In 1D, this is what we saw: the Fourier transform is flat. There are 52 entries here and almost all of them are the same. Now think about the problem in 2D: what's the Fourier transform of this? For this one, the Fourier transform has the DC and then it's flat; now take that to 2D. I'll give you a hint. If I just take a square aperture, like a traditional one, and take its Fourier transform, it will look something like this, the one we saw earlier. If I take a cross section here it's going to look the same, and the same thing here, for a square aperture. And now, for this crossword-puzzle-shaped pattern, it should be easy: it's going to look just like this. The 2D Fourier transform of the 7x7 code will have a peak in the middle, but the rest of the values will be constant. That's the magic of a broadband code: by placing a broadband code, suddenly we have an opportunity to recover all the information.

This seems very long-winded, right? If all I wanted to do was create a photo that I can deblur to get a sharp photo, why do I need to think about all this theory? The reason is that when you think about the point spread function, it's just traditional signal processing, a convolution and so on, and it's much easier to think about convolution and deconvolution in the frequency domain than in the primal domain. In communication theory everything is in frequency. We think about the carrier frequencies of radio stations; we say my FM channel is at 99 megahertz, 100 megahertz, and so on. And we think
about guard bands and and audio bands and everything in the frequency domain and that's because it's signal processing it's the same thing that's going on here and convolution deconvolution much easier to think in frequency domain although all the analysis is in the frequency domain at the end the solution is very easy just flutter the shutter or just put a code at the aperture you know extremely simple solution to to achieve that okay so those are all good things about uh coded aperture uh what are some bad things about coded aperture what are some disadvantages here it's very similar to the flutter shutter so half the light very good and when that's when when you talk to people who build cameras and you tell them no no no that's not allowed losing half the light yes the bokeh is at least depends on your i mean for for average consumer i don't know whether this matters but you're right if you're looking at something that's you know we have bright lights in the scene at a distance you know take an out of focus photo they will all look like this parts in it or like right yeah i was thinking so so an interesting art problem is how do you create how do you create a mask that visually looks aesthetic but is mathematically also invertible other disadvantages or challenges not related remember in the motion case we have to know how much the motion is what do we need to know here how much the blur is and what is that function of when it's in plane of focus it's sharp when it's out of plane of focus it's blurred but the size of the blur is dependent on what the depth but not just depth depth from the plane of focus right so that's an extra parameter you have to estimate somehow maybe you can use a rangefinder or something like that or just some software methods you can employ don't you just try to assume something try to be focusing to this contrast yeah yeah you could do that it doesn't work that well but but you're right that would be another way to another way to try this 
you can like just maximize your like hard edges in the image exactly that's what you would do like in a light field when we did the refocusing that's the trick we used right we said okay let me try to refocus i don't care about the depth when it comes into sharp focus almost my edges that must be the right depth unfortunately it doesn't work out in this case and we won't go into the detail but the main reason is that because it's coded aperture no matter where you refocus it still looks like it has very high frequencies so that makes it challenging yes oh exciting so you need to find this 7x7 pattern or in the previous case the 52 pattern and you you know take a random sequence take its fourier transform to see if it's flat if it's not flat you go to the next so the initial that's what i did i said wow you know it can't be that bad you know 2 to the 52 i mean it's 52 element long and i know some of them i only want to take the ones in which about half of them are ones and half of them are zero so can't be that bad so you know i wrote a matlab script and i said you know by the time i come tomorrow morning i'll find a really good code and i came back next morning nothing had happened i waited whole days still running and you know it never came out of that so 2 to the 52 is a is you know it's pretty challenging yes yeah so so sorry where's your hadoop cluster we need it exactly but even if you use a cluster it's still pretty big number so you can you can do some approximations so you can start with some code and do kind of a gradient descent and so on yeah does the harder work or anything is that here so actually i after we did these two projects uh i attended professor han's lecture on computational imaging which i highly recommend by the way it's it's terrific and uh there are all these theories about how to create different codes for different applications so hadamard code which we we learned about a few weeks ago or so-called broadband codes you know they all have 
polynomial solutions and and this and that there's no good solutions for 2d but for for 1d there are some really good solutions uh to to come up with them and even for 2d for certain dimensions uh they call it one mod four or three mod four because prime numbers can be one mod four or basically when you divide by four the the remainder can be one or three and the certain sequences they have beautiful mathematical properties of which sequences could have broadband properties and which may not uh so it turns out you cannot there's a little bit of cheating going on here so you cannot really use the broadband code here either to give you the best result you can call them broadband because their behavior is broadband but the traditional codes called mura m-u-r-a modified uniformly redundant array they were invented not very long ago maybe 20-30 years ago and they're used in in cdma and and many other astronomical imaging applications and they have similar properties of being if you take its frequency transform it's it's broadband the problem is uh in many of those uh example many of those applications uh your convolution is actually circular so you apply the filter and then when you go off the edge you apply the filter to the beginning of the signal okay this particular filter is actually not circular but it's linear so when you apply the filter here when you start applying the filter at the end of the image you don't go back to the front of the image right because clearly if i put an led here and you get out of focus if i put an led here you'll only get half of that the rest of the half is just blocked it's not going to magically appear over here so that's the difference between linear convolution and circular convolution it turns out for circular convolution the math is very clean and beautiful and you know this mura codes work but for linear convolution you know there's no good mechanism so we came up with our own code called rat code rat which is after 
three co-authors if you included enough padding there wouldn't you be able to use circular convolution yeah i mean circular convolution i mean linear convolution is basically circular convolution with a lot of padding of zeros yeah because you said that the math would be easier right but then it's too large i mean finding a code that's that's you know seven long or maybe 30 long is okay finding a code that's thousand long is nearly impossible so the difference between mura and rat is only on the edges or is it all over the picture it's only the fact that one is linear convolution and one is circular convolution okay yeah i think it's pretty amazing that this could be that because i mean if you start just having very simple patterns yeah all over the place right so yeah so it's it seems like you can just use a random sequence and get a similar property but actually it doesn't work the chances of a random sequence doing the right thing for you is very very low um so in astronomy you have circular convolution because they use either two mura tiles and one sensor or one mura tile and two sensors so they have circular convolution so all right if you tile it up also it will be like that if you repeat that if you tile the mask at the aperture if you tile it at the aperture you'll get really horrible frequency response unfortunately because if you put two tiles that means certain frequencies are lost basically by taking the dc coefficient you're reconstructing almost everything no no no not dc coefficient because if you look here all the high spatial i mean the whole image is not one value yeah but look at that that's the you know the spectrum of your aperture right now no but there is a non-zero value at other frequencies yeah yeah i feel but i mean you know no no that's very important yeah but but thinking that you could get a very good approximation yeah but if you know to a naive uh naive consumer this photo and so look at this part okay this photo and this photo looks almost 
the same right and remember in this photo many of those frequencies are lost right uh and in this photo those frequencies are not lost because all the frequencies are preserved but that's because our eye is not very good at thinking about what the original image could be given either this one or the previous one right so given this i can challenge you you know that you'll not be able to predict that it has all this structure right from here you cannot predict that you have all the structure so how would you describe i mean the mask as a basically you spread the energy you know sort of uh over many frequencies but you know very small coefficients that exactly it's about depending on the code it's about one tenth or one twentieth of the original power of that frequency so you get significant attenuation so you know the results are not perfect right if you if you look here right it's not it's not perfect results whether it's here or or here right look at this you know it's i wouldn't call it photographic quality yeah yeah but if you apply very simple by the way these are raw results there's no median filtering or smoothing or anything it's just pure ax equals b x equals a backslash b but just the fact that you know the mask yeah it's it's fun right it's it's it's what's amazing about coded imaging is that the math is elegant and beautiful and sometimes complicated but the implementation is very easy at the end all i have to do is put this code or shutter it and very easy to explain my previous boss used to say the best ideas are the ones that are easy to explain but difficult to conceive all right so let's move on uh okay let me finish this one so this is just one way of we only saw two ways of engineering the point spread function one in motion and one in focus right but there are many others we have some we saw some of them over the course of the semester where uh you know you can put for example a special filter uh in the lens so that you get blur that's independent 
of depth yes go ahead you had this binary mask right what if the mask was not binary if you have some attenuation by the water so that you could sort of approximate something or stuff right so what would we have so that's a very good question so let's see let's get this up first okay so if that function was okay uh if the function was continuous right so in case of flutter shutter we didn't have much of a choice it's either opaque or transparent so it's one or zero but in case of uh aperture yes you know it doesn't have to be opaque or transparent it could be a continuous value and um initially actually i and my co-author amit agrawal very smart guy we always had these arguments about you know maybe continuous is better maybe binary is better and he continued to believe that continuous is better uh but it turns out and we still don't agree with this by the way and you know nobody has written this down it turns out that for any continuous code there is a corresponding binary code that will do an equally good job so far and that's because uh in a binary code you get to play with the phase function i won't go to the detail but because here we are only showing you the amplitude of the fourier transform but not the phase so you get that extra degree of freedom to play with so if you play with the right phase then it turns out you can always have a binary function mike that's a great idea people talk about it but nobody has done it just one of those things it's just one of those things and you know it's like we are sick of it so we don't want to do it but but i think it's worth worth trying and and because those are orthogonal motion blur so here's a great uh thought experiment right so mike's question was there could be something that's moving so it's motion blur but it's also out of focus okay is can you use both at the same time and recover uh yeah it's one fourth of the light but let's not worry about that okay they're orthogonal technologies yes exactly so it's amazing 
because motion is time and the focus is space they're completely orthogonal so you know you can you can play with it it's it's it's it's very interesting but still motion is being represented by space on that yeah eventually you have a 2d projection yeah so that's very interesting all right so um the point spread function uh you know although i and my team were the first one to do that in kind of a graphics vision domain people have been trying to do that since mid 90s in imaging and there was a very classic paper by uh cathey and dowski and others for so-called wavefront coding and a lot of it is actually being used in cell phone cameras and what they do is they put this phase mask between the object i mean on near the lens so that and we saw this in the beginning of the class so that the image does not come into sharp focus ever instead of that it it feels it's like a set of straws imagine these are all straws that are coming in and you just twist them okay so the top one kind of goes at the top i thought you're at the bottom the bottom one goes at the top and when you when you think about the the cross-section of all the straws it's kind of cylindrical when they all come together okay so i'm going to take all these straws or maybe strings if you want to think about it and i'm going to twist them so that they remain cylindrical so if i put my sensor here if the image is out of focus by this width if you put the sensor here it's still out of focus but by the same width so no matter where you are the image is out of focus but by the same amount okay and you say what's what's good about that you know it's always out of focus but turns out the wavefront coding as they call it but you can think of this now we know what light field so you know this is just a unique light field of the scene it turns out that from that you can recover images like this so this is open aperture what happened here i'm sorry i don't have a picture but we discussed it in the class so i hope you 
remember that i missed that picture we saw this right in the very first class by the way um and the benefit of that it turns out is that it preserves the spatial frequencies and it has the benefit that no matter which depth you are at you have the same defocus blur so the disadvantage of coded aperture was that you need to know what the depth was to be able to deblur but now because it's independent of depth you can just apply the same deconvolution and get back a sharp image so whether you know if i hold my cell phone camera whether i'm here or here or at infinity i get the same amount of blur same point spread function and from that we can deconvolve and get an extended depth of field that goes from very close to the lens to infinity so omnivision which bought this company cdm optics which is named after cathey dowski and somebody uh those are the two professors at colorado and last one i forget that was just bought by omnivision which is a big cell phone i mean big imaging company most of the business is in cell phones and they acquired the company and immediately laid off all the smart people who invented this very sad uh because you know that part is done so they want they want they just wanted the technology uh and it's in a lot of cameras there's another company called tessera which has a very similar solution um but what they do is so this one basically what it does and we discussed this i think in the beginning uh the wavefront coding is they are simply placing uh an addition here so that this part of the lens will focus an image here this part of the lens will focus eventually this one focuses here so the top of the lens has a short focal length focus is here the second one focuses here third one focuses here hold some focus is here here so if we can imagine the main lens has certain focal length and we're just going to add a little bit of additional focal length to each piece and that's why you have focal length f1 f2 fn and then they're going to focus 
and this is the the twist that also because this will continue here but within this region the thickness will be about the same so you can either think of it as adding small lenses on top of the main lens right or the way they do it is they actually put one single sheet that looks like that okay with an additional layer of so-called phase mask and a phase mask basically means uh you are changing the phase of individual light and as you know if you have a piece of glass and light flowing through it's going to slow down here and then again speed up in free space that means you basically slow down the light a little bit and that's what a glass lens does if you have at the top of the lens if light goes through it doesn't slow down that much if you go through the middle of it it slows down for a bit right that's why as you as we learned right at the beginning if you have something very far away this slows down a little bit so it goes over here this goes over here and everything just works out with the traditional things but by adding this extra piece of glass you're saying i'm going to screw it up and slow down slightly so this is the cdm optics solution what the tessera guys did which is that you bought is another company i'm forgetting the name the solution is very similar i'm sure they are fighting it out in the court right now same solution instead of putting this particular gap that is going to add some extra glass but mostly in a binary form let's just come for discretization so basically the same solution but creating different focal length for different parts yeah although you said i mean there's this this portion there where i mean yeah right right but uh what is being blurred that each pixel bears independent of the depth you get the same blur yeah but see i mean some guys are focusing say you know at an angle it doesn't really matter it doesn't really matter because just like in a traditional camera even if the point is not on axis but off axis you still get the 
same you still get a disc right which we saw in the yeah you ge

Original Description

MIT MAS.531 Computational Camera and Photography, Fall 2009 Instructor: Ramesh Raskar View the complete course: https://ocw.mit.edu/courses/mas-531-computational-camera-and-photography-fall-2009/ YouTube Playlist: https://www.youtube.com/playlist?list=PLUl4u3cNGP61pwA6paIRZ30q1sjLE8b6c Lecture 11: Coded imaging License: Creative Commons BY-NC-SA More information at https://ocw.mit.edu/terms More courses at https://ocw.mit.edu Support OCW at http://ow.ly/a1If50zVRlQ We encourage constructive comments and discussion on OCW’s YouTube and other social media channels. Personal attacks, hate speech, trolling, and inappropriate comments are not allowed and may be removed. More details at https://ocw.mit.edu/comments.


This video lecture teaches coded imaging techniques, including Fourier transform, coded aperture, and wavefront coding, and their applications in photography and computer vision. The lecture covers the basics of coded imaging, including convolution, deconvolution, and point spread function, and discusses the challenges and limitations of coded imaging systems.
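As a rough illustration of the convolution and deconvolution steps named in this summary, here is a minimal 1D sketch. It assumes a noise-free signal, and the binary point spread function below is a made-up example, not a code from the lecture. The blur is written as a linear (not circular) convolution matrix, and the sharp signal is recovered with a least-squares solve, which plays the role of the `x = A \ b` step mentioned in the transcript.

```python
import numpy as np

def convolution_matrix(psf, n):
    """Build A so that A @ x equals np.convolve(x, psf) (linear convolution)."""
    k = len(psf)
    A = np.zeros((n + k - 1, n))
    for j in range(n):
        # Each column is the PSF shifted down by one sample; nothing wraps
        # around the edges, which is what makes this linear, not circular.
        A[j:j + k, j] = psf
    return A

rng = np.random.default_rng(0)
x_true = rng.random(64)                      # hypothetical sharp 1D signal

# Hypothetical binary code used as the point spread function (an assumption).
psf = np.array([1, 0, 1, 1, 0, 0, 1], dtype=float)
psf /= psf.sum()                             # normalize so brightness is preserved

A = convolution_matrix(psf, len(x_true))
b = A @ x_true                               # blurred observation

# Least-squares deconvolution, the NumPy counterpart of MATLAB's A \ b.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

print("max reconstruction error:", np.abs(x_hat - x_true).max())
```

With real photos the observation contains noise, so the quality of the recovery depends on how well the code preserves each frequency.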

Key Takeaways

Take the Fourier transform of an image
Multiply the Fourier transform of an image by the Fourier transform of a box function
Create a binary sequence of shutter opening and closing
Use a ferroelectric LCD to implement the binary sequence
Truncate sinc function to reduce noise and artifacts
Adjust exposure time and filter width to improve image quality
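The steps above can be sketched numerically as a comparison between an ordinary open shutter (a box function) and a binary open/close sequence. This is only an illustration under stated assumptions: the 52-chop length comes from the lecture, but the half-open constraint, the 512-point zero padding, and the small random search are choices made here, and the code it finds is not the one used in the lecture.

```python
import numpy as np

N_CHOPS = 52   # code length mentioned in the lecture
N_FFT = 512    # zero-padded transform length (an assumption for this sketch)

def min_spectrum(code):
    """Smallest magnitude of the zero-padded Fourier transform of a shutter code."""
    return np.abs(np.fft.fft(code, n=N_FFT)).min()

# Traditional exposure: shutter open the whole time, i.e. a box function.
box = np.ones(N_CHOPS)
box_min = min_spectrum(box)

# Small random search over binary codes with half the chops open.
# The full space has on the order of 2**52 codes, so this only samples it.
rng = np.random.default_rng(0)
template = np.array([1.0] * (N_CHOPS // 2) + [0.0] * (N_CHOPS // 2))
best_code, best_score = None, -1.0
for _ in range(2000):
    candidate = rng.permutation(template)
    score = min_spectrum(candidate)
    if score > best_score:
        best_code, best_score = candidate, score

print("box   min |FFT|:", box_min)
print("coded min |FFT|:", best_score)
print("code:", "".join(str(int(c)) for c in best_code))
```

The box spectrum is a sampled sinc-like curve that drops to zero at some frequencies, so those frequencies cannot be recovered by deconvolution. A code whose smallest spectral magnitude stays away from zero keeps every frequency, at the cost of collecting about half the light.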

💡 Coded imaging can be used to capture images with sharp detail recovery in software, and has applications in photography and computer vision, including motion blur, defocus blur, and extended depth of field.
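For defocus blur, the transcript notes that the blur size depends on the depth measured from the plane of focus. A minimal sketch of that dependence, assuming an ideal thin lens (a model chosen here, not stated in the lecture) and hypothetical lens numbers:

```python
def blur_diameter(depth, focus_depth, focal_length, aperture_diameter):
    """Blur circle diameter on the sensor for a point at `depth` (thin-lens model).

    All lengths are in meters. The sensor is assumed to sit where a point at
    `focus_depth` comes into sharp focus.
    """
    # Thin-lens equation: 1/f = 1/depth + 1/image_distance
    sensor_distance = 1.0 / (1.0 / focal_length - 1.0 / focus_depth)
    image_distance = 1.0 / (1.0 / focal_length - 1.0 / depth)
    # Similar triangles between the aperture and the converging cone of light.
    return aperture_diameter * abs(image_distance - sensor_distance) / image_distance

# Hypothetical 50 mm lens at f/2 (25 mm aperture), focused at 2 m.
f, aperture, focus = 0.050, 0.025, 2.0
for d in (1.0, 1.5, 2.0, 3.0, 4.0):
    c = blur_diameter(d, focus, f, aperture)
    print(f"depth {d:.1f} m -> blur diameter {c * 1e3:.3f} mm")
```

In this model the blur is zero at the plane of focus and grows as the point moves away from it in either direction, which is why a coded aperture needs a depth estimate before deblurring while a depth-independent blur does not.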
