Chip Placement with Deep Reinforcement Learning | Paper Explained

Aleksa Gordić - The AI Epiphany · Beginner ·📄 Research Papers Explained ·5y ago

Key Takeaways

The video explores the use of Deep Reinforcement Learning for chip placement, as presented in the paper 'Chip Placement with Deep Reinforcement Learning'. It delves into the novel RL approach, combining electronics and machine learning for efficient and highly performant chip placement routines. The video also discusses the application of graph neural networks, PPO algorithm, and supervised learning to predict rewards and optimize chip placement.

Full Transcript

what's up in this video i'm covering this paper called chip placement with deep reinforcement learning uh basically yesterday i saw this tweet from anna goldie where they've just published the nature version of this work and i thought covering it because it's a really interesting piece of work it combines both like the electronics which is something i have some background in as well as machine learning more concretely like reinforcement learning and graph ml so it's exciting combination and super super impactful work so i thought covering it and basically what they achieved is they managed to train an rl agent uh combined with some graphml for learning like nice representations uh in order to do displacement much faster than uh what previously was necessary so like they needed like humans usually need like multiple weeks to do this and now it's like less than six hours and it's achieving even super human performance looking just at the metrics like uh the area so basically yeah we want to minimize the area because we can cram more more transistors inside of it and more compute does and also they want to minimize uh other things like uh power consumption as well and under certain constraints like density congestion like you don't want to have too much to too many wires overlapping on a single like on a certain piece of like on a certain surface of the chip so that that's a high level overview now let me let's dig into the paper and yeah see what's what they what they did so in this work we present a learning based a learning based approach to chip placement one of the most complex and time consuming stages of the chip design process unlike prior methods our approach has the ability to learn from past experience and improve over time so yeah that's that's the main thing basically uh obviously uh like prior to this work there were many other approaches uh mainly some optimization based approaches like simulated annealing and some more like heuristics and yeah a bunch of different theory was developed but none of those were actually deep learning algorithms and they say here our objective is to minimize the ppa so as i mentioned so the power consumption the performance whereas they mean basically timing and we'll see what that exactly is like you basically want to make sure that certain uh like uh like logic gates in your like flip-flops in your in your uh like circuit uh get certain signals on time so there's these like timing uh constraints you have to obey and we'll see more details a bit later and obviously the area again like more transistors you can cram into the chip the more compute you get and that's better so and we showed that in under six hours our method can generate placements that are super human or comparable on modern accelerator netlists where netlist is just a fancy way of saying like a graph that basically tells you how each of the components on the chip are are connected uh together so whereas existing baselines require human experts in the loop and take several weeks so that's the the main like the the main like uh point they're driving here they they automate this they make it even superhuman and they make it faster which is huge and so why do we care about this like basically well they say it here the world is moving towards specialized hardware to meet ai is exponentially growing demand for compute however today today's ship take years to design leaving us with a speculative task of optimizing them for the machine learning models of two to five years from now and the logic is pretty simple if you can cut down the time you need to to loop over different designs you can experiment much faster and it's the same thing as in deep learning where you wanna make sure that your training uh length is like uh yeah shorter obviously like the same thing that happened with imagenet training for example they're there we needed like multiple weeks and then finally there was a paper like a couple of years ago where they've trained like imagenet like a vision model classifier in like half an hour or something it's probably even better today so basically when you short those you can iterate much faster obviously and that's that's really really precious so one one nice example of these uh custom chips is maybe tesla that's the first thing that popped onto my mind they have a customly designed computer version uh chip set on their vehicles and it's not just computer version they're they're also using like different sensors not just the rgb cameras they're also using raiders et cetera so the only thing they're not using and that's lighter because yellow musk as you probably know rants about those all the time and says that we only need to use rgb cameras but that's a topic for another for another video um okay so the objective is to place a netlist a graph of macros for example static ram which is a basically you probably know of dram that's dynamic random access memory these are uh srams are usually non-volatile so they are even more expensive and they're uh they're more performant than drams but they do take more areas so that's a trade-off standard cells logic gates such as nand snores and x-horse you probably heard the facts or if you did anything related to cryptography onto a chip canvas such that power performance in area are optimized while turning to constraints on placement density and routing congestion um yeah for the time being like you can just have an intuitive understanding of these and i guess you already have like basically you don't want to have multiple wires too many wires going over a certain cross section of the certain grid cell on your like canvas okay uh that's the high level overview now let's dig into the architecture let me show you the high level like pipeline how this looks like so basically what i do is the following so they they treat this they pose this as a reinforcement learning problem and as you can see here the status model like you have this chip canvas so currently you don't have anything on it and basically as is already mentioned you can treat your chip as a hyper graph and that means the following so let's assume i'll just draw macros as these squares so we'll have a couple of macros and macros can be as you saw like sram or it can be memory control or whatever so these are somehow connected like this and then you have these standard cells i'll just denote them as circles and basically what was so it's a hyper graph so that means basically that a single edge can connect to multiple nodes that's the definition of the hypergraph and uh now so imagine you have a bunch of these you have hundreds of macros you have millions of standard cells and you want to somehow place them onto this fixed size cheap canvas okay and so what you do is you somehow and we'll see the details a bit later using this graph neural network you kind of embed uh this information about the connectivity and uh that will be your state and then you're doing the following so the agent takes that state and outputs an action and the next one is just take a macro so we'll be just focusing on placing these macros as we'll soon see these will be handled by some forced direct method which is some heuristic that was previously devised so yeah they're just kind of piggybacking on certain other methods to do that part so they place the the macro here and after that you you get a new state and then you just up with a new action where you place another macro onto the chip canvas and you do so until you you've placed all of the macros as you can see here and then from that point on you use this force directed method to place the standard cells so you will just take those cluster clusters of of standard cells and we'll place them somehow onto your cheap canvas and finally you get the reward and as you can see immediately the problem is pretty hard because it's you get zero rewards all the way down and only at the end do you receive the actual reward so it means the credit assignment problem is going to be pretty tough i mean the horizon is not that big they won't have more than let's say a thousand macros but like still it's a tough problem and this reward as you can see let me just decrypt this for you so this is a half perimeter wire length we'll see what that exactly is but it's a roughly it kind of takes into account the the length of wires you have in your in your chip and that kind of correlates positively with power consumption and with area and we'll see those a bit later okay so that's up that's a high level overview um and i mentioned it here uh the so our our true reward is the output of a commercial uh electronic design automation tool including wire length routing congestion density power timing and area however rl policies require hundreds of thousands of examples to learn effectively as you already know and so it is critical that the reward function be fast evaluate ideally running in a few milliseconds so in order to be effective these approximate reward functions must also be positively correlated with the true reward so the point the whole point is they they can evaluate the real reward the true reward but that's super slow so you have to use these eda tools and so they decided to just take something as a proxy which is fast to compute a and b positively correlates with the things they care about and we mentioned those already so that's the ppa stuff okay so while we treat congestion as a soft constraint so lower congestion improves the reward function we treat density as a heart constraint masking out actions grid cells to place nodes onto whose density exceeds the target density so in order to understand this part we should go and see how the policy network looks like let me scroll down here i'll skip some stuff and then i'll let her return okay so this is how the the model looks like end-to-end uh basically we have two portions here the first part is the the representational learning part so we have something called edge graph neural network some graph neural network device specifically for this work and then we have your basic value net and uh basically policy network so this is your ecto critic uh rl agent they are using ppo that's proximal policy optimization algorithm um published by uh open ai a couple of years ago so that part is pretty standard now let's see this part is pretty interesting and novel so let's see how they manage to parse the netlist so the hypergraph of your components into something into some representation that contains valuable information that can be used to yeah figure out the where to place the next components so the way they train this whole pipeline is using supervised learning and the whole goal is to learn how to predict the reward so that's how they train this part so they they take this part in insulation and they train it the following way so again you have this hypergraph so you have macros you have some like standard cells they are somehow connected it's not that important okay like this and what to do as you can see here so if you're familiar with graph ml uh if you're not i i made a whole playlist and i'll link it somewhere here you should check it out but like i'll try to explain it on a high level and basically it's pretty simple so every single node in your graph has a feature vector associated with it so whoops let me delete that okay so you have a feature vector like this you have a feature vector here every single node will have a feature vector associated with it the second thing is in this paper actually even the edges have feature vectors associated with them so you have something like this so every edge and every node has a feature vector so what i do is as you can see here so the the graph neural network takes in the connectivity information and all of those feature vectors and it kind of pro it somehow processes these so what graph we'll see we'll see the exact equations here a bit later but like basically you just somehow combine those you manipulate them using fully connected neural networks you aggregate the neighbor the neighbors so for example this one will somehow take uh all of its neighbors so this is the neighbor and this is the neighbor so we will take uh those feature vectors like this one and this one and it would somehow combine them with this one so it's a it's it's using this relational inductive bias of connectivity and it's accumulating those feature vectors to create a new novel or presentation so this is going to be something maybe of a smaller dimension let's call it v prime if this was vector v this is v prime and they are going to output that as the output of of of g and n so as you can see here edge edge embeddings so that's a set that's that's a matrix that's of dimension e times basically d sub e so e is the number of edges in your graph and d sub b is just the uh like feature vector dimension so once you have that matrix what i do as you can see here they do this reduce mean operation so they just take a mean over like element-wise mean over all of these feature vectors and they get a single future vector that's the graph embedding okay that somehow roughly captures the dynamics the the information about this graph and the second thing they do is they take the current macro id so that's the current macro you want to place onto the chip canvas and they so again here you have a table that's n times uh d sub v so n is the number of nodes d sub v is the dimension of your node future vector and they just take a single one so they just take a dv vector here okay and extract it and finally they have some metadata information like the like the length of the wire on your chip etc so they have some metadata information and they concatenate all those they feed it through the fully connected layer and that's it so now as i already told you they're going to train this thing to to kind of uh understand uh to predict the reward of of the of the placement and i'll get into details a bit later but that's that's that's that's it in a nutshell so once you train that you use the representations it learned to uh basically uh as a kickstarter for these policy and value networks and they are trained as i said using ppo so that's it and now i'm gonna slowly start digging into the details and later we'll see the experiments and the results they got okay uh before i do that i think it will be quite useful for you to maybe understand a little bit about some electronics 101 and you'll hopefully have an epiphany moment okay so as software the same thing goes with hardware basically you have a bunch of abstraction layers so in software we have like on the lower lowest layer you have firmware you have something like what called hardware abstraction layer you have drivers and you have operating systems then you have compilers and you have the all of those systems that translate the higher level languages into like something that your computer can parse and understand and finally you have the application layer so obviously a bunch of abstractions and the same thing goes for for hardware so you you have like you have one on a really high level you have like a you have a cpu and you have maybe like the direct memory controller so that's just a device that can to whom uh cpu can just uh delegate uh basically certain memory uh intensive operations and then you have memory like you have a memory block here and let's say both are connected to memory both the cpu and dm are connected to memory so if you zoom in into any of these components like take maybe a direct memory controller you'll end up with two types of circuits so basically you have combination logic and you have these uh circuits which are called sequential logic which contain the memory uh in the system so they are stateful so if you take a look at this one so what's the main difference the main difference is the following you if you have this circuit this is called inverter basically if you feed it to zero you will output one and if you tweeted one it will output the logic value of zero and the thing is if uh the thing is with these uh combination logic gates is that as soon as you change the input so if you go from zero to one the output will go from one to zero really quick like maybe in 20 picoseconds or something now the trick with these flip flops with these memory circuits is so this is a d flip flop it's a simple one of the simplest flip-flops out there so basically what happens here you can change this as much as you want like from zero to one nothing will change here it's gonna keep one here uh until you have uh the clock so the optic of the clock or the dawn take of the clock and basically so hopefully you know what like a clock signal is it's it's something that drives your computer it's a basically digital signal that goes from zero to one with a certain frequency and on each of these optics or downticks something is going to happen so on each stick the sequential logic gates so these these guys like flip-flops will change their state so if you keep one here and then the optic comes at that point the one will propagate here on the net and if you change it now to zero and the downtick comes for example then this will go to zero so these are keeping there they have the memory that's that's that's that's the main trick so having said that let me go uh even even deeper so let's take one specific gate like an end and what nan does is the following yeah you've seen these if you've ever done any programming so you know about the end operator uh this is how you write it down in python and basically this is the the opposite of an of an and gate so you just not end and basically as you can see here looking at the table uh you know that uh and gates give you one only when you have one and one as the input and all of the other values will be zero and nand is just the opposite of that as you can see it's inverted uh and gate and now uh this is the like uh abstracted view of this gate and if you dig a bit deeper and take a certain implementation which is called uh cmos implementation of the nand gate which is basically complementary metal oxide semiconductor doesn't matter but basically there are different ways you can implement these abstract logic gates the same thing as you can do with if you have a like image classifier you can implement it using a neural network using svm using i don't know like even like k nearest neighbors whatever your favorite classifier is the same thing goes here you have a bunch of different ways how you can implement the very same thing and that is the nand gate so here we have a specific logic you can use ttl there are many different logics you can uh implement now let me show you why this works so basically let's take as an input one and one and when i say one you again have to implement one somehow on your system so let's say for example um that we have a voltage supply that's maybe i don't know like 0.9 volts and so on this system uh one will be defined as something close to 0.9 volts and zero would be defined as something close to zero volts so that's how you implement zero and that's how important one and so if i feed once here for example and so that's 0.9 volts and if you feed one here as well and as you can see this small circle again that means it's inverted so we'll be adding zero onto this gate and zero into this gate and what happens is because you brought a huge voltage here you're gonna toggle on this transistor and so the current will start to flow here and here and you're gonna block these two because you brought zero voltage here and so as you can see you're basically making a direct connection to the output to this thing here and let's say this is ground so it's zero volts and basically that means your output is zero volts which is zero and let's see so one one you get zero the same thing goes for zero zero so if you have if you have uh if you brought bring zero zero you're going to basically turn off these two and you're going to turn on these two and so you're basically bringing the voltage supply directly to the output which is 0.9 volts which basically translates into logic one and yeah so that's that's how it works that's how your computer computers basically work and going even deeper like uh people have to design these and you can see here so this is how you you basically place these metal wires and this is how you implement an end gate and finally you have the last step and that's the production you you actually do some like photolithography using like light waves you uh engrave these into a silicon and this is how a nand gate multiple nand gates look like uh actually manufactured and yeah i had some prior experience myself uh i was studying electronics and computer science and so this was one of the radio frequency circuits i designed which just takes your signal like maybe your audio and basically modulates it so it's on a higher frequency and then you can send it over the wire like a radio signal and yeah so that was the uh basically one of one of how computers work uh you have these logic gates you have these sequential gates and everything is made out of them so now you know how those macros are all built and hopefully like if you found this useful please comment down so that i know for future videos whether uh this is interesting or not uh okay let's see some of the simplifications they are making to make this thing work and make it like computationally trackable because the problem is huge as you can imagine uh because of the like millions of standard cells a million like and hundreds of macros so first things first they group millions of standard cells into a few thousand clusters using this tool h mattis so uh that way they basically make it easier for those forced direct methods to do the placement like themself okay and second thing they do they discretize the grid so the the chip canvas to a few thousand grid cells okay so that means they won't have obviously so you have your chip and it would be super super hard and probably wouldn't be any like meaningful uh to to just make it continuous or to make it really like small resolution so what they do is they kind of divide these into uh i think it was 128 by 128 cells so they have something like this and they can only place uh macros on a specific grid cell so that's the second simplification they make the second the third one is when calculating wire length we make the simplifying assumption uh the wires start from from the standard cell blah blah blah that's not important and finally to calculate the routing congestion cost we only consider the average congestion of the top 10 percent most congested grid cells so again certain heuristic which makes it a bit more feasible to calculate this congestion which is used later on in the reward function so um this part uh the the half perimeter wire length is something they use as a proxy as i mentioned for training the rl agent and it positively correlates as they show with area and other metrics of interest so let me just explain you briefly how it works and the formula is pretty simple as you can see so imagine you have a wire like this and this is your wire and basically what it do is they for this particular wire the this number would be equal to this thing here plus so let's hold that height and this would be width so that's the half parameter right so the perimeter would be two times that and you can see here so max over the uh basically x-axis and minus min over the x-axis same thing for the y-axis and if you had something a bit more complex somewhere that goes like this maybe like this and finally ends up like this did again do the following so they would find the max x value so for this particular wire so that would be maybe this and they'd uh subtract this thing and they would do the same thing for y so it's a kind of uh you find the bounding box and you you find the half parameter of that bounding box that's it and then they just kind of combine uh all of those across different wires okay that's the that's that part okay so um certain heuristic additional heuristic and you can immediately see because uh and i forgot to mention that this work was actually used to uh design the newest tpus in google so that's the version five if i'm not wrong and so you can immediately see this is this is in production already so there are lots of details but like the high level picture you already got it pretty much hopefully and yeah let's see some additional details so to select the order in which the macros are placed we sort macros by descending size and break ties using topological sort so that's additional heuristic they they use in order to make the action space smaller and to make this problem computationally tractable so they just sort the the the macros so those blocks in your chip by size and then the policy will just take the the next one and then learn how it just needs to decide where to place it it doesn't need to learn which one to pick uh in each step okay so i i forgot to mention one detail here uh hopefully you already understood it but like if you you have this policy and the output is 128 by 128 so those are the great cells i mentioned previously and so it just outputs a distribution across all of the grid cells and where the distribution is where the probability is highest so we just take arc max that's where you're going to place the macro okay so it would be really simple if but like additionally you need to mask certain areas and so for example if you have your your your chip canvas and you've already placed the chip the the macro here so let's say that's this part and let's say that's this part uh then what will happen is they'll have this uh density this mask which will be used to kind of mask out certain uh like uh let's call them pixels uh basically to zero so uh for example if if now the uh if the network if the policy network outputs i don't know maybe 0.0 here and we already have that part occupied this mask will just uh zero it out to zero and uh you'll just distribute the probability across the other uh valid cells so that's that part and yeah that's what i mentioned here so basically to meet this constraint during each rl step we calculate the current density mask a binary m times and matrix that represents grid cells onto which we can place the center of the current node without violating the density threshold criteria before choosing an action from the policy network output we first take the dot product of the mask and the policy network output and then take the arg max over feasible locations so that's the process i just explained so we also enable blockage aware placements such as clock straps by setting the density function of the blocked areas to one so block it just means that basically you explicitly say hey i don't want to have anything placed around this macro for example or around this wire and you explicitly encode that into that mask and then you can't place macros on top of that location okay and that's just minor detail finally the loss function here is the wire length i already explained the half perimeter wire length calculation they have the congestion and the density uh constraint and they'll just include this density part while lagrangian so you'll just have a weighted sum of congestion density and wire length so as for the loss they are just summing over all of the net lists in the data set and they just want to maximize the expected cumulative reward that was the reward part let's now jump into explaining the the graph neural network so so our intuition was that a policy network architecture capable of transferring placement optimization across chips should be able to encode the state associated with a new unseen chip into a meaningful signal at inference time we therefore propose training a neural network architecture capable of predicting reward on new netlists with the ultimate goal of using this architecture as the encoder layer of our policy network so that was the part i explained here the encoder part and now they say here we therefore created a data set of 10 000 chip placements where the input is a state associated with a given placement and the label is a reward for that placement so basically they used uh for example like those ede eda tools and they found the ground truth rewards and so they basically had this simple supervised regression task of uh on this with this graph neural network uh called the edge graph neural network uh for the computation itself uh it's fairly simple so let's say you have two components here you have a macro here and it's connected to this macro here you have a feature vector here and you have a feature vector here as well as here okay so what i do is they kind of uh so they process using a neural network a fee like a simple fully connected uh layer they process these vectors and then they just concatenate those and they uh place it again they just feed it through another another uh fully connected layer i'm not quite sure what this part means here because like it doesn't make any sense to concatenate the learnable weights with your with the output here but yeah basically yeah you you somehow combine them and then what you do to calculate the node uh feature is for example if you have multiple edges going into this node you just take all of those edge features so you take this edge you'll take this edge and you just aggregate those and do a mean across those feature vectors so the computation is pretty simple again check out my graph ml videos this will be pretty self-explanatory okay this is just your ppo objective function and i won't go to go into details there let's see the final results so uh on the x axis we have different tpu blocks so they have block one through four and this uh open source risk five cpu uh chip and you can see here the placement cost so you wanna have as low of a cost as possible so using zero shot uh basically it means that um they don't fine tune uh that so basically what they do is they take the the whole system you saw with the policy network value network and graph a graph neural network and they train it on maybe for example 20 uh 20 different uh chips and now they they they take that network and just apply it onto a new chip like this tpu block one and if they don't find tune it on the block itself it's just zero shot so as you can see zero shot is pretty decent and um then if you fine tune it for two hours you get even better performance in 12 hours you get even better performance but the point is if they train from scratch so without that previous information from those other 20 uh chips so it takes after 24 hours it still sucks pretty much it's it's still much much worse than compared to this this one that's fine tuned for 12 hours so this means it generalizes to new never before seen chips so that's the benefit of a learning system okay here what they show is again training from scratch versus fine tuning as you can see basically only after a couple of hours the the the pre-trained one gets a really low loss whereas you need much more time to to to train i mean this is pretty pretty obvious bottom line you want to pre-train these are all agents and then later just fine tune them onto the new chip you're trying to place okay that's the whole point there uh here they compare so i mentioned data sets so i mentioned the largest one the one that has 20 blocks but i also trained them on the medium one which is a subset of this one and they also train it on a small one which is a subset of the medium one so again you can see that going to more data again gives you like lower placement costs and looking at this curve here they are basically what you see here is uh the rl cost on an unseen chip and you can see that the with the big data set you you get to overfitting much much slower and you get better performance than than basically using uh medium or smaller data sets so it's both harder to overfit and you get you get lower costs so again more data nothing new there but yeah these are the placements they get so here you can see these purplish blocks are macros and here in the middle this like a blob of nothingness is a bunch of basically those standard cells and here here is the zero shot placement and here is after you fine tune them you can see there is much less conjunction a congestion you can see so basically these red and orange colors you know that there is a huge overlap across those layers and here it's much better a place and you can also see a much more regular structure here compared to the zero shot one but this is still fairly fairly good for something that's that's working on a chip never before seen um here are some other results so here they blurred this because of the confidentiality but again you can you can see the main thing is that humans when they when humans design chips you can see there is a lot of structure you can you can see some regular shapes here whereas this rl agent just outputs this blob and this donut shaped uh thing here it outperforms the the human design on many different metrics and that's cool uh and they mentioned it here like the wire length and the congestion and you can see that it's both everything is lower using this rl agent approach okay uh plus table and i think we're done here is this one so they comp they compare it with this replace method from 2019 it's one of the state-of-the-art placement methods and again looking at different metrics like timing area power wire length congestion it outperforms replace which not only has lower performance it also like a higher cost it also uh many times uh just creates invalid placements because there was too many um like either too many like the congestion was too big or uh there was some invalid placement was made and so it's it's harder to to to to get it to work as a summary they mentioned here so for example the process of standard cell partitioning row and column selection as well as selecting the order in which the macros are placed all can be further optimized in addition we would also benefit from a more optimized approach to standard cell placement so what i see here is potential future work the first things first is that the heuristic i just mentioned was that they are using the the the biggest micro first and then the smaller one blah blah blah so they argue that uh basically that process can also be learned and they can uh squeeze out more performance by doing that uh the second thing is instead of using those force directed method that can also be kind of thrown into the rl framework approach and yeah that's pretty much it so great results uh super effect will work so ai is building chips which make ai more powerful so we have a pretty pretty nice loop here and it will be exciting to see what else is there to come so if you found this video useful subscribe share it out and until next time bye bye [Music] you

Original Description

❤️ Become The AI Epiphany Patreon ❤️ ► https://www.patreon.com/theaiepiphany ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ In this video I do a deep dive of the "Chip Placement with Deep Reinforcement Learning" paper. They devised a novel RL approach for an efficient and highly performant chip placement routine. ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ ✅ Paper: https://arxiv.org/abs/2004.10746 ✅ Tesla's custom chip: https://www.youtube.com/watch?v=Ucp0TTmvqOE&ab_channel=Tesla ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ ⌚️ Timetable: 00:00 Paper keypoints 01:21 Why do we need faster chip design? 05:30 High-level pipeline explained 09:20 RL agent explained 14:15 How are computers made 21:00 Assumptions 22:35 Wirelength and macro heuristic 24:55 Macro placement masking 27:50 Edge Graph Neural Network details 29:50 Results 32:25 Visualizations 34:25 Further research ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ 💰 BECOME A PATREON OF THE AI EPIPHANY ❤️ If these videos, GitHub projects, and blogs help you, consider helping me out by supporting me on Patreon! The AI Epiphany ► https://www.patreon.com/theaiepiphany One-time donation: https://www.paypal.com/paypalme/theaiepiphany Much love! ❤️ Huge thank you to these AI Epiphany patreons: Petar Veličković Zvonimir Sabljic ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ 💡 The AI Epiphany is a channel dedicated to simplifying the field of AI using creative visualizations and in general, a stronger focus on geometrical and visual intuition, rather than the algebraic and numerical "intuition". ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ 👋 CONNECT WITH ME ON SOCIAL LinkedIn ► https://www.linkedin.com/in/aleksagordic/ Twitter ► https://twitter.com/gordic_aleksa Instagram ► https://www.instagram.com/aiepiphany/ Facebook ► https://www.facebook.com/aiepiphany/ 👨‍👩‍👧‍👦 JOIN OUR DISCORD COMMUNITY: Discord ► https://discord.gg/peBrCpheKE 📢 SUBSCRIBE TO MY MONTHLY AI NEWSLETTER: Substack ► https://aiepiphany.substack.com/ 💻 FOLLOW ME ON GITHUB FOR COOL PROJECTS: GitHub ► https://github.com/gordicaleksa 📚 FOLLOW
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Aleksa Gordić - The AI Epiphany · Aleksa Gordić - The AI Epiphany · 45 of 60

1 Intro | Neural Style Transfer #1
Intro | Neural Style Transfer #1
Aleksa Gordić - The AI Epiphany
2 Basic Theory | Neural Style Transfer #2
Basic Theory | Neural Style Transfer #2
Aleksa Gordić - The AI Epiphany
3 Optimization method | Neural Style Transfer #3
Optimization method | Neural Style Transfer #3
Aleksa Gordić - The AI Epiphany
4 Advanced Theory | Neural Style Transfer #4
Advanced Theory | Neural Style Transfer #4
Aleksa Gordić - The AI Epiphany
5 Anyone can make deepfakes now!
Anyone can make deepfakes now!
Aleksa Gordić - The AI Epiphany
6 What is Computer Vision? | The Art of Creating Seeing Machines
What is Computer Vision? | The Art of Creating Seeing Machines
Aleksa Gordić - The AI Epiphany
7 Feed-forward method | Neural Style Transfer #5
Feed-forward method | Neural Style Transfer #5
Aleksa Gordić - The AI Epiphany
8 Alan Turing | Computing Machinery and Intelligence
Alan Turing | Computing Machinery and Intelligence
Aleksa Gordić - The AI Epiphany
9 Feed-forward method (training) | Neural Style Transfer #6
Feed-forward method (training) | Neural Style Transfer #6
Aleksa Gordić - The AI Epiphany
10 What is Google Deep Dream? (Basic Theory) | Deep Dream Series #1
What is Google Deep Dream? (Basic Theory) | Deep Dream Series #1
Aleksa Gordić - The AI Epiphany
11 Semantic Segmentation in PyTorch | Neural Style Transfer #7
Semantic Segmentation in PyTorch | Neural Style Transfer #7
Aleksa Gordić - The AI Epiphany
12 How to get started with Machine Learning
How to get started with Machine Learning
Aleksa Gordić - The AI Epiphany
13 How to learn PyTorch? (3 easy steps) | 2021
How to learn PyTorch? (3 easy steps) | 2021
Aleksa Gordić - The AI Epiphany
14 PyTorch or TensorFlow?
PyTorch or TensorFlow?
Aleksa Gordić - The AI Epiphany
15 3 Machine Learning Projects For Beginners (Highly visual) | 2021
3 Machine Learning Projects For Beginners (Highly visual) | 2021
Aleksa Gordić - The AI Epiphany
16 Machine Learning Projects (Intermediate level) | 2021
Machine Learning Projects (Intermediate level) | 2021
Aleksa Gordić - The AI Epiphany
17 Cheapest (0$) Deep Learning Hardware Options | 2021
Cheapest (0$) Deep Learning Hardware Options | 2021
Aleksa Gordić - The AI Epiphany
18 How to learn deep learning? (Transformers Example)
How to learn deep learning? (Transformers Example)
Aleksa Gordić - The AI Epiphany
19 How do transformers work? (Attention is all you need)
How do transformers work? (Attention is all you need)
Aleksa Gordić - The AI Epiphany
20 Developing a deep learning project (case study on transformer)
Developing a deep learning project (case study on transformer)
Aleksa Gordić - The AI Epiphany
21 Vision Transformer (ViT) - An image is worth 16x16 words | Paper Explained
Vision Transformer (ViT) - An image is worth 16x16 words | Paper Explained
Aleksa Gordić - The AI Epiphany
22 GPT-3 - Language Models are Few-Shot Learners | Paper Explained
GPT-3 - Language Models are Few-Shot Learners | Paper Explained
Aleksa Gordić - The AI Epiphany
23 Google DeepMind's AlphaFold 2 explained! (Protein folding, AlphaFold 1, a glimpse into AlphaFold 2)
Google DeepMind's AlphaFold 2 explained! (Protein folding, AlphaFold 1, a glimpse into AlphaFold 2)
Aleksa Gordić - The AI Epiphany
24 Attention Is All You Need (Transformer) | Paper Explained
Attention Is All You Need (Transformer) | Paper Explained
Aleksa Gordić - The AI Epiphany
25 Graph Attention Networks (GAT) | GNN Paper Explained
Graph Attention Networks (GAT) | GNN Paper Explained
Aleksa Gordić - The AI Epiphany
26 Graph Convolutional Networks (GCN) | GNN Paper Explained
Graph Convolutional Networks (GCN) | GNN Paper Explained
Aleksa Gordić - The AI Epiphany
27 Graph SAGE - Inductive Representation Learning on Large Graphs | GNN Paper Explained
Graph SAGE - Inductive Representation Learning on Large Graphs | GNN Paper Explained
Aleksa Gordić - The AI Epiphany
28 PinSage - Graph Convolutional Neural Networks for Web-Scale Recommender Systems | Paper Explained
PinSage - Graph Convolutional Neural Networks for Web-Scale Recommender Systems | Paper Explained
Aleksa Gordić - The AI Epiphany
29 OpenAI CLIP - Connecting Text and Images | Paper Explained
OpenAI CLIP - Connecting Text and Images | Paper Explained
Aleksa Gordić - The AI Epiphany
30 Temporal Graph Networks (TGN) | GNN Paper Explained
Temporal Graph Networks (TGN) | GNN Paper Explained
Aleksa Gordić - The AI Epiphany
31 Graph Neural Network Project Update! (I'm coding GAT from scratch)
Graph Neural Network Project Update! (I'm coding GAT from scratch)
Aleksa Gordić - The AI Epiphany
32 Graph Attention Network Project Walkthrough
Graph Attention Network Project Walkthrough
Aleksa Gordić - The AI Epiphany
33 How to get started with Graph ML? (Blog walkthrough)
How to get started with Graph ML? (Blog walkthrough)
Aleksa Gordić - The AI Epiphany
34 DQN - Playing Atari with Deep Reinforcement Learning | RL Paper Explained
DQN - Playing Atari with Deep Reinforcement Learning | RL Paper Explained
Aleksa Gordić - The AI Epiphany
35 AlphaGo - Mastering the game of Go with deep neural networks and tree search | RL Paper Explained
AlphaGo - Mastering the game of Go with deep neural networks and tree search | RL Paper Explained
Aleksa Gordić - The AI Epiphany
36 DeepMind's AlphaGo Zero and AlphaZero | RL paper explained
DeepMind's AlphaGo Zero and AlphaZero | RL paper explained
Aleksa Gordić - The AI Epiphany
37 OpenAI - Solving Rubik's Cube with a Robot Hand | RL paper explained
OpenAI - Solving Rubik's Cube with a Robot Hand | RL paper explained
Aleksa Gordić - The AI Epiphany
38 MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained
MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained
Aleksa Gordić - The AI Epiphany
39 EfficientNetV2 - Smaller Models and Faster Training | Paper explained
EfficientNetV2 - Smaller Models and Faster Training | Paper explained
Aleksa Gordić - The AI Epiphany
40 Implementing DeepMind's DQN from scratch! | Project Update
Implementing DeepMind's DQN from scratch! | Project Update
Aleksa Gordić - The AI Epiphany
41 MLP-Mixer: An all-MLP Architecture for Vision | Paper explained
MLP-Mixer: An all-MLP Architecture for Vision | Paper explained
Aleksa Gordić - The AI Epiphany
42 DeepMind's Android RL Environment - AndroidEnv
DeepMind's Android RL Environment - AndroidEnv
Aleksa Gordić - The AI Epiphany
43 When Vision Transformers Outperform ResNets without Pretraining | Paper Explained
When Vision Transformers Outperform ResNets without Pretraining | Paper Explained
Aleksa Gordić - The AI Epiphany
44 Non-Parametric Transformers | Paper explained
Non-Parametric Transformers | Paper explained
Aleksa Gordić - The AI Epiphany
Chip Placement with Deep Reinforcement Learning | Paper Explained
Chip Placement with Deep Reinforcement Learning | Paper Explained
Aleksa Gordić - The AI Epiphany
46 Text Style Brush - Transfer of text aesthetics from a single example | Paper Explained
Text Style Brush - Transfer of text aesthetics from a single example | Paper Explained
Aleksa Gordić - The AI Epiphany
47 Graphormer - Do Transformers Really Perform Bad for Graph Representation? | Paper Explained
Graphormer - Do Transformers Really Perform Bad for Graph Representation? | Paper Explained
Aleksa Gordić - The AI Epiphany
48 GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation | Paper Explained
GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation | Paper Explained
Aleksa Gordić - The AI Epiphany
49 VQ-VAEs: Neural Discrete Representation Learning | Paper + PyTorch Code Explained
VQ-VAEs: Neural Discrete Representation Learning | Paper + PyTorch Code Explained
Aleksa Gordić - The AI Epiphany
50 VQ-GAN: Taming Transformers for High-Resolution Image Synthesis | Paper Explained
VQ-GAN: Taming Transformers for High-Resolution Image Synthesis | Paper Explained
Aleksa Gordić - The AI Epiphany
51 Multimodal Few-Shot Learning with Frozen Language Models | Paper Explained
Multimodal Few-Shot Learning with Frozen Language Models | Paper Explained
Aleksa Gordić - The AI Epiphany
52 Focal Transformer: Focal Self-attention for Local-Global Interactions in Vision Transformers
Focal Transformer: Focal Self-attention for Local-Global Interactions in Vision Transformers
Aleksa Gordić - The AI Epiphany
53 AudioCLIP: Extending CLIP to Image, Text and Audio | Paper Explained
AudioCLIP: Extending CLIP to Image, Text and Audio | Paper Explained
Aleksa Gordić - The AI Epiphany
54 RMA: Rapid Motor Adaptation for Legged Robots | Paper Explained
RMA: Rapid Motor Adaptation for Legged Robots | Paper Explained
Aleksa Gordić - The AI Epiphany
55 DALL-E: Zero-Shot Text-to-Image Generation | Paper Explained
DALL-E: Zero-Shot Text-to-Image Generation | Paper Explained
Aleksa Gordić - The AI Epiphany
56 DETR: End-to-End Object Detection with Transformers | Paper Explained
DETR: End-to-End Object Detection with Transformers | Paper Explained
Aleksa Gordić - The AI Epiphany
57 DINO: Emerging Properties in Self-Supervised Vision Transformers | Paper Explained!
DINO: Emerging Properties in Self-Supervised Vision Transformers | Paper Explained!
Aleksa Gordić - The AI Epiphany
58 DeepMind DetCon: Efficient Visual Pretraining with Contrastive Detection | Paper Explained
DeepMind DetCon: Efficient Visual Pretraining with Contrastive Detection | Paper Explained
Aleksa Gordić - The AI Epiphany
59 Do Vision Transformers See Like Convolutional Neural Networks? | Paper Explained
Do Vision Transformers See Like Convolutional Neural Networks? | Paper Explained
Aleksa Gordić - The AI Epiphany
60 Fastformer: Additive Attention Can Be All You Need | Paper Explained
Fastformer: Additive Attention Can Be All You Need | Paper Explained
Aleksa Gordić - The AI Epiphany

This video teaches how to apply Deep Reinforcement Learning to chip placement, using graph neural networks and PPO algorithm to optimize placement routines. It also covers the importance of combining electronics and machine learning for complex tasks. By watching this video, viewers can learn how to design and implement efficient chip placement routines using novel RL approaches.

Key Takeaways
  1. Train an RL agent with graph ML for chip placement
  2. Combine electronics and machine learning for complex tasks
  3. Apply graph neural networks to embed information about connectivity
  4. Use PPO algorithm for RL training
  5. Fine-tune neural networks for zero-shot learning
  6. Optimize chip placement using supervised learning
  7. Apply multimodal learning for chip placement optimization
💡 The novel RL approach, combining electronics and machine learning, can achieve superhuman performance in chip placement routines, outperforming human design on many metrics.

Related Reads

📰
I Spent Weeks Looking for a Research Gap Before I Realized I Was Searching the Wrong Way
Learn how to effectively find research gaps by changing your approach, a crucial skill for AI researchers and academics
Medium · AI
📰
ICMI 2026 Reviews [D]
Learn how to interpret ICMI 2026 reviews and improve your paper's acceptance chances
Reddit r/MachineLearning
📰
Workshop submission for main conference paper under review [D]
Learn how to navigate submitting a paper to a non-archival workshop before the final decision of a main conference like ECCV
Reddit r/MachineLearning
📰
Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]
Streamline your research with a new Chrome extension and website that integrates 3M papers from arxiv, OpenReview, GitHub, and HuggingFace, including citation graphs and SPECTER2 neighbors, and provide feedback to improve it
Reddit r/MachineLearning

Chapters (12)

Paper keypoints
1:21 Why do we need faster chip design?
5:30 High-level pipeline explained
9:20 RL agent explained
14:15 How are computers made
21:00 Assumptions
22:35 Wirelength and macro heuristic
24:55 Macro placement masking
27:50 Edge Graph Neural Network details
29:50 Results
32:25 Visualizations
34:25 Further research
Up next
Indians Under House Arrest in America? 😱 Immigration Crisis Explained | SumanTV Classroom
SumanTV Classroom
Watch →