ControlNet Depth Explained - Full Tutorial // easy stable diffusion ai

CreatixAi · Beginner · 🎨 Image & Video AI · 2y ago

Skills: CV Basics 90% · Multimodal LLMs 80% · Modern CV Models 80% · Prompt Craft 70%

Key Takeaways

The video is a tutorial on ControlNet Depth, which uses a preprocessor (MiDaS by default) to estimate a depth map from a reference image and then uses that map to guide Stable Diffusion in text-to-image and image-to-image generation.
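As a rough illustration of the depth-map convention described in the transcript (white is closest to the viewer, black is furthest away), here is a minimal sketch. It is not how MiDaS or any other preprocessor works internally: real preprocessors infer depth from a single image with a neural network, and this only shows the final mapping from distance to gray level. The function name and the toy distances are made up for the example.

```python
def depth_to_grayscale(distances):
    """Map per-pixel distances to 8-bit gray: nearest -> 255 (white), farthest -> 0 (black)."""
    flat = [d for row in distances for d in row]
    nearest, farthest = min(flat), max(flat)
    span = farthest - nearest
    if span == 0:
        # A perfectly flat scene has no depth contrast; returning black is an arbitrary choice.
        return [[0 for _ in row] for row in distances]
    return [[round(255 * (farthest - d) / span) for d in row] for row in distances]


# Toy 2x2 "scene" with made-up distances in arbitrary units.
scene = [[1.0, 2.0],
         [4.0, 5.0]]
print(depth_to_grayscale(scene))  # [[255, 191], [64, 0]]
```

The closest pixel (distance 1.0) comes out white and the furthest (5.0) comes out black, matching the figure-versus-background description in the video.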

Full Transcript

ControlNet is one of the most powerful Stable Diffusion tools. There are many preprocessors to choose from, but today let's talk about depth. With depth you can change image styles, copy poses, change a person's gender, modify materials, create text effects, make up unique image compositions, and so much more. Let's break it down.

By the way, if you want to know more about ControlNet and how to install it, check out my full ControlNet guide that I published on my blog, CreatixAi. Link is in the description.

So what is ControlNet depth? ControlNet depth is a preprocessor that estimates a basic depth map from the reference image. A depth map is a 2D grayscale representation of a 3D scene, where each pixel's value corresponds to the distance, or depth, of objects in the scene from the observer's viewpoint. Basically, when you look at the output you will notice a grayscale: white is the closest to you, the viewer, and black is the furthest away, usually the background. You can clearly see it here with the figure being white. Some of it has a grayscale gradient, like the skin, because the face is a little bit further away than her leg. Then the leaves are slightly darker as they are behind her, and the background is black.

There are so many things you can do with ControlNet, but let's look at a very simple workflow. This is just the basics, and more interesting uses are coming up later. Pick your model (I use dve animated), write a prompt, choose your settings, and now open up the ControlNet tab. Drop in your reference image, select Enable, and choose Depth. If you want to see depth in action, check Allow Preview and run the preprocessor with the exploding icon. Leave the other settings as they are for now, and run.

Notice how similar the images are, especially when it comes to face and composition. If you've seen my previous ControlNet Canny video, you would have noticed that the results there were quite different in every image. That's not really the case here. This makes the depth preprocessor incredibly
effective in retaining spatial information from the reference image while reimagining certain parts in the new AI generation.

Now let's talk about preprocessors. There are four depth preprocessors available for you to choose from in the drop-down menu. Depth MiDaS is the classic depth estimator that's used by default, with a lower amount of detail. Depth Zoe is a preprocessor with a moderate level of detail; it's between MiDaS and LeReS. Depth LeReS has more detail than the other two, and depth LeReS++ generates the most detail. Notice how depth MiDaS and depth LeReS++ have the sharpest object outlines in comparison with the other two.

But is having more detail always better? I'm not so sure. Let's say I wanted to change a few specific color choices in an AI generation I made. So I dropped the generated image into text to image, so it has all of the same settings, prompts, and parameters, but I made a change: instead of blue hair I chose pink hair, and baseball cap becomes white baseball cap. The question is, which of these preprocessors can create the result closest to the original generation with the changes I've made to the prompt? I'm using ControlNet Balanced in this case, and here are the results that showed up. Here's the depth map for MiDaS and the result, Zoe with the result, LeReS with the result, and LeReS++ with the result. You can see the gradual increase of detail, with MiDaS being the lowest and LeReS++ the most. Just look at the leaves in the background. As you can see, more detail is not better in this case, as the closest to the original with my changes included is probably MiDaS depth.

All right, but what if I wanted to make a significant change, a huge change, like, say, turn her into a boy? So let's test that. I changed the prompt from one girl to one boy and added a negative prompt: girl, woman. Once again, here are the results: MiDaS, Zoe, LeReS, and LeReS++. Once again we can see the progression toward more detail from left to right, but as to our goal of making a
gender change, well, they all did pretty well, with my favorites being MiDaS and Zoe, because LeReS and LeReS++ have a lot more detail in the clothing, in the background, something that I don't find is needed in this image and that just makes it look way more AI-generated than the first two. So perhaps you should choose the depth preprocessor depending on how close to the original image you want the new one to be, and on the intensity of details that you prefer.

Now you might be wondering if you can use it with SDXL models, and the answer to that is yes. You just need to download any depth XL model from Hugging Face, make sure to select it in the drop-down, and run your generations. For my SDXL checkpoint I currently use the diffusers XL depth mid, but any from the link will do.

Which link, you might ask? Well, all the information in this video is also published as an article on my blog, CreatixAi. This article is entirely dedicated to ControlNet depth. You can check it out if you'd like; link is in the description. There you will also be able to find a link to download depth XL from Hugging Face, and the same link will be present in the YouTube description. As you can see in the three examples, I was able to use three different SDXL models with ControlNet for very different and beautiful results.

Now let's talk a little bit about image to image. You can do so many things with it and ControlNet, but today I want to show you how to transform a picture you like into something else while keeping the same composition, maybe colors, definitely the outline. So we need to scroll down to the ControlNet section in the image to image tab in AUTOMATIC1111 to use this powerful feature with other images. If you want to keep the same colors and only change the image slightly, then use a lower denoising strength. The higher the denoising strength, the less your reference image will impact the new image. If you want to use ControlNet with the same image that you dropped into the image to image tab, you
won't need to upload it again inside ControlNet. Just check Enable, then change the prompt to whatever change you want to make, and hit Generate. For example: gold statue, paper statue. Here I'm using a reference image of a dog, so I can change it to a bear, a cat, a bunny. I added all of these one at a time to the original prompt. I also used 0.9 denoising strength and 0.8 ControlNet depth weight for most of the images shown on screen now, though I had to lower the ControlNet depth weight to 0.4 for the cat and the wolf to get pointy ears instead of the droopy dog ones. So playing around with the weight will have a big impact on how believable the change is in some cases. How cool are these results of using image to image with ControlNet depth? I think these AI generations look absolutely stunning.

Another powerful feature of ControlNet depth is generating text. You can use it to create text-based images that look like something other than the typed text, or that fit nicely with a specific background. Here are some examples on the screen. You would just need to prepare a text file with whatever text you want. After you've prepared your text files, you can go to text to image and do the usual settings: write a prompt, pick the size, do all that stuff. Then drop the text image inside ControlNet and select All; in the preprocessor choose Invert, and in the model choose depth. So we're using depth a little bit differently here. The first prompt is pasta noodles on a white table and the second is pretzel on a marble table, and these are the results that you can achieve. Or here, with a "Boo" example with a little ghost.

There's so much more you can do with text effects. There are a lot of tips, tricks, and hacks that I'm sharing in my AI text effects video. Check it out in the pop-up in the top right corner; it should also be in the YouTube description. So if text is something you're interested in, I suggest looking into that video.

You can also use ControlNet depth to replicate
specific poses. While you do have the option to use OpenPose, it might not always work correctly or the way you wanted it to, so you could always try depth. The process is exactly the same as before, and you can use text to image or image to image depending on what you want to go for. So here I generated an image of a girl, and now I'm going to drop in a different image to use as a reference for a pose. You can see what it looks like with a depth map. When we hit Generate, we have a kind of similar image to the one before, with a pink background, white suit, everything, but the pose is now different.

You can also use other tools, like Magic Poser Web for example: pose a character however you want, take a screenshot, drop it in, and use that as a reference. That is a different option. Or you can use a photograph of somebody doing some action. There's a website called Unsplash; it's one of the many with royalty-free photos. So I grabbed a photo from there and used the depth map from it to generate an image. It looks pretty good, but there are a lot of things we can still fix. So you can always drop it into inpaint, and then for ControlNet drop in the original image, so when you're inpainting something like hands or feet, the information is still taken from the depth map of the original image, and it will just help you fix your AI generation that much better. And here is the final result.

Another exciting thing you can use ControlNet depth for is some really unique compositions. So add some creativity to the mix and grab an image that you generated, or a photo reference of something completely unrelated to what you want. In this case I found this building on Unsplash, and I really love the composition in this photograph. I decided to use it as a starting point for a bunch of illustrations, and let's see how it turned out. So I generated a bunch of girls and gods, and there's an eye and a broken mirror and a rocket ship.
There are so many things that can be done with it, but they all have this similar composition to the photograph, something that ControlNet probably wouldn't have done without this input. I suggest you give it a shot, because you never know how it's going to turn out, and sometimes it's really fun to play with, like when I used this photo of buildings and said that they're supposed to be bread. The result is not some perfect masterpiece, but it was definitely fun to play with.

If you use ControlNet depth in some other way that I haven't mentioned here, I would love to hear from you, so let me know in the comments down below. By the way, I'm creating a whole playlist on ControlNet, even if it's going to take me a little bit of time. Please be patient with me; this is not my full-time job. I just really enjoy exploring AI worlds and learning together with you. Other than that, thank you so much for watching this video, and hey, if you're still here, check out this one next.

By the way, to all of those who say that my voice sounds like AI: I don't know if I should take it as a compliment, but I can promise you that this is my voice. Though I suppose if I wrote it in an AI voice generator, the AI could say the same thing, so it's up to you to decide. Till next time, bye.
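The image-to-image values quoted in the transcript (0.9 denoising strength, 0.8 ControlNet depth weight, lowered to 0.4 for the cat and the wolf) can be written down as a small settings sketch. The dictionary keys and function names below are hypothetical and do not correspond to any particular web UI or API; only the numbers come from the video, and the original prompt is left as a placeholder because the transcript does not state it.

```python
# Numbers quoted in the video; every name here is hypothetical.
DENOISING_STRENGTH = 0.9
DEFAULT_DEPTH_WEIGHT = 0.8
REDUCED_DEPTH_WEIGHT = 0.4  # loosens the dog's depth map so pointy ears can appear
REDUCED_WEIGHT_SUBJECTS = {"cat", "wolf"}


def depth_weight_for(subject):
    """Return the ControlNet depth weight the video used for a given subject."""
    if subject.lower() in REDUCED_WEIGHT_SUBJECTS:
        return REDUCED_DEPTH_WEIGHT
    return DEFAULT_DEPTH_WEIGHT


def build_settings(base_prompt, subject):
    """Add one subject to the original prompt, as described in the transcript."""
    return {
        "prompt": f"{base_prompt}, {subject}",
        "denoising_strength": DENOISING_STRENGTH,
        "controlnet_depth_weight": depth_weight_for(subject),
    }


for subject in ["bear", "cat", "bunny", "wolf"]:
    print(build_settings("<original prompt>", subject))
```

A lower depth weight gives the model more freedom to depart from the reference silhouette, which is why it helped with ears that differ from the dog's.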

Original Description

Stable Diffusion ControlNet Depth EXPLAINED. This is a full tutorial dedicated to the ControlNet Depth preprocessor and model.

This video tutorial explains the ControlNet Depth preprocessor and its applications in image-to-image and text-to-image generation. It covers what a depth map is, how the four depth preprocessors differ in detail, and practical steps for using ControlNet Depth with different models, including SDXL.

Key Takeaways

Pick a model and write a prompt
Choose settings and open the ControlNet tab
Drop a reference image and select a depth preprocessor
Test different depth preprocessor settings for desired level of detail
Use image to image feature to transform a picture into something else
Prepare an image of your text (the video calls it a text file) for text-to-image generation
Use invert pre-processor and depth model for text to image generation
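To show what the Invert step in the last two items does, here is a minimal sketch. It assumes the prepared text image is dark text on a white background with 8-bit grayscale pixels, which the video does not state explicitly. Inverting it makes the letters white (read by the depth model as close to the viewer) and the background black (far away). The function name and the tiny sample image are made up for the example.

```python
def invert_grayscale(pixels):
    """Invert 8-bit grayscale values: 0 (black) <-> 255 (white)."""
    return [[255 - p for p in row] for row in pixels]


# Tiny made-up "text image": one dark vertical stroke on a white background.
text_image = [[255, 0, 255],
              [255, 0, 255]]
print(invert_grayscale(text_image))  # [[0, 255, 0], [0, 255, 0]]
```

After inversion the stroke is the brightest region, so a depth-guided generation treats the lettering as the nearest object in the scene.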

💡 ControlNet Depth can be used to estimate depth maps from reference images and apply them to various image-to-image synthesis and text-to-image generation tasks, allowing for creative and unique image compositions.
