Data Visualization in Data Science | DataHour | Analytics Vidhya
Key Takeaways
The video discusses the importance of data visualization in data science, covering its history, types of graphs, and best practices for creating effective visualizations, with tools such as Python, Matplotlib, Plotly, and Tableau being demonstrated.
Full Transcript
Data visualization in data science. My thanks to the hosts for making this talk possible. First, a couple of words about me. As was already said, I am a Kaggle Grandmaster and held first place in the ranking for some time, so I have some experience in data visualization. I currently work as a senior data scientist at Careem and use data visualization in my job too, so I have practical industry experience as well.
Let me present the content of my talk. First, I'll speak about data visualization in general: what it is, why it is useful, and how it can be used for everyday tasks. Then I'll speak a bit about its history. The history of data visualization is a huge topic, so I'll just highlight several important points. Nowadays people usually use specialized software for data visualization, and I'll show examples of it and, when possible, pieces of code. There are a lot of chart types, and there are various ways of categorizing them; I'll show these ways. Next, if you do data visualization for yourself or infrequently, you can just use the default ways of doing it, but if you aim to do it professionally, you need more skills, and I will talk about those skills. These first points are more about theory, even though they include some code. After that I'll talk about practice: I'll show several examples of taking a mediocre visualization and improving it step by step. Finally, I'll show some practical examples of visualizations used in machine learning, specifically for model interpretation and so on.
So what is data visualization? Formally, data visualization is the graphical representation of information and data. We are humans, and our eyes are drawn to colors, shapes, and patterns. Data visualization uses them to grab our interest and keep us interested in the message.
When we see a chart, we can usually see the trends and outliers. If we see something, we can easily internalize and understand it, and this is how storytelling works. It's much better than staring at a huge spreadsheet of data and seeing nothing. But it's not that easy: you can't just slap a random chart on the data. To create an effective data visualization, we need to balance form and function. For example, a practical but plain graph could be too boring to capture people's attention; on the other hand, a stunning visualization may fail to deliver the right message. So the data and the visualization need to work together, and there is an art to combining great analysis with great storytelling.
Now I will show two practical examples where data visualization shines compared to looking at simple summary statistics. This small dataset is called Anscombe's quartet. It has four sets of x and y pairs; they have different values, yet they have the same statistics and correlation. In all cases, the mean of x is 9, the mean of y is 7.5, and the correlation is the same. But if we look at the plots, we see a completely different story. The first plot is what we would expect in a usual situation: data with some random noise that is still correlated. The second and third plots look very different from the first one, and the last plot has a huge outlier. So the data have similar statistics but look completely different. Of course, with this small table we can spot the problem almost immediately, but with big data, say millions of rows, it would be impossible to understand anything just by looking at the numbers, so visualization really helps in this case.
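To check the claim about identical statistics, here is a minimal pure-Python sketch. The four data sets are Anscombe's published values; the helper names `pearson` and `summary` are illustrative, not from the talk.

```python
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values, set IV has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def summary(xs, ys):
    """Return (mean of x, mean of y, correlation), rounded to 2 decimals."""
    return round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 2)


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, summary(xs, ys))
```

All four sets print the same summary; only plotting each pair (for example, in a 2x2 grid of Matplotlib subplots) reveals how different they are.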
Another fun example is a dataset called the Datasaurus Dozen. As you could guess from its name, it has 12 datasets with the same statistics: the same means and so on. But as you can see, they look completely different, so this is another great example of the importance of data visualization.
As I said, the history of data visualization is a huge and vast topic, and it's not possible to cover everything in my talk. If you want to know more about the important milestones, you can read this slide, but I'll talk about two points. First, I want to show this awesome visualization. It was made around two centuries ago, and it's called Minard's map. It shows the losses suffered by Napoleon's army in 1812. Here you can see six variables: the size of the army, its location on a two-dimensional surface, time, the direction of movement, and the temperature. The width of the line shows the size of the army at a specific point in time, the temperature axis shows a possible cause of the change in army size, and you can also see where the army was located. So you have six variables on a two-dimensional surface. Edward Tufte wrote that it may well be the best statistical graphic ever drawn. Of course, there have been other awesome graphs since then, but this one is still amazing for the time when it was created.
Second, John Tukey and Edward Tufte are two famous people who made a lot of progress in data visualization. One of them developed a statistical approach, exploratory data analysis, and the other wrote the book The Visual Display of Quantitative Information. They paved the way for refining data visualization techniques so that they could be used by ordinary people and not only by statisticians. In this book, Tufte also defines the principles for
effective graphical display, and they are still in use today. There are a lot of great ideas, but they can be summarized in the following points. The first one is to maximize the data-ink ratio. Data-ink ratio is Tufte's term, and he uses it a lot in his book: he was against excessive decoration and preferred simple, functional charts. Another point is to minimize the lie factor, that is, to avoid distorting what the data has to say; I will show several examples of it. The other points are self-explanatory.
Now I want to talk about software. The most beautiful data visualizations are often made by hand, or with general visual editors, even Photoshop, but that is usually too time-consuming and inefficient, so people prefer more practical software. Let's start with Python. Python is a programming language commonly used for machine learning, data science, and many other things. There are a lot of packages, and Matplotlib is the most popular one for data visualization. It's possible to do almost anything with it, but it can require a lot of code: people who have tried Matplotlib know that it's very verbose, and sometimes you need to write a whole lot of code. But it's extremely flexible and versatile. One thing it isn't very good at is making interactive plots: it's possible, but difficult, and there are better tools. For example, this is Plotly, another Python library, which was created for interactive visualizations: pop-ups, filtering, zooming, and many other things. There is a huge ecosystem around it; for example, there is Dash for creating dashboards. Of course, there are many other libraries for interactive visualizations, but I like this one, and some of my visualizations come with code illustrating them.
Here you can see how to make a similar visualization. Basically, you import Plotly, get some data as a pandas DataFrame, and make a plot: you define which data to use for each part of the plot, and so on. So it's quite concise and not very difficult.
Another library also deserves a separate mention. It may not be as popular as Plotly, but it's still awesome: it's one of the few libraries based on the grammar of graphics. The grammar of graphics is a huge separate topic that I can't discuss now, but basically it's a framework that follows a layered approach to describing and constructing visualizations in a structured manner, where you define different parts of the plot separately. Here is an example. It's very different from Plotly, but again you import the library, pass it the data, and then define what you want to show: colors, tooltips, and whether the chart will be interactive or not. I think it's interesting and nice to use.
Now, if we talk about visualization and programming languages, we have to talk about R. R is a widely used programming language for data science and machine learning, and ggplot2 is its most famous visualization library. Many people say it's the best visualization library, even compared to Python. You may notice that this plot and the previous one are pretty similar; this is because ggplot2 is also based on the grammar of graphics, and you can see that the syntax is even simpler. Many people prefer this syntax: they say ggplot2 is awesome and very easy to use, and they can't use anything else. It's difficult to argue with them, considering how little code it needs, right?
Now, while a lot of people use programming languages for visualization, there are many solutions for making visualizations without them. For example, Tableau is one of the most famous tools for creating visualizations and dashboards. Its dashboards can be published to a wide audience, self-hosted, and so on. It's very flexible, it lets you do a lot of things, and it's quite popular, so if you don't want to use a programming language, you can use it, for example.
At the other end of the spectrum, while Python and R offer syntax for visualization that ranges from difficult to simple, there is D3.js, a JavaScript library that produces dynamic and interactive visualizations. Many stunning visualizations on the internet are created with it, but it's extremely difficult to use and requires a lot of code. So you should use it only if you are sure you want to invest a lot of time in making visualizations, but then you will be able to create awesome things that you can't make any other way.
Now let's talk about different types of charts. There are really a lot of them: box plots, area charts, and so on. I don't think it's possible to talk about every one of them in detail, so I'll approach them differently. With so many types of charts, we need some way to structure and categorize them to understand which one to use; if we only know that there are 50 different types of graphs, we may have no idea which one fits our case. There are many ways to categorize them. For example, this nice visualization shows which types of plots can be used for which types of data, so this categorization is based on the nature of the data. For example, if you have a single numerical feature, you can use a density plot or a histogram. It's a
basic approach, and it's pretty useful. If you have two numerical features, you need to show some kind of interaction between them, and scatter plots or box plots are great ways of doing that. With more than two, it gets more difficult, and you have to use more advanced types of plots, such as stacked area plots, but in practice they are rarely used; usually you just take one or two features and compare them. If you have a categorical variable, there are different approaches. I suppose the bar plot is the most popular one: you just draw bars that show, for example, the count for each value of the variable or some statistic of it. It's easy to use and extremely versatile. If you have more than one categorical variable, there are other types of visualizations, which you can see on the screen, but I think most of them are rarely used, except perhaps Venn diagrams. You could also have non-tabular data, for example networks. It's possible to store network data in a structured table, but it's usually kept in a different way, and here you can see, for example, network diagrams and dendrograms that can be used to plot it.
That was about the nature of the data. Now let's see which chart should be used based on what we want to show. If you want to show a distribution, such as a variable's maximum and minimum values or how it behaves over time, you would use violin plots, histograms, ridgeline plots, and so on. If you want to show the correlation between two or more variables, you would use scatter plots, for example, and so on. So there are many ways of showing different things.
All of that was about simply taking data and making a visualization. But if you want to make a profession of it, if you want to work in data visualization as a specialist and
use it, then you need many more skills. I won't go into much detail, but I'll describe them at least briefly. Data visualization itself is self-explanatory, but if you want to make a dashboard or some other complex visualization, you need to know about user experience, interface design, and many other things, because you need to understand how to present data so that people are interested. For example, there are a lot of dashboards that no one uses because they have too many plots or the plots are not understandable, so this matters. The next point is storytelling. It's one thing to make a visualization, send it to someone, and that's it; it's another to give a presentation where you show the audience some charts and describe them. In that case storytelling is important: you capture the audience, tell the story, and prove the points you're making. There are many other things, like choosing good colors, making the plots accessible to different groups of people, and so on.
One separate and very important topic is style guides. Style guides are standards for formatting and designing graphs: they specify which graphs should be used in specific cases, which colors and formatting should be used, and so on. Usually there are templates that make it easy for people to apply the style guide. Style guides are usually created by an organization so that people working there make consistent plots; for example, they can include logos, brand colors, tone of language, and so on. This makes it easier for different people to make great visualizations without spending too much time on design, and it provides uniformity.
So far I have talked a lot about theory, but now let's move a bit into practice.
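In Matplotlib, a style guide like the ones described above can be captured as a dictionary of rcParams and applied with `plt.style.context`. A minimal sketch; every color and setting in `BRAND_STYLE` is a made-up example, not a real organization's guide, and `branded_bar_chart` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
from cycler import cycler

# Hypothetical style guide: all values are invented examples.
BRAND_STYLE = {
    "axes.prop_cycle": cycler(color=["#1F4E79", "#F28E2B", "#8C8C8C"]),  # brand colors
    "axes.spines.top": False,       # drop non-data ink
    "axes.spines.right": False,
    "axes.titlelocation": "left",   # titles read like headlines
    "axes.titleweight": "bold",
    "axes.titlesize": 14,
    "figure.figsize": (8, 4.5),
}


def branded_bar_chart(labels, values, title):
    """Draw a bar chart that follows BRAND_STYLE without touching global settings."""
    with plt.style.context(BRAND_STYLE):
        fig, ax = plt.subplots()
        ax.bar(labels, values)      # first color comes from the brand cycle
        ax.set_title(title)         # placed and styled by the rcParams above
    return fig, ax
```

Because the style is applied through a context manager, it affects only figures created inside the `with` block; `plt.style.use(BRAND_STYLE)` would apply it globally instead.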
For example, if we have a poor visualization, what are the general ways of improving it? First of all, fix the errors, because visualizations do contain mistakes, and this is precisely what I will talk about a bit later. Next, make it look better, because sometimes people choose bad colors, and sometimes they don't show the necessary information. Another point is to consider the audience. For example, if you're presenting to technical people, you would show the detailed analysis and detailed numbers, but if you're presenting to a high-level audience, such as business people, you don't need all the details; you need to show the trends and general high-level information.
Now I will show several examples of taking mediocre plots and improving them. Most of them are from a wonderful book, Storytelling with Data by Cole Nussbaumer Knaflic. Let's take the first one and try to understand what we see on the screen. We have some survey: it seems people were asked how they feel about science, and there are some groups of people, but I'm not sure what exactly this visualization was supposed to show. Let's see how it can be improved. This is the same data presented in a different way. What is the difference? First, we have a different title, and it tells us everything: "Pilot program was a success". So we know that this is about the pilot program, that it was successful, and that it has been completed. It's much more understandable. Next, we have fewer colors, because with too many colors it's difficult to concentrate; here we have only a few colors, only the important information, and it's more structured. Let's look at both plots again. Here we see two pie charts, and it's difficult to compare the categories and see which were worse and which were better, and we don't really understand the message. But here we can see that at first the
people felt just OK before the program, and now most people are excited or at least kind of interested. This visualization accentuates the information, doesn't distract you, and shows the main message. This is why it's better.
The next one is this. It's about a quote: "Demonstrating effectiveness is the most important consideration when selecting a provider". But there is a lot of text on the screen, and it's difficult to understand what to look at first; there are a lot of bars, and we're not sure which of them are more important. What can be done? You can see that now the most important information is highlighted in the text: the quote is about attributes, the survey shows some results, and we see which results are the most important. This is the source, and we see the three most important parts. Here we use color purposefully to attract the readers' attention to the important parts of the plot. It's a really good idea to do this.
In another case, we have this plot: a line chart over time. We see that something was received and something was processed, and that's it; there is no other information, so obviously we need more. What could we do? Now we can see a clear message with an action item: "Please approve the hire of two people". The chart shows that the numbers of processed and received tickets were almost the same, so all the tickets were processed, but after an event happened, two people left, and we see a real discrepancy. So this chart shows that there was a problem, shows its consequences, and shows how the problem can be solved. It's a perfect call to action: we know what happened and what we should do. I think it's an amazing example.
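The decluttering ideas from these examples (an action title, gray for context, one accent color for the message, direct labels instead of a legend) can be sketched in Matplotlib. The ticket numbers, the event month, the colors, and the title are invented for illustration and are not the book's data.

```python
import matplotlib.pyplot as plt

GRAY, ACCENT = "#8C8C8C", "#1F4E79"

# Invented monthly ticket counts; the gap opens after index 5.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
RECEIVED = [160, 184, 237, 148, 181, 150, 202, 187]
PROCESSED = [160, 184, 237, 148, 181, 150, 123, 156]


def action_line_chart(months, received, processed, event_index, title):
    """Line chart that carries one message: context in gray, story in color."""
    fig, ax = plt.subplots(figsize=(8, 4))
    x = list(range(len(months)))
    ax.plot(x, received, color=GRAY, linewidth=2)
    ax.plot(x, processed, color=ACCENT, linewidth=3)
    # Shade the backlog that builds up after the event.
    ax.fill_between(x[event_index:], received[event_index:], processed[event_index:],
                    color=ACCENT, alpha=0.15)
    ax.axvline(event_index, color=GRAY, linestyle="--", linewidth=1)
    # Direct labels at the line ends replace a legend.
    for series, label, color in ((received, "Received", GRAY),
                                 (processed, "Processed", ACCENT)):
        ax.text(x[-1] + 0.15, series[-1], label, color=color, va="center")
    ax.set_xlim(-0.3, x[-1] + 1.3)  # leave room for the labels
    ax.set_xticks(x)
    ax.set_xticklabels(months)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove non-data ink
    ax.set_title(title, loc="left", fontweight="bold")  # the action item
    return fig, ax
```

Usage: `action_line_chart(MONTHS, RECEIVED, PROCESSED, 5, "Please approve hiring 2 people")`. The title states the requested action, and the shaded area is the only visual emphasis, so the eye goes straight to the backlog.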
In the last example, we see two bars showing the population of the U.S. and the customers of some company, with some segments highlighted. To me, this doesn't look very good. How can it be improved? First of all, the text labels are aligned in the same way, so we no longer have ragged text with different alignments. Next, we have fewer colors, and the most important parts are highlighted, so now we see what the differences are. And it's much more aesthetically pleasing.
That was about improving mediocre plots. Now let's talk about things you shouldn't do at all. On the screen you can see a Kaggle notebook; this is just me scrolling through it. As you can see, it shows the distribution of some feature, and that's it. We see dozens or hundreds of plots that are almost the same; they have no titles, no comments, and no analysis, so they aren't really useful. I would say there are two purposes for data visualization: the first is your own analysis, and the second is demonstrating your results. When you're doing your own analysis, it can be okay to make dozens of plots, but even then you should add titles and descriptions so that you can understand them yourself. On the other hand, when you're presenting something, for example in a published notebook, it's much better to have a few plots made with care, with explanations, clear analysis, and some comments. This is what you should do instead of this visualization.
Next, three-dimensional plots are often not very useful, because it's very difficult to compare things in them. For example, if we look at this plot, can you guess the value of this column just by looking? I would say it's 14, for example, but if we draw guide lines,
it turns out to be something else. So this visualization can be misleading, and it makes things difficult to compare, so try to avoid it. One more thing is pie charts. Pie charts in general are a hotly debated topic: some people say no one should use them, others say you should be careful when using them. In any case, with a three-dimensional pie chart it's extremely difficult to compare things; for example, is B bigger than C? I have no idea from this chart. A pie chart can be useful when the categories have clearly different percentages, but when we need to compare them closely, it's not very useful.
This next one is a different kind of problem. If you look at this plot, you might think the difference is huge, that the values dropped dramatically and are now much lower, but if you look at the values, you can see that the difference is around 10 percent. The author deliberately cut off part of the chart, maybe to mislead people into thinking things are different from reality. Please don't do this.
Now I want to show a couple of examples of good visualizations. You can see the link to this one at the bottom of the screen. I think it's an awesome visualization: first, it's pleasing to the eye and has nice colors. Next, it has a lot of data: categories, two groups of people, ages, their values for some variable, and a highlight of the most important part of the graph. So this plot is concise, aesthetically pleasing, and presents a lot of information in an understandable way. Another plot from the same Kaggle notebook uses a very cool type of chart. When you want to show the quantities or percentages of some variable, you could use bar charts.
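For percentage bar charts like this one, a minimal pandas sketch: count answers per category with `pd.crosstab`, normalize each row so it sums to 100, and plot with `stacked=True`. The survey DataFrame, its column names, and both helper functions are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented survey answers, for illustration only.
survey = pd.DataFrame({
    "age_group": ["18-24", "18-24", "18-24", "25-34", "25-34", "35+", "35+", "35+", "35+"],
    "answer":    ["Yes", "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})


def percent_table(df, category, segment):
    """Share (in %) of each `segment` value within each `category` row."""
    return pd.crosstab(df[category], df[segment], normalize="index") * 100


def stacked_100_bar(shares, title):
    """Horizontal bars that all span 0-100, so segment widths compare directly."""
    fig, ax = plt.subplots(figsize=(8, 3))
    shares.plot(kind="barh", stacked=True, width=0.7, ax=ax)
    ax.set_xlim(0, 100)
    ax.set_xlabel("Share of respondents (%)")
    ax.set_ylabel("")
    ax.set_title(title, loc="left")
    ax.legend(title=shares.columns.name, loc="upper left", bbox_to_anchor=(1.0, 1.0))
    return fig, ax
```

Usage: `stacked_100_bar(percent_table(survey, "age_group", "answer"), "Answers by age group")`. Normalizing first means groups of very different sizes still produce bars of equal length.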
In this case, we have multiple variables and multiple categories, and we use a stacked bar chart. If we used absolute values, the bars would have very different heights and would be very difficult to compare, but here all the values sum up to 100, so they are very easy to compare. It's an amazing way to compare values across many categories.
Now, this is an extremely simple but awesome chart: a pair plot from Seaborn. It shows a lot of things: the distribution of each variable, clusters in the data by class, and pairwise interactions between features. It's extremely easy to use, aesthetically pleasing, and shows very useful information; for example, we can see that one of the classes is very different from the others.
This is a completely different type of visualization: kepler.gl, a tool for visualizing large geographical datasets, developed by Uber. If you have a lot of geographical data that you want to show on a map, it's a perfect tool for the purpose, because it supports showing millions of points, and it's very cool.
Now I want to show how visualizations are used in machine learning: for interpreting models, presenting them, and so on. Here you can see a visualization from a library called SHAP, which explains machine learning models. There are multiple things shown on the screen, and when you first see this library it can be pretty difficult to understand, but once you get used to it, it's awesome, because it shows a lot of information in a concise way. It shows how high or low values of each variable contribute to the target. For example, in binary classification with two classes, if we see that the blue (low) values are on the left side of the plot, then the lower the value of this variable, the higher the chance
that the class is zero, and the higher it is, the higher the chance that the class is one. Other variables behave differently. It's easy to plot and very useful for explaining things; it's often used in industry, where the business really needs to understand the model, and it helps explain to people how models work.
This is another type of plot: it shows word embeddings. There are many ways to create embeddings, like FastText, BERT, and many others, but they usually have high dimensionality, like 300 values, 700 values, or more, so you can't compare them directly. But if you apply dimensionality reduction and plot them in two-dimensional space, you can see some awesome things: different clusters and words that belong to specific groups, so you can understand them much better.
Next is a screenshot from Weights & Biases. Weights & Biases is a cool library, or rather a huge platform, for logging machine learning model training. Here you can see that someone ran hyperparameter optimization for some gradient boosting model, or maybe for neural nets. The main thing is that you can see the different experiments on the left part of the screen and select only some of them. You can see how different hyperparameters influence the accuracy: some have a high impact on accuracy and some have a low impact, so it's better to optimize only some of them. You can see the best performance and compare training plots for different runs. It's an extremely good visualization, and it's used in practice for comparing models.
This is SHAP again; it can be used not only for tabular data but also for neural nets, in this case for images. We have an image, a photo of an apple and a strawberry. When we try to predict "strawberry", we can see that some part of the image is very important, which is logical, but when we try to predict completely different classes, there is nothing useful in the image, and nothing is highlighted. Another tool used for analyzing images and image models is Grad-CAM. It shows activations at different layers of the network: for example, with an image of pizza, you can see that different parts of the neural net highlight different parts of the image. This way you can see how the internal layers of a neural network work. It's pretty cool and can be useful for improving model performance.
This is a different tool, called Netron. It shows the model structure, and it's framework-agnostic: you can use it with PyTorch, TensorFlow, Keras, and other libraries, with ONNX, and with many other formats, and it shows the structure of the neural net in detail. Different frameworks have some built-in capabilities for visualizing the structure, but here it's presented in a very nice way, and it's easy to look at and share, so it's a great tool.
That was most of what I wanted to say. There are several pages of references: you can find examples of the plots, the code for the plots I showed you, some additional plots, and some blog posts about them. Also, if you are interested in pursuing data visualization more seriously, I suggest joining the Data Visualization Society. It's a huge society of professionals working in the data visualization industry.
I'm part of it, and it's pretty interesting and cool. So these are the things I wanted to talk about. Data visualization is an interesting topic, both in theory and in practice. There are a lot of ways to make simple visualizations and a lot of ways to make awesome ones, and it's up to you to decide where to draw the line. These are my contacts. I hope this talk was interesting to you, and now I can answer the questions. I see some were left in the chat; do you want me to read out the questions, or how do you want to do this? I can also just go through them myself.
Oh, okay, sure. Sorry, sir, but I'd request you to read out the questions as well, so that the larger audience knows what each question is about.
Okay, yes. So I'll start with the questions from the chat and then move to the Q&A section.
Sorry, I mean we will take all the questions from the Q&A section, so you needn't worry about the chat.
Okay, sure. The first one: how do you turn data visualizations from Python into nice infographics? I think I showed some examples of plots in my presentation. Basically, if you want to make interactive plots, you would use Plotly or, for example, Altair; if you want simple but flexible plots, you could use Matplotlib, for example. The examples I showed are pretty basic, but it's quite possible to make very nice infographics in Matplotlib. For interactivity, it's more about Plotly, Bokeh, and Altair, for example.
The next question: can we visualize data such as the progression of a spreading disease over time on a map, with real data? How could we do that, or where could we find out how to do that? Yes, it's quite possible. Right now I can think of two approaches. In Python, there are two libraries worth mentioning. The first one is, again,
Plotly; another one is Folium. Folium is a library for showing visualizations of geographical data: basically, it has a map and you can draw anything on it, and it supports different widgets. For example, it has a time axis, so you can make a visualization for each point in time. You have your dataset with a column showing the point in time, for example the date, and then you will be able to make a plot in Folium where you can move a time slider and show how the disease spread over time. Something similar could be done in Plotly. It isn't primarily meant for geographical data, but there are some plugins and support for it, and I have also used it for showing geodata, so it's quite possible. You can find examples of this, for example, on Kaggle.

The next question is: what do we use nowadays for interactive dashboards? As I already said, in Python it's, for example, Dash; in R it would be Shiny. Also, if you use Python, right now there is an awesome library called Streamlit. It allows you to make websites very easily, and of course you can include visualizations in them. It's very easy to publish what you do and to make several tabs, several plots, several visualizations, so I suggest trying Streamlit. It's widely used for presenting analyses, it's used in organizations, and I use it in my current company too. So that's it.

Next question: I also worry about which color combinations to use in graphs. Do you have any tips and tricks about color combinations? Can I keep using the same combinations for all my graphs? Thanks. First of all, if you look at the references in my slides, you will see several links to articles about selecting colors for visualization. Basically, I would say there are three general approaches. The first one is
that you use some color maps from the library. In R and in Python, in most libraries you can select a color map; a color map defines the colors used in the visualization, and the styles. So you can just use one of the popular styles and it will be fine; you won't have to select every color by yourself. Another point: selecting the colors by yourself each time, for each plot, is very time-consuming, though it can be satisfying. But, as you asked, is it possible to use the same combination for all the graphs? Yes, it would be like your own style guide. At first you decide that, for example, you like using, I don't know, yellow and green colors on your plots; then you define them in some settings of your library and use them later. It will be your own personal style guide, and people who make visualizations professionally often have their own style, so it's a great idea.

Next question: how would you present SHAP values for 121 features, where the top 80% is covered by 77 of the features? I would say that's a lot of features. First of all, I'm not sure you would really need all of them, because it's too many features. But if you really need to show all of them, I would do something like this: first, I would calculate SHAP values; then I would make two separate plots, one for, say, the top 10 or top 20 features, because they are the most important and have the most impact, and the second plot for the rest of the features. People who don't have much attention to spare will just look at the first plot, and people who have enough time and attention will look at the second plot. So I think it will work.

Next: which visualization tool is best: Power BI, Tableau, or Python libraries? I would say that it depends on the purpose. It would depend on how much time you have at hand, how much experience do
you have with programming languages, and what is used in your company. If you work in some company, usually they already have some software for making visualizations. For example, if in a company it's commonly accepted that everyone should use Tableau, or everyone should use Power BI, to present plots to each other, then you won't have much choice; you will have to use it. Next point: if you're comfortable with programming languages, then of course you can use R or Python libraries. The only question will be how you show the plots: you can't expect people to just launch a Jupyter notebook and look at it, so you will either have to share screenshots or publish your code in some way. So it really depends on the use case.

The next question: do you have a roadmap for becoming a data visualization expert? I don't really have a roadmap, but one of my slides had information about which skills you need to become an expert in data visualization. You could use it as a start and then research how to improve your skills in each of the areas I mentioned on that slide.

Next question: you showed us a lot of good examples, but how can you create them? Often I see that such charts are made in Inkscape or something like that, because Python and co. are too limited. What do you prefer? Yes, this is a good point, and I mentioned that the most beautiful visualizations are usually created by hand or in visual editors. But again, I showed several slides with cool data visualizations; I think these visualizations look pretty good, and you can make them in Plotly. You can see the link at the bottom of the screen. But if you want to make something perfect, you need to delve deeper. One of the options is using D3.js: you will have to learn JavaScript, but you will be able to produce amazing
data visualizations. If you want that level of polish, you will have to do it by hand or by using some visual editor. So if you have a lot of time and your job is about creating data visualizations, you will have to do it; but if visualization is only one of the parts of your job, it's much more efficient to use libraries, for example in Python, such as Plotly, and just find a good color scheme to use.

The next question is: are there any resources to develop storytelling skills? It's difficult to say, because storytelling is a combination of various skills. First of all, it's about creating a good presentation or a good visualization itself. Another one is having good presentation skills: capturing your audience, placing good accents, noticing when your audience becomes bored, and re-engaging them. It's also about being able to make a good story. So I would suppose that you need to understand what storytelling is about and which skills it requires (as I said: creating the presentation, oratory skills, understanding the audience), and then work on each of the skills step by step.

Next question, about classical feature engineering: very often there are similar steps; can you advise some automatic EDA framework for visualization? I know PyCaret has something, but maybe others? As far as I know, there is pandas-profiling. It's a cool library which will show a lot of visualizations for your variables. Most of them will be univariate, that is, visualizations of a single feature, but it's pretty good: it shows a lot of statistics and a lot of visualizations, so it's a great start and you could try it.

Next question: I am a fresher; which visualization tool do you prefer for me? Then I would say just use Python and, for example, Seaborn. It will be a good start. And another question is: sometimes I
find it difficult to select the type of graph. Can you elaborate or give more tips on when to use which type of chart, for example which one to use for time series, and which one when we group the data? I had a section in my presentation about different ways of visualizing the data: if you have time series, there are some specific types of visualization; if you have network data, there are other types. In the reference section you can find the link to this chart, and you will be able to see the whole chart. I wasn't able to make a full screenshot because the chart is huge, but you will be able to see which visualization you should use in which case, so you will find this information in the references.

Okay, the next question is: which is better to show correlation, scatter plots or heat maps? I would say both are good, but it depends on the nature of your data. For example, if you have two numerical, continuous variables, it's much easier to use a scatter plot between them. A heat map, I would say, is better when you have categories: for example, you have two categorical variables and you want to show the values of some variable across the different combinations of these two features. Then it would be easier to use a heat map. So that was two questions.

Next one: what to do if I can't find the influential factors in the data in order to predict the right decision, and what is the best tool to find the factors? Okay, I suppose it's more about machine learning: basically, we need to know the most influential features in the data to build the model. I would say that, to start, you can simply train some model. For example, if you have tabular data, you could just take a gradient boosting model or a random forest. You train it, then you use
for example, SHAP or some other library to see feature importance, and you can consider these features to be the most influential. Then you could create some new features and see whether they are even better than the original ones.

Another question: this is a very beautiful presentation; may we have it, or the code used in the presentation? Yes, of course. I will share the link with the hosts, and they will share it with everyone by email, so you will have the link to the whole presentation and all its content. I see some people raised hands, so how do we manage it?

Most of the raised hands... I mean, the questions they had have been put into the Q&A section, so that's pretty much it.

So, okay, if anyone has any questions, you can again post them in the Q&A and I will answer them.

Yes, guys, we have a couple more minutes as well, so if you have questions, please post them in the Q&A section. There were actually a couple of people who raised their hands, and I had asked them to put their questions in the Q&A section. And for all others, just wanted to let you know we have put up a feedback poll; please do fill it in. Okay, so you have a question here: please post the code and the presentation. It will be sent. Okay, see, guys, we've repeatedly told you all that you'll get these files via email, so don't worry about it. So I guess that's the end of it. Thank you, thank you, Mr. Lukyanenko, thanks for the session. On behalf of Analytics Vidhya, I'd like to thank you for your time and for delivering such wonderful knowledge, and also for being patient with all these questions; I must say we had a few more questions than we usually do. I hope you have a wonderful day in Moscow. I'd also like to thank our audience for being with us until the end. So once again, thank you, sir.

Thank you also. It was very interesting, and it was nice seeing so many questions and being able to answer
them. Have a great day.

Okay, everyone, thank you for being here. Hope to see you all again, and again, have a wonderful day ahead. Thank you.
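In the Q&A answer about color combinations, the speaker suggests defining your colors once in your library's settings so they work as a personal style guide. A minimal sketch of that idea for Matplotlib, assuming the palette is applied through `rcParams`; the hex colors in `MY_COLORS` and the `apply_my_style` helper are hypothetical placeholders, not a recommended palette.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Hypothetical personal palette: choose your colors once, then reuse them everywhere.
MY_COLORS = ["#E3B505", "#4F6D7A", "#56A3A6", "#DB504A"]


def apply_my_style() -> None:
    """Set the palette and a few defaults for all Matplotlib axes created afterwards."""
    plt.rcParams.update({
        "axes.prop_cycle": cycler(color=MY_COLORS),  # successive lines/bars use MY_COLORS
        "axes.spines.top": False,                     # drop the top and right frame lines
        "axes.spines.right": False,
        "font.size": 11,
    })
```

Call `apply_my_style()` once at the top of a notebook or script; every later `ax.plot` call then cycles through the same colors. Matplotlib can also store such settings in a `.mplstyle` file that is loaded with `plt.style.use(...)`, which makes the style guide easy to share across projects.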
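For the question about SHAP values for 121 features, the speaker proposes one plot for the top 10 or 20 features and a second plot for the rest. A minimal sketch of that split with NumPy and Matplotlib, assuming you already have a SHAP value matrix of shape (n_samples, n_features), for example computed with the `shap` library, and ranking features by mean absolute SHAP value (a common convention, assumed here); `split_by_importance` and `plot_two_panels` are hypothetical helpers.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts and tests
import matplotlib.pyplot as plt
import numpy as np


def split_by_importance(shap_values: np.ndarray, top_n: int = 20):
    """Rank features by mean |SHAP| and split their indices into (top, rest)."""
    importance = np.abs(shap_values).mean(axis=0)  # one score per feature
    order = np.argsort(importance)[::-1]           # most important first
    return order[:top_n], order[top_n:], importance


def plot_two_panels(shap_values: np.ndarray, feature_names: list, top_n: int = 20):
    """Draw two horizontal bar charts: the top_n features, then all remaining ones."""
    top, rest, importance = split_by_importance(shap_values, top_n)
    panels = []
    for idx, title in [(top, f"Top {len(top)} features (mean |SHAP|)"),
                       (rest, f"Remaining {len(rest)} features (mean |SHAP|)")]:
        fig, ax = plt.subplots(figsize=(6, 0.25 * len(idx) + 1))
        # Reverse so the most important feature ends up at the top of the chart.
        ax.barh([feature_names[i] for i in idx][::-1], importance[idx][::-1])
        ax.set_title(title)
        panels.append((fig, ax))
    return panels
```

The first chart is the quick overview for readers with little time; the second keeps every feature available for those who want the detail. The `shap` library's own bar and beeswarm plots also accept a `max_display` argument if you only need the first chart.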
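The answer on scatter plots versus heat maps can be made concrete: a scatter plot for two continuous variables, and a heat map of some value across the combinations of two categorical variables. A minimal sketch with pandas and Matplotlib, assuming a tidy DataFrame and aggregating by the mean; the helper names and the column names in the usage below are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts and tests
import matplotlib.pyplot as plt
import pandas as pd


def scatter_continuous(df: pd.DataFrame, x: str, y: str):
    """Two continuous variables: plot every observation as a point."""
    fig, ax = plt.subplots()
    ax.scatter(df[x], df[y], s=12, alpha=0.6)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    r = df[x].corr(df[y])  # Pearson correlation, shown only as an annotation
    ax.set_title(f"{x} vs {y} (r = {r:.2f})")
    return fig, ax


def heatmap_categorical(df: pd.DataFrame, row: str, col: str, value: str):
    """Two categorical variables: color each (row, col) cell by the mean of `value`."""
    table = df.pivot_table(index=row, columns=col, values=value, aggfunc="mean")
    fig, ax = plt.subplots()
    im = ax.imshow(table.to_numpy(), cmap="viridis", aspect="auto")
    ax.set_xticks(range(len(table.columns)))
    ax.set_xticklabels([str(c) for c in table.columns])
    ax.set_yticks(range(len(table.index)))
    ax.set_yticklabels([str(i) for i in table.index])
    ax.set_xlabel(col)
    ax.set_ylabel(row)
    fig.colorbar(im, ax=ax, label=f"mean {value}")
    return fig, ax, table
```

For example, `heatmap_categorical(df, "city", "plan", "spend")` would show the average spend for every city and plan pair, while `scatter_continuous(df, "age", "income")` would show the raw relationship between two numeric columns.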
Original Description
Visualization is the best method to grasp complex and hidden results in data. Analyzing visualizations is better than just calculating data statistics, and various plots and techniques can be used to do so.
In this DataHour, Andrey will share the history of data visualization. After that, he will explain different plot types and the software used to create them. He will also demonstrate examples of visualizations used in practice for machine learning tasks.
Do subscribe to Analytics Vidhya channel & get regular updates on videos:
Stay on top of your industry by interacting with us on our social channels:
Follow us on Instagram: https://www.instagram.com/analytics_vidhya/
Like us on Facebook: https://www.facebook.com/AnalyticsVidhya/
Follow us on Twitter: https://twitter.com/AnalyticsVidhya
Follow us on LinkedIn: https://www.linkedin.com/company/analytics-vidhya