Customizing a Graph Solution

Data Skeptic · Intermediate · 📊 Data Analytics & Business Intelligence · 1y ago

Key Takeaways

The video discusses customizing graph solutions for specific problems, with a focus on graph databases and their applications in various industries, including financial services and security. It highlights the importance of understanding the problem to be solved and choosing the right graph database, as well as optimizing data and queries for better performance.
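
The takeaways above mention optimizing queries. Two ideas from the transcript below make that concrete. First, asking "how are two people connected" is a path query. Second, the cost of a traversal grows with the branching factor at every hop. Here is a minimal plain-Python sketch of both. It assumes an in-memory adjacency dict and does not reflect how Amazon Neptune or any other graph database implements traversals. The helper names (`shortest_path`, `hop_sizes`, `tree`) and the sample data are hypothetical.

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Breadth-first search from start to goal in an unweighted graph.

    adj maps each node to a list of its neighbors. Returns (path, touched):
    path is the node list from start to goal, or None if goal is unreachable;
    touched counts the nodes expanded, a rough proxy for how much data the
    traversal had to read.
    """
    if start == goal:
        return [start], 0
    parent = {start: None}
    queue = deque([start])
    touched = 0
    while queue:
        node = queue.popleft()
        touched += 1
        for nbr in adj.get(node, ()):
            if nbr in parent:
                continue
            parent[nbr] = node
            if nbr == goal:
                # Follow parent pointers back to start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1], touched
            queue.append(nbr)
    return None, touched


def hop_sizes(adj, start, max_hops):
    """Count the newly reached nodes (the frontier) at each hop from start."""
    seen = {start}
    frontier = [start]
    sizes = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        sizes.append(len(next_frontier))
        frontier = next_frontier
    return sizes


def tree(branching, depth):
    """Hypothetical synthetic graph: every node has `branching` children."""
    adj = {0: []}
    level = [0]
    next_id = 1
    for _ in range(depth):
        new_level = []
        for node in level:
            for _ in range(branching):
                # Store the edge in both directions (undirected graph).
                adj[node].append(next_id)
                adj[next_id] = [node]
                new_level.append(next_id)
                next_id += 1
        level = new_level
    return adj
```

With a branching factor of 10, `hop_sizes(tree(10, 3), 0, 3)` returns `[10, 100, 1000]`, the 1 to 10 to 100 to 1,000 growth described in the transcript. A supernode acts like a single hop with an enormous branching factor. That is why the same query can be cheap from one starting node and very expensive from another.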

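The fraud workflow Dave describes in the transcript below has three steps. You build a graph from transactional attributes such as cards, emails, and addresses. You run community detection. Then you look at the distribution of community sizes. That workflow can be sketched in plain Python. As a simplifying assumption, this sketch swaps in connected components for a real community detection algorithm such as Louvain or label propagation. The transaction format, the factor-of-10 threshold, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def build_graph(transactions):
    """Link each account to the attribute values it used (card, email, ...).

    transactions: iterable of (account_id, {attribute_name: value}) pairs.
    Returns an undirected adjacency dict over account and attribute nodes.
    """
    adj = defaultdict(set)
    for account, attrs in transactions:
        acct = ("account", account)
        adj[acct]  # touching the key registers accounts with no attributes
        for kind, value in attrs.items():
            attr = (kind, value)
            adj[acct].add(attr)
            adj[attr].add(acct)
    return adj


def components(adj):
    """Connected components via iterative DFS (stand-in for community detection)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return groups


def flag_outliers(groups, factor=10):
    """Return account groups far larger than the typical (median) group size."""
    account_groups = [[n[1] for n in g if n[0] == "account"] for g in groups]
    account_groups = [g for g in account_groups if g]
    typical = median(len(g) for g in account_groups)
    return [sorted(g) for g in account_groups if len(g) > factor * typical]
```

Consider a toy input of twenty two-account pairs plus one ring of 30 accounts that share an email address. The median group has 2 accounts, and only the ring is flagged. In practice you would replace `components` with a proper community detection algorithm. Flagged groups are starting points for an investigation, not verdicts.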
Full Transcript

[Music] You're listening to Data Skeptic: Graphs and Networks, the podcast exploring how the graph data structure has an impact in science, industry, and elsewhere.

Dave touched on a core issue about working with networks and graphs that has implications for the use of graph databases, and that is data exploration. I guess most of us intuitively use a scenario-driven or analytical approach towards our business: we know what we need, and now we're just trying to figure out how to get it. In this case a graph DB is just another tool in the long tail of tools. It can help, for sure, in some cases by making the runtime faster, but I think the real power behind networks and graphs is in exploration.

Sure. Graphs are a great tool for exploration, for finding out what we don't know we don't know. For example, Dave talked about community detection as a way to find interesting clusters that might be involved in fraud. Instead of trying to picture what a network of fraudsters looks like, we can use network algorithms like community detection to help us find the anomalies in the data and point us in an interesting direction. If you find something interesting through exploration, you can later put it into production, with or without a graph DB, but I think the main power of networks lies in exploration.

In that context I feel like the graph database is critical. If I want to do work on just my machine in a typical case, let's say user data on a website, I can take a random sample of users and that population is representative. So I can take a slice of the data, even a small slice, and do some statistics on it, no problem. But it's not clear how I can randomly sample a graph. I guess I would randomly sample the edges, but then it'll look like a very sparse graph, and a sample doesn't tell much of the story of the graph. I think I really need to query the full data set.

You can let the algorithms, the network algorithms, find the interesting parts in the
network for you. That means doing community detection, then asking which is the most central community in my data set and why it is the central one. Then you dive into the communities, look for the central nodes in each community, and ask yourself why those are the central nodes. These are the exploratory questions.

Right, so your notion of exploration begins with the standard set of algorithms. My instinct was to write ad hoc queries against the network, but you would first look for communities, centrality, and these sorts of things. Is there a standard list, or do you have to bring the domain into consideration when you pick which algorithms to use?

What I mentioned, community detection and then centrality measures, is what I think is the best practice. It might not be relevant to every data set, but I think it's a good practice to start with, especially with large or huge networks.

Right.

In small networks maybe we can skip community detection, but usually I'll start with that. Then you can dive into the places where the network algorithms tell you: here's something interesting, take a look.

Well, here's something interesting. Take a listen.

My name is Dave Bechberger. I am currently a graph architect at Amazon, on the Amazon Neptune team at AWS. In the past I have worked at a variety of different companies working with graphs, both as a consultant and as an implementer, and now at a second company that provides it to end users. So I'm happy to talk with you today about my own opinions on how all these different things work.

You are also the author of Graph Databases in Action. To kick off, can you give us the high level on the book?

Yeah, so Graph Databases in Action is a Manning publication that I worked on. It came out a couple of years ago at this point. When I started the process of writing it, I really wrote the book for myself of five years earlier, when I was learning graphs: what was it I wish that I had known
when I started working with graphs, and how could I use that to make the process easier for others?

Could you share some details on the role of a graph architect? What is your day-to-day like?

I work with customers, with our engineering teams, with customers' internal and external engineering teams, and with product management to help people adopt graphs and be successful using them. Part of that is working directly with customers on their use cases to make sure they're actually able to be successful. The other part is taking the learnings from those use cases back to our product and making sure they get integrated in a way that makes the product easier to use.

For people who are at the starting line, what is step one, or question one, for exploring graph databases?

I think the first question is making sure you understand what problems you have and whether those problems are good fits for graphs, and then for graph databases, because those are two separate things. Especially as developers, we use graphs all the time even though we may not realize it; a linked list is a graph. So it's about understanding whether the problem I'm looking at is best solved using this type of data and this type of data structure, and then what the requirements around it are. That leads you to the question of whether a graph database is the right solution or something else is.

Do you find that people in general have good instincts, or are graphs something you have to get the hang of?

The very beginning of getting started with graphs is something that we as developers are already doing, because we're using these sorts of data structures. I think the real challenge comes when you want to move beyond little toy graphs into something that actually solves more business use
cases. Those are problems where you start to scale, not only in data size but also in queries: the number of queries, the complexity of queries, the types of queries. That's usually where I see most people start to trip up.

Most organizations run a relational database. You can correct me if I'm wrong; I would be very interested to talk to a company that only ran a graph database and not a relational one. What is the motivation to make that leap? Why can't I just cram everything into the traditional relational database that everybody knows?

You can, and people do all the time. I always look at graphs as similar to any other NoSQL technology: they're all purpose-built to answer a single question or a set of questions. You use key-value stores for very specific reasons; there's a reason Cassandra or DynamoDB are out there, and it's to solve very specific problems. Graphs are useful for solving very specific types of problems too. It really comes down to problems where answering the question requires you to look through a lot of different connections. The one I always use, one of the canonical use cases, is a social network. With a relational database it's very easy to find who my friends are, or who my friends of friends are. But now try to extend that and ask how Kyle and I are connected on LinkedIn. That's a very difficult thing to answer in a relational database, because the query languages and the data structures just aren't built to handle those things efficiently. In a relational database you probably have to write some sort of recursive CTE query, which I don't want to think about. Thinking back to when I
had to write those sorts of queries: they're difficult to write, they're difficult to maintain, and they're really hard to make perform. Those are the sorts of queries that relational databases just aren't built for, and graph databases are. So I see graph databases as a great way to augment the other data stores you have in your environment, and there are lots of reasons to move to graph.

The example you gave is like asking what nth-order friendship connection we have, which, yeah, I don't even know how I'd approach in SQL. I've done it before: it's about this much SQL in very small font, and it's like two lines in a graph.

Sure.

Are there other common reasons or entry points? I know PageRank is a famous graph algorithm that some databases either have natively or can at least support your use of. Are there algorithmic cases that draw people to graph databases?

Yeah, I would even extend it a little beyond algorithms, because there are also analytical use cases. There are certain types of insight you want to derive from your data that graphs are uniquely able to surface. PageRank is a great example of what's called a centrality algorithm: determining how important something is inside a network. When we think about data in the real world, in most cases a network is actually the way you want to think about it. Even a bill of materials is a network: it's things connected to other things, and those things have attributes associated with them. Looking at that as a graph helps you derive insight into questions like: what are the biggest risks in my supply chain? If you have all of these different bills of materials, you can start to figure out which of the
suppliers is... well, not necessarily the most connected. If you wanted to figure out which of these parts is the most connected, that's pretty easy to do even in a relational database; it's a simple aggregation. But figuring out which of these is the hub, the central piece that would impact the most things if I didn't have it, is where you can start to use a graph. That can be done through algorithms such as PageRank. Community detection algorithms are also very common, for example in fraud use cases, because you want to find groups of people that are clustering together. So there are unique things intrinsic to the graph data structure that let you get insights in ways other data structures don't. I'm saying data structures there specifically, because you don't necessarily need a graph database to answer this. There are lots of libraries out there, like NetworkX, that you can use for this sort of thing. Doing it at scale, within a certain amount of time, is usually where it starts to be more challenging.

NetworkX is a great tool; I've used it for a lot of projects. But as you say, it's not going to scale past my one machine, at least not in a clean way. Is there a rule of thumb for when an organization is going to outgrow a tool like that?

Usually I see organizations wanting to switch to a graph database for one of two reasons. Well, I should say that when I look at any sort of data project, there are two phases. There's the exploratory phase of figuring out what exists in there, and then there's productionizing: now that I know what's there, how do I actually move this into my actual
data pipelines? You have to start by figuring out what's there. You may start with NetworkX, but you're only going to be looking at a small subset of your data. That's great for getting started; you can get a feel for what your data looks like. Then you want to scale that up to make sure the patterns you're seeing aren't just in that small subset, since you only took a sample. That scale-up is usually when I see people start looking at different technologies to optimize the exploratory side. Then, when you want to take this to production with data at any reasonable scale, you're probably going to want something a little more optimized for answering these questions than a NetworkX type of library. NetworkX is a great library, but it has its limitations: it's written in Python, it works in memory, and you get all of the things that come with running things in Python.

On the relational side, I've on many occasions, especially at bigger orgs, worked with a database administrator, someone whose sole job was to add the indices, debate the schema, do performance tuning, and generally maintain the system. Is there an analog in graph databases?

At some organizations I have seen a graph database administration role. Usually it's a relational database administrator who also does this sort of thing. But I think it's also similar to most NoSQL technologies I've had experience with, where it's a blurred line. It's a bit of a non-traditional role, and a lot of times the development and operations teams end up doing much of the work around the management of
that sort of system. That's actually one of the challenges with getting adoption: there aren't people with 30 years of experience working with graph databases in production, whereas plenty of people have that much experience with relational databases.

In the relational world it's not a true dichotomy, but I sometimes think of databases meant for real-time transactional work versus those that are like a data warehouse for analytical work. Do you see the same distinction in graphs?

I do see the same distinction in graphs. There are quite a few databases meant for either OLTP-type transactional workloads or OLAP-type workloads. But I would draw a further distinction. There are databases meant for those things, and then there are in-memory libraries and graph processing frameworks, like GraphFrames if you're working in Spark. That's really solidly aimed at the OLAP market: it runs on Spark, and you're not running real-time workloads on Spark. You're running inside that framework, and it's really a graph framework, not a database in the traditional sense.

There are, I don't know if a ton of choices, but certainly several choices one can make; there are a lot of vendors out there. What does the typical decision tree look like when people decide what to choose?

From a database perspective, I think it comes down to a couple of different things. Ease of use, or developer productivity, is a very common one. Cost is another very common one. Then there's management: is this something I have to manage myself, is this a managed service, or does my preferred vendor even
offer a managed service? There are those sorts of things too. As I said, I work at Amazon, and one of the big selling points for a lot of our customers is simply that it's a managed service, so many of the operational headaches I talked about a minute ago don't necessarily exist. Many of the other vendors out there also support similar things, though all of them are a little bit different. Those are usually where we see people start: ease of use, operational headaches, and cost. I would say it's probably not much different from the decision tree for most relational databases. I guess the other consideration is what other people in your company actually use and what you have experience with internally.

Do you have a sense of the industries that most often come to graph databases, or perhaps the departments, like finance versus something else? Who is really the leader in bringing something like a graph database into an org?

I think who brings a graph database into an org probably depends a little bit on the industry you're in. A lot of times it's the R&D-type departments in certain industries, as well as the data engineering department. One of the early adopters of graphs, graph technology, and graph databases is the financial services industry, where they're looking for fraud; fraud is a very common graph use case. But that usually involves your R&D departments working out what fraud actually means in your specific environment. I've worked with a lot of different fraud customers, and there are general patterns of what fraud looks like, but what fraud actually looks like is
very different for every single end user, every single department, or even different departments within the same organization. So you have a lot of R&D and data science people using it there. Then there are other use cases, security graphs being a very common one, where you look at something like your cloud topology as a graph. A lot of times those come in from the productionizing side: I need to take the things someone has proven on their laptop and scale them up to the size of data we're actually dealing with.

So in the fraud use case, clearly if someone's a known fraudster, their associates and connections are probably suspect. That's a rather obvious use case. Is there more to it? How does a graph help someone detect fraud?

In a lot of ways, fraud is an anomaly detection problem. You look at the common pattern of usage within your application, find the things that are outliers, and use those as the starting point for an investigation, because it usually still requires some level of... well, graphs aren't magic. They're not going to just tell you the answer; what they're going to do is point you towards the most likely places to start. A really common pattern I see starts with transactional data. It contains things like credit cards, payment amounts, addresses, and emails, the sorts of attributes you would expect most e-commerce sites to have. I might take that data, build my graph out of how these different things are connected, then run something like a community detection algorithm on it and look at the distribution of those
communities. Do I have a bunch of small disconnected communities, or a few large ones? From there I start to look at what is normal, because that's probably a good place to start my investigations. Say all of the communities, the groups of people that are tightly connected in my graph, have an average of maybe five nodes, and this one has 300 nodes. That might be something of concern, and I might want to start looking there to see what's going on, because it might be collusion. Or it could be the flip: you might expect a lot of people to be interacting, and you find things off over here doing whatever they're doing, hopefully without being noticed.

When I think of use cases for relational databases, it's sort of low-level in my mind. If I'm going to do machine learning, I might do some aggregations in the database, but ultimately I'm pulling out my training data and doing my ML outside the database, even though I know some databases have some interesting built-in stuff. I think that's a common pattern. Do people using graph databases take the same approach, or are things like community detection baked in as a service?

I would say the answer there is both. There are some aspects of graphs, specifically ML with graph neural networks, that can add something unique to a machine learning algorithm. But I also see people using graphs and graph databases as, for lack of a better term, a graph feature store. You might take some of those graph features, a community or an influence value, and use them as inputs to the feature vector you give a traditional XGBoost machine learning model, something like that. So I think it's a bit of both. I mean, a lot of the
patterns I've seen in relational databases are very similar to the usage patterns I've seen in graph databases. Or maybe I should flip that statement: a lot of the graph database usage patterns are similar to the relational database usage patterns, just with a different technology powering them.

In your book you feature the Glutton app as a good walkthrough and example. Could you share some details for listeners?

Yeah, so the Glutton app was a fictional application that my co-author and I came up with to demonstrate some of the use cases and patterns that are very commonly used with graph databases. It has a social networking aspect, being able to say how two people are related, so you build a friend network for restaurant recommendations and reviews. Then there's personalized recommendation functionality around it. It can tell you which restaurants are the most highly rated in the area you're looking at, and also which restaurants your friends have rated most highly, giving a more personalized response back to the end user.

Could you share some details on how that could work as a template for someone wanting to follow a similar path?

The idea of the book, and of how we worked through the different pieces, was this: here are patterns we've seen used very commonly in the applications we've worked on in the past. Let's simplify them into an end-to-end example that is relatable. I would guess at this point most people have used restaurant recommendation apps. So the idea was to take those common patterns and apply them
to this specific thing. At the same time, we talked about them as common patterns and broke down the parts and pieces that are required: taking data, modeling those pieces, writing queries around them, and then looking at how you would optimize it. It's really a general pattern applied to a specific problem.

I find that once I've built something, I then know how I should have built it. Having been there and done that, do you have any good advice for people starting out, maybe some pitfalls to avoid?

Oh, absolutely. There are a lot of pitfalls I've run across in my years of working with graphs, not just myself but also working with customers. The first key one I think about is: really understand what problem you're trying to solve. Don't say, I'm going to throw this into a graph and I'm just going to get information out. A graph database is like every other purpose-built NoSQL database: it's an optimized view on top of a set of data. So make sure you understand the problem, and optimize the graph you build for the type of problem you're actually trying to solve. If you're trying to solve a social networking problem, don't throw all of the data in there; put in only the data you need to answer the question. A common pitfall I see is people with clickstream or IoT-type data who want to throw it into a graph, where it's just not adding value for the types of questions they want to answer. So make sure the data you put into the graph is the data that answers the questions you want to get back out.

I guess this is along similar lines, then: how do I know if I even have a graph problem?

There's a variety of ways
you can think about this, but the way I like to think about it is: is this a problem that requires you to look at how things are connected as much as, or more than, what the things themselves are? Jumping back to the social network we talked about earlier, it doesn't really matter who the people are that connect the two of us. What matters is that they're connected, and that you can draw a line between A and B, which in graph terms is called a path between A and B. To get that answer, you don't necessarily care what all the individual pieces in the middle are. The other thing is that people often come to us with graph problems after they've tried other technologies and found it too difficult. You start writing this sort of thing in a relational database, and all of a sudden you're five pages into a SQL query and still can't get the answer you want out of it. So it's about having the moment where you take a step back and say: okay, to your point earlier, now that I've built it wrong, what do I need to do to build it correctly? Take that step back and think about the type of problem: does it deal with connected data and connections between things?

So let's take the case of the famous all-pairs shortest path. Most computer scientists should know it, and anyone who took a network science class should also recall that there are multiple algorithms that work better or worse in different situations. Does my graph database solve that for me and pick the right one, or do I have some intellectual overhead in figuring out how to get an answer for all-pairs shortest path?

The answer there is that it depends a little bit on what graph database
you use. Some of them implement multiple different algorithms, and some implement only one. But when it really comes down to it, a lot of times the implementation at the graph database level is already significantly more optimized than anything you're ever likely to write yourself. So in some ways you do have to trust the experts on some of these things. It's useful to understand it, but it's sort of like bubble sort: you should never write a bubble sort. Well, nobody uses bubble sort, but you get what I mean: you shouldn't write this yourself. There are libraries out there where people have built these things so you can actually use them. Every once in a while there's a use case where you need to understand which of the algorithms to choose. But I've found that for most use cases, for most people, the implementation any library or database offers is probably going to be sufficient.

So let's say I invented my own social network and I wanted to add that nth-degree friends feature. I guess there are two ways I could do it. I could compute all pairs in some batch process and store person A, person B, and the distance. Or I could do some sort of real-time lookup as the page gets hit. What are the pros and cons of those approaches?

That's a problem we see all the time, and it really comes down to a few things. One, how often is your data changing? If your data isn't changing very often, then calculating all of these things ahead of time and storing them in a table gives you a lot of advantages, because it gives you a constant-time lookup. Whether you store this data in a Cassandra table, a DynamoDB table, or a relational database table, it's one index lookup to find this sort of information. The
downside of that is, one, you end up storing a lot of duplicated information. That's a general NoSQL thing anyway, but your name is going to be associated with every single other person in the graph, so this can lead to a combinatorial explosion. The other part is that if your data is updating and changing, it gets very difficult to do what's called truth maintenance, keeping track of what actually is the shortest path, because you can't rerun this algorithm every single time. On the other hand, if your data updates frequently, you may want to calculate this at read time. Every time a request comes in, the query might be a little slower and you might not get quite as much throughput, but the data is going to be very fresh. So it's really a read-versus-write optimization, and which one is right depends on the use case you actually need to support.

In terms of scalability, there are obviously managed services if I want to make that somebody else's problem. But even then, by the nature of graphs, some of the problems are NP-complete, and those aren't going to scale very well. What does a typical scalability picture look like for a growing organization?

I think there are a few different parts there. There's data scalability, just being able to store more and more data: a million nodes, 10 million nodes, and 100 million nodes all bring their own challenges. But the one I probably run into more often than data scalability is query scalability. When you write into a graph, you're writing into this network, and when you query a graph, which is called traversing the graph, moving
from point A to point B in that graph, you have to look at all the different pieces that are connected in order to find the answer. So if your branching factor, the number of connections you go out at each step, is about 10, then every single hop multiplies the work by 10: it goes from 1 to 10 to 100 to 1,000. That starts to touch a lot of data very quickly, and that's usually where I see the most scalability challenges: how much of the data you're touching in order to answer the question. The latency of any graph query really comes down to how much data you have to touch to answer it. There are a few other factors, but as a general rule, if I have to hit 10 times as much data, it's going to take 10 times as long. That's usually where the problem comes in, and it leads to something in a graph known as a supernode, which is a node that has a disproportionate number of edges, or connections, compared to the others. The example I always like to use is Twitter. If you looked at my handle on Twitter, I don't know how many people follow me, but it's going to be a relatively small number compared to someone like Taylor Swift, who is going to have many orders of magnitude more. So if I'm running the same query that says "find me all of my friends," I'm going to go out a few hundred edges to answer that question; she's going to go out a few hundred million. So even though it's the exact same query, the starting location is going to impact the performance of that query quite drastically. The other part we see with that is the number of times I've had people say,
"But I'm only returning one value, why is it so slow?" Well, because you had to touch hundreds of millions of things in order to return one value. Those are the sorts of scalability challenges you get, and a common place that shows up is when you want to return ordered results. And as people, whether they're analysts or engineers, get those queries written, I know there's a variety of languages, but there's also a movement toward GQL as a standard. Do you have a perspective on the state of graph query languages? From a standards perspective, there are two basic data models. There's RDF, which has a W3C standard query language called SPARQL, and then there's the property graph data model, which is probably the one I see more people using; that's the one we used in our book. There are a couple of different open-specification languages out there: there's TinkerPop's Gremlin, there's openCypher, and there are proprietary ones like Neo4j's Cypher. And then there is this movement toward GQL. It was only ratified, I want to say, last year, so it's relatively new, and it's going to be very interesting to see how different vendors move toward implementing GQL inside their databases. I'd say, at least on the property graph side, all of them have their pros and cons. Gremlin, which is the language we use in our book, is very powerful, but that power comes with complexity, especially for new users. I often think of it as the stored-procedure language for graphs, because you're really controlling how you move from A to B to C at a very low level. Whereas something like Cypher, openCypher, or GQL is more of a SQL-style declarative language, where you say, give me something that looks like this pattern, and
the engine itself handles all the optimization. That's easier to write, but harder to tune when the engine isn't optimizing well, because you don't get to control it. As things start to move to GQL, I hope they give a little more control back to users in some scenarios, because, as I mentioned, some of the scalability challenges show up in the query languages themselves too. And broadly speaking, where are we on the adoption curve? Of all the organizations that should be using a graph database, how many are? I would say we're still early on the adoption curve. I think there are two parts to that: the number of organizations that are using a graph database versus the number of organizations that have data they could get insights from using graphs. And this is not just on the graph database side, but also for graph frameworks, graph engines, and the types of unique insight you can get from a graph. I would say most organizations are relatively early in being able to do that well. The book is still pretty new, but it didn't come out this morning. What's changed in the time since you published it, if anything? I think some of the pieces you've hit on have actually changed quite a lot. GQL has now come out as an actual standard. It's an ISO standard, and we're still waiting on vendors to implement it, which is going to take a little while, but that's very new. I think the query languages themselves have also really evolved since the book was published. With Gremlin specifically, there's been a lot of work on adding new features to make it more powerful for things like manipulating strings and dates, things that weren't in it when the book was written but are
there now. I also think that in a lot of ways the use cases around graphs have changed somewhat. Generative AI use cases, like graph RAG, didn't exist when I wrote the book four years ago, or rather when I started writing it four years ago. Under the covers the problems are still the same; they're just different views on those same problems. Could you give a real quick lay definition of graph RAG for listeners who aren't familiar with it yet? I would say there are two definitions. One is using a graph (I don't like using the term "knowledge graph") in conjunction with a RAG, or retrieval-augmented generation, application to provide more complete and explainable answers. That's the definition that people more familiar with RAG applications may think of. But I also see graph RAG used as a catch-all term for using graphs with LLMs and generative AI as a whole. I've heard people say they want to do query generation in their graph RAG application, but that's not really RAG; there's no retrieval-augmented part to it. It's just become that catch-all term for graphs plus generative AI. And if you were to really put your evangelist hat on, what's the best argument for an ambitious engineer trying to convince their organization it's time to do some R&D on graphs? I think the best argument I have is to show the data; it's always about showing the data. Find a use case that actually benefits from using a graph, and show that you can get significant performance improvements with it. Most enterprises have a lot of data sitting around, and being able to use graphs to help unlock insight from some of that data that's sitting in
S3 buckets or Snowflake or data warehouses, and leveraging it in those sorts of scenarios, is an area where I see a lot of potential. I was working at a company once where we were working with family trees, and we were able to show that not only could we scale better, but we could take the latency of a query down from about 5 seconds to about 500 milliseconds by using a graph database. When you start to show the data of a 10x improvement, people will start to listen and start to see, okay, maybe there is something to this. And what's next for you? I'm very interested in where graphs are going and in unlocking the different types of use cases you're able to do with them. I find it very interesting to work with customers on the widely varying use cases that come out of this, so I'm very interested in continuing that work and seeing where we can take this technology to improve developer productivity. And is there anywhere listeners can follow you online? Probably the best place to follow me online is my LinkedIn profile; that's where I'm posting most of the stuff I'm working on. Very cool. We'll have links to all the above in the show notes, along with the book and your homepage. Dave, thanks so much for taking the time to come on the show and share your expertise. Thank you for having me. [Music] [Laughter] [Music]
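As a rough illustration of the first definition of graph RAG given in the interview (a graph used alongside retrieval-augmented generation to give more complete and explainable answers), here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy fraud-style triples, the naive substring entity matching, and the helper names `retrieve_graph_context` and `build_prompt` are hypothetical, and no LLM call is shown. It expands a few hops out from the entities named in a question and turns each collected edge into a numbered, citable line of context.

```python
from typing import List, Set, Tuple

# (subject, relation, object); hypothetical toy facts, not real data
Triple = Tuple[str, str, str]


def retrieve_graph_context(triples: List[Triple], question: str, hops: int) -> List[Triple]:
    """Collect the edges within `hops` steps of any entity named in the question."""
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
    # Naive entity linking (an assumption): an entity matches if its name appears verbatim.
    frontier = {e for e in entities if e.lower() in question.lower()}
    seen: Set[str] = set(frontier)
    facts: List[Triple] = []
    collected: Set[Triple] = set()
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for triple in triples:
            subject, _, obj = triple
            if (subject in frontier or obj in frontier) and triple not in collected:
                collected.add(triple)
                facts.append(triple)
                for entity in (subject, obj):
                    if entity not in seen:
                        seen.add(entity)
                        next_frontier.add(entity)
        frontier = next_frontier
    return facts


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Format retrieved edges as numbered, citable context (the LLM call itself is not shown)."""
    lines = [f"[{i}] {s} {r} {o}" for i, (s, r, o) in enumerate(facts, start=1)]
    return (
        "Answer using only these facts and cite them by number:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


triples: List[Triple] = [
    ("account_17", "uses_device", "device_9"),
    ("account_42", "uses_device", "device_9"),
    ("account_42", "flagged_for", "chargeback_fraud"),
    ("account_88", "uses_device", "device_3"),
]
question = "Why is account_17 considered risky?"
facts = retrieve_graph_context(triples, question, hops=3)
print(build_prompt(question, facts))
```

Because each context line is an explicit edge, an answer can cite the chain from account_17 through device_9 and account_42 to the fraud flag, while the unrelated account_88 never enters the prompt. That traceable path is the "explainable" part of the definition; a real system would replace the substring matching and in-memory list with proper entity resolution and a graph database query.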

Original Description

In this episode, Dave Bechberger, Principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications. Together we delve into specific scenarios in which graph databases provide unique solutions, such as fraud detection, and learn how to optimize a graph database for questions about connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?" The discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that require scalable analysis of highly interconnected data, and provides practical insights into using graph databases to improve performance on tasks that traditional relational databases struggle with.
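The question "How are these entities related?" has the same shape as the nth-degree-friends example in the interview, where Dave contrasts precomputing all pairs in a batch job with a real-time lookup. The following Python sketch illustrates that trade-off under stated assumptions; it is not a production design. The four-person graph and the function names are hypothetical, and a plain dictionary stands in for the Cassandra, DynamoDB, or relational table he mentions. The precomputed table answers with one lookup but grows with the number of pairs and goes stale when edges change, while the read-time breadth-first search is always fresh but pays for a traversal on every request.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # hypothetical undirected "friend" adjacency lists


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distance from `source` to every reachable person, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def precompute_all_pairs(graph: Graph) -> Dict[Tuple[str, str], int]:
    """Write-time option: a batch job storing one (person A, person B) -> distance row.

    The row count grows roughly with the square of the number of people.
    """
    table: Dict[Tuple[str, str], int] = {}
    for person in graph:
        for other, d in bfs_distances(graph, person).items():
            if other != person:
                table[(person, other)] = d
    return table


def degree_on_read(graph: Graph, a: str, b: str) -> Optional[int]:
    """Read-time option: always fresh, but every request pays for a traversal."""
    return bfs_distances(graph, a).get(b)


friends: Graph = {"ana": {"ben"}, "ben": {"ana", "cy"}, "cy": {"ben", "dee"}, "dee": {"cy"}}
table = precompute_all_pairs(friends)         # 12 rows for 4 people (4 x 3 ordered pairs)
print(table[("ana", "dee")])                  # 3: a single lookup
friends["ana"].add("dee")                     # a new friendship is written...
friends["dee"].add("ana")
print(table[("ana", "dee")])                  # still 3: stale until the batch job reruns
print(degree_on_read(friends, "ana", "dee"))  # 1: fresh, computed at read time
```

The stale row is the maintenance problem Dave describes: with the precomputed design, every new edge forces a decision about which stored distances to recompute, which is why he frames the choice as a read-versus-write optimization driven by how often the data changes.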

This video teaches how to customize graph solutions for specific problems, with a focus on graph databases and their applications. It highlights the importance of understanding the problem to be solved and choosing the right graph database, as well as optimizing data and queries for better performance. The video also discusses various graph query languages and their trade-offs.

Key Takeaways
  1. Identify the problem to be solved and determine if a graph database is the right solution
  2. Choose the right graph database for the problem
  3. Optimize data and queries for better performance
  4. Evaluate graph database performance and compare query languages
  5. Align graph database design with business goals and ensure security and ethics
💡 Graph databases are still early on the adoption curve for many organizations, but for the right use cases they can offer significant scalability and query-performance benefits (one example from the interview cut a family-tree query from about 5 seconds to about 500 milliseconds). Choosing the right graph database and optimizing data and queries are crucial for success.
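Whether those performance benefits materialize depends heavily on how much of the graph a query touches. The minimal Python sketch below makes Dave's rule of thumb from the interview concrete, assuming edge reads as a crude proxy for latency; the tree generator, the follower data, and the function names are hypothetical. With a branching factor of 10, each hop adds ten times more edge reads than the last, and the same one-hop query costs wildly different amounts depending on whether it starts at an ordinary node or a supernode.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # hypothetical directed adjacency lists


def edges_touched(graph: Graph, start: str, hops: int) -> int:
    """Count edge reads needed to expand `hops` levels out from `start`.

    Edge reads are used here as a crude proxy for query cost (an assumption).
    """
    frontier = {start}
    seen = {start}
    touched = 0
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for node in frontier:
            for neighbor in graph.get(node, set()):
                touched += 1  # reading the edge costs work even if the neighbor was already seen
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return touched


def build_tree(branching: int, depth: int) -> Graph:
    """Hypothetical graph in which every node has `branching` children, `depth` levels deep."""
    graph: Graph = {}
    level = ["root"]
    for _ in range(depth):
        next_level = []
        for parent in level:
            children = {f"{parent}.{i}" for i in range(branching)}
            graph[parent] = children
            next_level.extend(children)
        level = next_level
    return graph


tree = build_tree(branching=10, depth=3)
print([edges_touched(tree, "root", h) for h in (1, 2, 3)])  # [10, 110, 1110]

# Same one-hop "who follows me?" query from two different starting vertices.
followers: Graph = {
    "regular_user": {f"fan{i}" for i in range(200)},
    "celebrity": {f"fan{i}" for i in range(100_000)},  # a supernode
}
print(edges_touched(followers, "regular_user", 1))  # 200
print(edges_touched(followers, "celebrity", 1))     # 100000

# Returning a single ordered result still means examining every candidate.
print(min(followers["celebrity"]))  # one value, 100,000 candidates compared
```

The last line mirrors the "I'm only returning one value, why is it so slow?" complaint from the interview: ordering forces the query to examine every neighbor of the supernode before it can return its single answer.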
