SQL Database Optimization

Siraj Raval · Intermediate ·📊 Data Analytics & Business Intelligence ·7y ago

Key Takeaways

The video demonstrates the use of deep reinforcement learning to optimize SQL database queries, utilizing techniques such as Markov decision processes, Bellman equations, and queue learning algorithms, with tools like neural networks and gradient descent optimization algorithms.

Full Transcript

hello world it's Suraj and what's the most efficient way to interact with the database anytime we want to create read update or delete data in a database will usually do so in the form of sequel queries sequel stands for structured query language and it's the standard language used to communicate with relational databases what about Haskell dB each query takes a certain amount of time to compute and ideally we can order our queries in the most computationally efficient way there are several techniques to do this and usually reinforcement learning isn't considered one of them but a recent paper showed that using a deep queue network researchers were able to perform queries twice as fast as using standard techniques we'll learn how they did it in this video but first we'll need to understand why they used reinforcement learning as opposed to other more common techniques we know that reinforcement learning works great for video games games are the perfect testbed for implementing orell algorithms since they're constrained environments where time is an element that allows for rapid experimentation the Markov decision process is the mathematical framework of choice for framing this decision problem that our AI or agent is facing it consists of a few variables that define our environment our agent and how our agent interacts with this environment in reinforcement learning an agent tries to come up with the best action to take given a state in the video game pac-man the state would be the 2d game world were in that includes the surrounding items like enemies walls and packed dots the action would be moving through this 2d space which would be going either up down left or right so given the state of our game world our agent will need to pick the best action to take in order to maximize rewards we know that eating packed dots gives us positive rewards and getting eaten by a ghost gives us a negative reward and a possible temper tantrum which we want to avoid through trial and error our agent will accumulate knowledge of the environment through state action pairs meaning it can tell if there would be a positive or negative reward given a state action pair we can represent this using the cue function where Q stands for quality as in it assesses the quality of a given state action pair we can actually learn what the optimal Q value will be at any given point and this is called Q learning by using what's called the bellman equation we can write out an equation that relates the value of one state to the value of another state in our environment because we are able to relate states across time to each other mathematically using the bellman equation we can use any number of methods to iteratively approximate our Q function but in the case of our pac-man environment the state action space can get really big in fact at some point it will no longer be feasible to store all the state action pairs of course we could still perform Q learning but it'll get harder to approximate the Q function over time luckily for us there exists a universal function approximator called a neural network if we give it enough input data it can learn any function if we use a neural network as an agent that predicts the Q value based on the input state action pair then we have a much more tractable solution than storing every possible value like we did previously and to capture all the intricate details of this knowledge present in our Q table we'll likely need to add a few hidden layers to our network making it a deep neural network the extra hidden layers allow the network to internally come up with features that can help it learn complex functions that would have been impossible using a more shallow network we can call this whole process deep reinforcement learning and more specifically deep Q learning now let's bring this theory back to reality you and I have neural networks in our head networks of neurons are firing in endlessly different combinations to approximate functions that help us perform a wide variety of tasks as we go about our lives and we can consider our reality a Markov decision process as neural network agents given a state we take actions to maximize reward for whatever task we are achieving using our function approximation capabilities to make predictions in that way we can consider our reality deep reinforcement learning anytime we use a neural network to approximate some reinforcement learning function get a value function a policy even the model itself we can call that deep reinforcement learning so how does this apply to the problem of query optimization well we know that sequel statements are used to perform a wide variety of tasks related to the database they can update data retrieve data merge data delete data each sequel query has its own function assume we have a database consisting of three tables employees salaries and taxes let's say we want to calculate the total tax owed by all employees under manager one we can write out a sequel query that does that we'll compute the tax owed by each employee by selecting their specific attributes and summing them all up this query is going to perform a three relation join we can use J to help denote the cost of accessing a base relation the cost of each query is the percentage of the total batch cost it's the time needed to execute a query it's computed different ways but almost always takes into account several computation factors such as input output CPU and communication we want to minimize this cost so that we perform our queries as fast as possible using dynamic programming we can iteratively calculate the cost of optimally accessing the three-base relations after the first iteration we can build off of this information that we previously computed and enumerate all two relations when we compute the best cost to join two relations we'll look up the relevant previously computed results and in the third iteration we'll proceed through the other two relation sets eventually finding the final best costs for joining all three tables once complete will see that this algorithm has a space and time complexity exponential in the number of relations which is why it's usually only used for queries between 10 and 20 relations when we have more relations than that we'll need to use a different query optimization technique instead of solving this join ordering problem using dynamic programming what if we formulated this problem as a Markov decision process and solved it using reinforcement learning if we do that the states can be considered the remaining relations to be joined the actions would be the valid joins out of the remaining relations the next states would be the old remaining relations set with two relations removed and the resultant join added lastly the reward would be the estimated cost of the new join because we defined these Markovian variables we can define a queue function using the bellman equation to describe the long-term cost of each join and since we've defined a queue function we can order joins in a greedy way we start with the initial query graph find the join with the lowest queue value then update the query graph and repeat the process this queue learning algorithm has a computational complexity of n cubed although that's high that's still much lower than the exponential run time complexity of dynamic programming in reality though we don't have access to the optimal queue function so we need to approximate it to do that we can use a neural network which would make this deep queue learning to learn the queue function we need to observe past execution data we can use it as training data for our neural network and since we're using a neural network to represent our queue function when to feed the state and actions into the network as fixed-length feature vectors this helps our network perform matrix multiplications gracefully as it accepts this kind of format will use one hot vectors to encode the set of all attributes present in the query graph and the participating attributes from both the left and right side of the join will use a simple two layer fully connected network as our agent and train it using the standard gradient descent optimization algorithm once trained it will accept a sequel query in plaintext parse it into an abstract syntax tree form feature the tree and use a neural network whenever a candidate join is scored and because databases are real-time used in production environments constantly being updated and changed we can periodically retune our network using the feedback from live execution across all cost models deep queue is competitive with the optimal solution without a priori knowledge of the index structure we can safely say that learning-based optimizers are more robust than hand designed algorithms because they can adapt to changes in data workload or cost models for the largest joins deep queue wins by up to 10,000 x compared to exhaustive enumeration three points to remember from this video deep reinforcement learning involves using a neural network to approximate reinforcement learning functions like the cue function we can assess the quality or cue of state action pairs by computing a cue table and cue learning involves approximating the relationship between state action pairs and Q values in this table using neural networks please subscribe for more programming videos and for now I've got to take a meeting so thanks for watching

Original Description

We can use deep reinforcement learning to optimize a SQL database, and in this video we'll optimize the ordering of a series of SQL queries such that it involves the minimum possible memory/computation footprint. Deep RL involves using a neural network to approximate reinforcement learning functions, like the Q (quality) function. After we frame our database as a Markov Decision Process, I'll use Python to build a Deep Q Network to optimize SQL queries. Enjoy! Code for this video: https://github.com/llSourcell/SQL_Database_Optimization Please Subscribe! And like. And comment. That's what keeps me going. Want more education? Connect with me here: Twitter: https://twitter.com/sirajraval Facebook: https://www.facebook.com/sirajology instagram: https://www.instagram.com/sirajraval More learning resources: https://rise.cs.berkeley.edu/blog/sql-query-optimization-meets-deep-reinforcement-learning/ https://mldb.ai/ https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning?view=sql-server-2017 https://towardsdatascience.com/machine-learning-in-your-database-the-case-for-and-against-bigquery-ml-4f2309282fda https://www.quora.com/Which-database-is-best-for-machine-learning Join us in the Wizards Slack channel: http://wizards.herokuapp.com/ School of AI: https://www.theschool.ai And please support me on Patreon: https://www.patreon.com/user?u=3191693 Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w Hiring? Need a Job? See our job board!: www.theschool.ai/jobs/ Need help on a project? See our consulting group: www.theschool.ai/consulting-group/ Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: http://chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Siraj Raval · Siraj Raval · 0 of 60

← Previous Next →
1 What is Bitcoin?
What is Bitcoin?
Siraj Raval
2 5 Ways to Use Bitcoin
5 Ways to Use Bitcoin
Siraj Raval
3 BTC Fever - Siraj [Music Video]
BTC Fever - Siraj [Music Video]
Siraj Raval
4 5 Reasons to Build Decentralized Apps
5 Reasons to Build Decentralized Apps
Siraj Raval
5 The Interplanetary File System
The Interplanetary File System
Siraj Raval
6 How to Build a Dapp in 3 min
How to Build a Dapp in 3 min
Siraj Raval
7 Life Before Smartphones
Life Before Smartphones
Siraj Raval
8 4 Ways to Use Smart Contracts
4 Ways to Use Smart Contracts
Siraj Raval
9 3 Dapps You HAVE to See
3 Dapps You HAVE to See
Siraj Raval
10 Char's Life as a BitTorrent Engineer
Char's Life as a BitTorrent Engineer
Siraj Raval
11 4 Reasons AlphaGo is a Huge Deal
4 Reasons AlphaGo is a Huge Deal
Siraj Raval
12 Build a Neural Net in 4 Minutes
Build a Neural Net in 4 Minutes
Siraj Raval
13 Sentiment Analysis in 4 Minutes
Sentiment Analysis in 4 Minutes
Siraj Raval
14 The Hackathon Life
The Hackathon Life
Siraj Raval
15 Your First ML App - Machine Learning for Hackers #1
Your First ML App - Machine Learning for Hackers #1
Siraj Raval
16 Build an AI Composer - Machine Learning for Hackers #2
Build an AI Composer - Machine Learning for Hackers #2
Siraj Raval
17 Build a Game AI - Machine Learning for Hackers #3
Build a Game AI - Machine Learning for Hackers #3
Siraj Raval
18 Build a Movie Recommender - Machine Learning for Hackers #4
Build a Movie Recommender - Machine Learning for Hackers #4
Siraj Raval
19 Build an AI Artist - Machine Learning for Hackers #5
Build an AI Artist - Machine Learning for Hackers #5
Siraj Raval
20 Build a Chatbot - ML for Hackers #6
Build a Chatbot - ML for Hackers #6
Siraj Raval
21 Build an AI Reader - Machine Learning for Hackers #7
Build an AI Reader - Machine Learning for Hackers #7
Siraj Raval
22 Build an AI Writer - Machine Learning for Hackers #8
Build an AI Writer - Machine Learning for Hackers #8
Siraj Raval
23 Build a Chatbot w/ an API - ML for Hackers #9
Build a Chatbot w/ an API - ML for Hackers #9
Siraj Raval
24 One-Shot Learning - Fresh Machine Learning #1
One-Shot Learning - Fresh Machine Learning #1
Siraj Raval
25 Generative Adversarial Nets - Fresh Machine Learning #2
Generative Adversarial Nets - Fresh Machine Learning #2
Siraj Raval
26 Tone Analysis - Fresh Machine Learning #3
Tone Analysis - Fresh Machine Learning #3
Siraj Raval
27 Generate Rap Lyrics - Fresh Machine Learning #4
Generate Rap Lyrics - Fresh Machine Learning #4
Siraj Raval
28 Build an Autoencoder in 5 Min - Fresh Machine Learning #5
Build an Autoencoder in 5 Min - Fresh Machine Learning #5
Siraj Raval
29 Build a Self Driving Car in 5 Min - Fresh Machine Learning #6
Build a Self Driving Car in 5 Min - Fresh Machine Learning #6
Siraj Raval
30 Build an Antivirus in 5 Min - Fresh Machine Learning #7
Build an Antivirus in 5 Min - Fresh Machine Learning #7
Siraj Raval
31 TensorFlow in 5 Minutes (tutorial)
TensorFlow in 5 Minutes (tutorial)
Siraj Raval
32 Build a Recurrent Neural Net in 5 Min
Build a Recurrent Neural Net in 5 Min
Siraj Raval
33 Build a Simulation in 5 Min
Build a Simulation in 5 Min
Siraj Raval
34 Build a TensorFlow Image Classifier in 5 Min
Build a TensorFlow Image Classifier in 5 Min
Siraj Raval
35 Tensorboard Explained in 5 Min
Tensorboard Explained in 5 Min
Siraj Raval
36 Generate Music in TensorFlow
Generate Music in TensorFlow
Siraj Raval
37 Build a Game Bot (LIVE)
Build a Game Bot (LIVE)
Siraj Raval
38 Deep Learning Frameworks Compared
Deep Learning Frameworks Compared
Siraj Raval
39 Introduction - Learn Python for Data Science #1
Introduction - Learn Python for Data Science #1
Siraj Raval
40 Build a Neural Network (LIVE)
Build a Neural Network (LIVE)
Siraj Raval
41 Twitter Sentiment Analysis - Learn Python for Data Science #2
Twitter Sentiment Analysis - Learn Python for Data Science #2
Siraj Raval
42 Recommendation Systems - Learn Python for Data Science #3
Recommendation Systems - Learn Python for Data Science #3
Siraj Raval
43 Predicting Stock Prices - Learn Python for Data Science #4
Predicting Stock Prices - Learn Python for Data Science #4
Siraj Raval
44 Pong Neural Network (LIVE)
Pong Neural Network (LIVE)
Siraj Raval
45 Deep Dream in TensorFlow - Learn Python for Data Science #5
Deep Dream in TensorFlow - Learn Python for Data Science #5
Siraj Raval
46 Visualizing Data with D3.js (LIVE)
Visualizing Data with D3.js (LIVE)
Siraj Raval
47 Genetic Algorithms - Learn Python for Data Science #6
Genetic Algorithms - Learn Python for Data Science #6
Siraj Raval
48 Enter Siraj [Music Video]
Enter Siraj [Music Video]
Siraj Raval
49 Build a Web Scraper (LIVE)
Build a Web Scraper (LIVE)
Siraj Raval
50 Why is P vs NP Important?
Why is P vs NP Important?
Siraj Raval
51 How to Make a Neural Network (LIVE)
How to Make a Neural Network (LIVE)
Siraj Raval
52 How to Make an Amazing Tensorflow Chatbot Easily
How to Make an Amazing Tensorflow Chatbot Easily
Siraj Raval
53 How to Make an Amazing Video Game Bot Easily
How to Make an Amazing Video Game Bot Easily
Siraj Raval
54 How to Make a Tensorflow Neural Network (LIVE)
How to Make a Tensorflow Neural Network (LIVE)
Siraj Raval
55 How to Make a Simple Tensorflow Speech Recognizer
How to Make a Simple Tensorflow Speech Recognizer
Siraj Raval
56 Joel Shor - Really Quick Questions with an Awesome Google Engineer
Joel Shor - Really Quick Questions with an Awesome Google Engineer
Siraj Raval
57 How to Make a Path Planning Algorithm Easily (LIVE)
How to Make a Path Planning Algorithm Easily (LIVE)
Siraj Raval
58 The Best Way to Prepare a Dataset Easily
The Best Way to Prepare a Dataset Easily
Siraj Raval
59 Catherine Olsson - Really Quick Questions with an OpenAI Engineer
Catherine Olsson - Really Quick Questions with an OpenAI Engineer
Siraj Raval
60 How to Make a Tic Tac Toe Neural Network Easily (LIVE)
How to Make a Tic Tac Toe Neural Network Easily (LIVE)
Siraj Raval

This video teaches how to use deep reinforcement learning to optimize SQL database queries, reducing computational complexity and improving performance. By applying techniques like Markov decision processes and queue learning algorithms, developers can create more efficient database systems.

Key Takeaways
  1. Formulate the query optimization problem as a Markov decision process
  2. Define the states, actions, and rewards for the reinforcement learning model
  3. Use a neural network to approximate the optimal queue function
  4. Train the neural network using standard gradient descent optimization algorithm
  5. Parse a SQL query into an abstract syntax tree form
  6. Feature the tree and use a neural network to score candidate joins
💡 Deep queue learning can be used to optimize SQL database queries, reducing computational complexity and improving performance, and can be competitive with optimal solutions without prior knowledge of the index structure.

Related AI Lessons

Up next
Spreadsheet Guy Meets the CFO: "Define How Much"
Digital Transformation with Eric Kimberling
Watch →