Using Natural Language to Query Postgres with Jacob
Key Takeaways
The video demonstrates how to use natural language to query Postgres databases with Jacob, utilizing tools such as L Chain, Postgres, LLM, RAG, and GPT 3.5 turbo instruct to build context-aware reasoning applications and generate SQL queries based on user prompts.
Full Transcript
hello everyone and good morning good afternoon or good evening depending on where you're tuning in from my name is Jacob Lee and I'm a software engineer at L chain I'm excited to be presenting at neon developer days and I'll be talking a bit about quering postgress databases with natural language and a quick aside before I get started if you're not already familiar with L chain we make it easy to build context aware reasoning applications and more concretely you can of what Lang chain offers as a four- layer cake that helps you create and deploy large language model powered applications we're most well known for open source libraries of models such as document loaders Vector stores prompts and llm wrappers that provide common interfaces for various tasks around orchestration these pieces parse format and retrieve data from various sources to present this context to an llm open source Frameworks available in JavaScript and python also include useful ways of prompting and chaining different model calls together hence the name to build sophisticated and Powerful applications using expression language and finally we provide tools such as lsmith that help debug and evaluate llm apps tools such as L serve that help with pushing apps to production as well as well as various templates that show off llm use cases that you can apply to your own projects so first off a bit of framing why do we want to query SQL with naal language well though most of you are probably quite familiar with writing queries SQL is still a skill that takes some time to learn particularly as queries become more complex furthermore people who do know basic SQL in your organization might know a bit generally about what your database contains but may not always be familiar with the specific fields and schema in your particular database so empowering non- analysts to have have some insights into Data that's behind this knowledge threshold has a potential to save engineers and data scientists time in creating complex and bespoke dashboards and given many popular llms already have SQL queries in their training data we have some powerful tools ready made to make it work so our strategy will be to use a form of retrieval augmented generation or rag for short rag is a general technique for guiding llm Generations if you think of Continuum where on one end you have pure retrieval like going to Google and quering for data or manually writing a SQL query and on the other end you have pure generation via llm which would be the equivalent of opening chat GPT and simply asking for the data you need um retrieval augmented generation sits in the Middle where you take the Best of Both Worlds and take data related to your query pass to the llm to give it context on your unique task and situation specific specifically for SQL you might start with a question like who works here you could find the answer by querying your database but this requires SQL knowledge schema knowledge and some formatting at the end to make it presentable to an end user they're not going to want to parse through rows our task then for allowing natural language interactions with our SQL database twofold we first need to find a way to map a natural language input like who works here to a SQL query that will return the correct information and second we need to pass the queried rows into an llm to synthesize as a final natural language output to the user the L part is somewhat simpler so we'll mostly be focusing on the first part of the problem generating a SQL query via lln so now that we've identified the the problem Where Do We Begin and well as everyone knows llms are magic and can do literally anything so why not try straight up asking the llm and directly prompting it with generate me a post query that finds the first and last names of everyone who works here and the result is actually surprisingly not too bad however it did assume that the table name was employees with the trailing ass while our sample database the popular chanuk demo database has a table called employee singular this is understandable mistake given that we haven't told the llm anything about our database schema and there's also a bunch of other output that will need parts out at the end that's the to find the first and last names of everyone Etc and the trailing bit where it's this query selects the first and last name Etc we just want the query itself so let's take a look at how we solve the first problem as I mentioned earlier llms are trained on a large set of data including code and SQL queries so they already have postest syntax internalized which is quite helpful since we don't need to explain to the llm how to craft a postr query and then Raj Kumar at all discovered that actually pretending to create each relevant table with literal create table syntax and then giving an example of some rows using a small select query generally similar to F shotting a technique called f shotting rather um gives noticeably boosted performance over other methods such as just directly pacing your raw schema to the prompt and for the second problem formatting or output we use a neat trick we ask the llm to structure its output in a format that actually doesn't have the queries last out and then we pass SQL result the next token in the output that we'd expect as something called a stop sequence that as the name implies Flags to the llm to stop generating once it would have generated that given token we then end our prompt with the beginning of the output structure we requested the reason this works is that llms even those tuned to chat settings such as the famous gp4 are essentially fancy prediction engines that predict uh that pick rather the most likely next token given the context by ending our prompt with the start of the OPA structure we hint the llm that it should complete the next line which we have already told it should be SQL query earlier in our prompt followed by a generated SQL query after generating the query it will try to generate the token's SQL result but since we've passed that as a stop sequence the llm will halt there and we can extract just the generated query though chat models can do this I've chosen to use a text prediction variant of GPT 3.5 turbo called gbt 3.5 turbo instruct that is more tuned to this style of prompting so putting it all together we end up with the following prompt I've added a few more lines to try to guard against large outputs and hallucinations and you can check out the prompt and pull it for yourself from the L chain prompt hub using the link at the bottom of the slide and there's actually one more piece to call out about the prompt and that's the initial line you are a postgress expert though it sounds a bit silly this actually has a noticeable impact on the quality of the final generation and to understand this intuitively llms are trained on the sum of all human knowledge including very bad SQL writing blog posts including a line at the start of this prompt a start of your prompt Keys the llm onto the the good parts of this training data set that you do want to use like the parts that know how to write good postgress queries it's a simple trick but one worth highlighting all right great we've got a way to generate our SQL query now of course our next step is to run it and given that many of you in the audience are probably experienced SQL developers if you didn't have an immediate visceral reaction to that statement I'm very very disappointed in you now depending on what you're building you may not have control over the questions your end user will ask your app think of it like sanitizing inputs except the entire query is in fact your input and you should expect in guard against the llm um generating queries like Drop tables or update customers set email you stink at smell and you should use this as much caution as you possibly can setting a special role for llm queries and strongly make considering making it read only is a great start um as that way you'll guard against dangerous mutations or deletions and going further and restricting that role to specific tables is also good practice since the llm could have a lucky hallucination and guess a sensitive table or field name even if you don't pass it in as context schema you might also consider using a separate database or Branch to ensure that if large queries do get generated they don't excessively consume resources from critical prod flows if you do need to allow insert update or delete access make sure you set your permissions carefully and don't just allow inputs or deletions to um arbitrary tables and finally though it's out of scope for this talk you could also try to do analysis of generated queries before running them on the database okay so hopefully that scared you a little bit but not too much after all this is still pretty cool um so the final piece of query generation I can talk about is using a lower level open AI parameter called logit bias and basically input cm are encoded and are then processes tokens it's a bit more technical but open AI actually allows you to manually tune the frequencies of those specific tokens negative 100 values never repeat appear while passing a value of 100 makes it the token exclusively appears not terribly useful and one thing to watch out for here is that tokens are actually K sensitive so something like the with a capital T will actually tokenize differently from the with lower case T now here's a concrete example of how this can be useful with the sample chinuk database I prepared for the demo app asking what are some good movies will surprisingly restrict outputed movies to the action and adventure categories as you can see in the generated sequel below there join there and you know you wouldn't expect it to necessarily only generate action and adventure movies but seems like that's what the llm and feels are good movies and you know it's again hallucinations arbitrary um anything can happen or be generated here adding a positive bias towards the token drama will make the llm generate a query that filters on drama while still allowing more specific queries that you'd actually want different categories so um what are some good action movies for example would still work and generate good outcomes and good queries there and the word of caution here while this is an interesting tool in your kit it's best not to treat it as a substitute for security measures and only use it for dunging outputs so for example tokens are case sensitive and as I mentioned earlier and furthermore putting a space with formost words will cause them to tokenize differently you can see in the examples to the right that I've put delete twice in caps and lowercase Elite once and results in three different tokens so you might while you might think you're safe by passing Capital delete as a forbidden token the llm could still generate space delete or some case insensitive version such as capital D all lowercase Delite and given an input that ass it to drop tables or delete from a table the bias is already so strong for the LM to generate query with that token that it's just not reliable to ban specific SQL Behavior it'll try to find a way around it or generate nonsense in the best case so you should think of log of bias as a similar Spirit prompting in that it'll guide the lm's output and um ideally generate better outputs for tokens you do want but you know relying on a single measure to ban bad behavior as a losing battle okay and now I've prepared a demo app built on a neon DB hosted version of the popular chuk demo database which mocks a digital media store and has tables for artists albums media tracks invoices and customers it's hosted live as a verell nextjs app and I'll go through it a little bit right now okay so switching to Chrome here I have a talk tosql with lank chain Js uh ver app posted on forell it's actually quing a serverless neon DB Post sance running the chuk DAT database behind the scenes and again that's the database uh simulating a media record store with various tables for artists albums media tracks invoices customers so let's just try quering something who works here as you saw earlier there's already an employee table so I'd expect some kind of Select query that looks over employees and let's see gener employee generated a query that looks pretty pretty much what I expect the employees who work here are Andrew Adams Nancy Edwards Jane peacock Mar Park Etc and I can actually go into lsmith and see what this Trace looks like so I'll go in now and I can see that yeah this is what I expect the outputs I saw in the from the app there but I do see one or two interesting things uh first of all it did ignore the query for most Five results using the limit Clause as post here example of how you can't necessarily uh exclusively rely on prompting even though it can be helpful for guiding outputs um you can't exclusively rely on prompting to prevent bad outputs or bad queries in this case but you can see the rest of the prompt uh the expert prompting which I showed off here you are a postc expert which is quite helpful some of the tables that are few shotted in so create table public actor exactly how you create the table if you're generating yourself and yeah F shot it with select queries and then scroll down farther there's quite a lot of information here in production you may not want to give it uh you may not even want to pass all these tables in as context but just for the demo purposes so we can sort of show everything I'm doing it here and then you can see the SQL query that gets generated cool so this a very simple example let's try a more complicated one so I think I showed earlier there's no chat history for this by the way yet but should theoretically be possible with smart prompting and I could try something like like what movies are good and let's see what happens so it's going to go through all the steps generate and yeah I see the movies airplane Sierra Amad holy American cus Etc are good options to watch as action or comedy I guess the LM has its own concept of taste but you can see it does actually an inter jooin on category and look specifically for action or comedy movies which is kind of cool and then in this case get the limit um is five so yeah let's look at this in lsmith I'll go back to my dashboard here I can see the new query here and yeah I can see again same prompt here and the output of query below which is what exactly what I saw in the initial database or sorry the initial uh output from the app there um so I think one thing to call out here is perhaps let's say okay it's a little bit weird that you know the only good movies are not there are good movies outside of action or comedy movies so let's say yeah we want to figure out why it's only jaring action in comedy movies and I think a common first step for this is to edit the prompt and update it so that the llm is more inclined to generate uh maybe movies from other categories so I can actually go into likith has a playground feature where you can actually go in and look at the prompts that uh sparked and kicked off specific runs so I can see exactly same PRP that I saw before I see the model that used generate so I could try different models if I want to for now I'll just stick with GPT 3.5 turbo instruct I can see different parameters and edit them so temperature is a common one Higher temperature is more creativity too high as you'll see in a minute there get myself out the way there um you get kind of garbage if you put it all the way up to two yeah you get kind of craziness so you don't want to set it too high I think for generally structured outputs setting it low is quite a bit more useful and then you can also see the stop sequence that I previously mentioned so let's try updating the prompt to say pay close attention to the movies or the category of movie you return you should return mostly drama movies cool and if I run it now let's see what I get Okay cool so yeah this time it didn't actually return any category based movie but instead R it on PG versus G so I guess good in this case was taken to mean um familyfriendly but how about return the only good movies are drama movies okay and this time I get in response to the question what movies are good a SQL query that restricts only good movies to drama category named drama cool so this is an example of how lsmith can be helpful to tweak prompts and generally get your llm apps um generating what you want them to the below link has a representative lsth Trace that allows you to see the various inputs and outputs to each step the project I looked at earlier wasn't publicly available but you can go to the link at the bottom of the slide there um to go and poke around the various steps and inputs and outputs at your leisure so lsmith is currently an open Beta but if you'd like to try it out you can sign up with lanks smithore partner 2023 all lowercase and that'll let you sign up and immediately start sending traces to projects and that's it thank you very much for listening and I really hope you enjoyed my talk if you'd like to keep in touch you can follow me at hakubo on X formerly Twitter and follow linkchain at linkchain aai for more goodies I've also included links for both the live demo app the source code and all the Technologies used in it below I hope you enjoy the rest of the conference
Original Description
by Jacob Lee for Neon Developer Days
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from LangChain · LangChain · 21 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
▶
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Chat With Your Documents Using LangChain + JavaScript
LangChain
LangChain SQL Webinar
LangChain
LangChain "OpenAI functions" Webinar
LangChain
LangSmith Launch
LangChain
LangChain x Pinecone: Supercharging Llama-2 with RAG
LangChain
LangChain Expression Language
LangChain
Building LLM applications with LangChain with Lance
LangChain
Benchmarking Question/Answering Over CSV Data
LangChain
LangChain "RAG Evaluation" Webinar
LangChain
Fine-tuning in Your Voice Webinar
LangChain
Tabular Data Retrieval
LangChain
Building an LLM Application with Audio by AssemblyAI
LangChain
Superagent Deepdive Webinar
LangChain
Lessons from Deploying LLMs with LangSmith
LangChain
Shortwave Assistant Deepdive Webinar
LangChain
Cognitive Architectures for Language Agents
LangChain
Effectively Building with LLMs in the Browser with Jacob
LangChain
Data Privacy for LLMs
LangChain
"Theory of Mind" Webinar with Plastic Labs
LangChain
LangChain Templates
LangChain
Using Natural Language to Query Postgres with Jacob
LangChain
Building a Research Assistant from Scratch
LangChain
Benchmarking RAG over LangChain Docs
LangChain
Skeleton-of-Thought: Building a New Template from Scratch
LangChain
Benchmarking Methods for Semi-Structured RAG
LangChain
LangSmith Highlights: Getting Started
LangChain
LangSmith Highlights: Debugging
LangChain
LangSmith Highlights: Datasets
LangChain
LangSmith Highlights: Evaluation
LangChain
LangSmith Highlights: Human Annotation
LangChain
LangSmith Highlights: Monitoring
LangChain
LangSmith Highlights: Hub
LangChain
SQL Research Assistant
LangChain
Getting Started with Multi-Modal LLMs
LangChain
Build a Full Stack RAG App With TypeScript
LangChain
Auto-Prompt Builder (with Hosted LangServe)
LangChain
LangChain v0.1.0 Launch: Introduction
LangChain
LangChain v0.1.0 Launch: Observability
LangChain
LangChain v0.1.0 Launch: Integrations
LangChain
LangChain v0.1.0 Launch: Composability
LangChain
LangChain v0.1.0 Launch: Streaming
LangChain
LangChain v0.1.0 Launch: Output Parsing
LangChain
LangChain v0.1.0 Launch: Retrieval
LangChain
LangChain v0.1.0 Launch: Agents
LangChain
Build and Deploy a RAG app with Pinecone Serverless
LangChain
Hosted LangServe + LangChain Templates
LangChain
LangGraph: Intro
LangChain
LangGraph: Agent Executor
LangChain
LangGraph: Chat Agent Executor
LangChain
LangGraph: Human-in-the-Loop
LangChain
LangGraph: Dynamically Returning a Tool Output Directly
LangChain
LangGraph: Respond in a Specific Format
LangChain
LangGraph: Managing Agent Steps
LangChain
LangGraph: Force-Calling a Tool
LangChain
LangGraph: Multi-Agent Workflows
LangChain
Streaming Events: Introducing a new `stream_events` method
LangChain
Building a web RAG chatbot: using LangChain, Exa (prev. Metaphor), LangSmith, and Hosted Langserve
LangChain
OpenGPTs
LangChain
Open Source RAG with Nomic's New Embedding Model (and ChromaDB and Ollama)
LangChain
LangGraph: Persistence
LangChain
More on: Agent Foundations
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
I Let 5 AI Agents Shop For Me in 2026. It Went About as Well as You’d Expect.
Medium · AI
The Governance Gap Nobody's Measuring
Medium · AI
My agent kept reading data it wasn't allowed to. The prompt was never going to stop it.
Dev.to AI
8 Must-Know AI Chatbot Tools That Actually Help Small Businesses
Dev.to AI
🎓
Tutor Explanation
DeepCamp AI