Extractive Q&A With Haystack and FastAPI in Python

James Briggs · Intermediate · ⚡ Algorithms & Data Structures · 5y ago

Key Takeaways

This video demonstrates building an extractive Q&A stack using Haystack and FastAPI in Python, with a BERT transformer as the reader model, Elasticsearch as the document store, and Uvicorn to host the API.

Full Transcript

Okay, so we're going to go through putting together our reader model and the FastAPI wrapper around all of this. Originally I was just going to do the reader model in this video, but I realized it's actually super easy, so we'd almost be done already if we were just doing the reader model. So what we're going to do is wrap everything up in the API, and then we'll add the reader model into that.

This is where we got to before. We have this, which is our Jupyter notebook, and this has everything: it has our document store and the retriever, and we're just retrieving a few things, so we've got a few contexts here. Now, obviously notebooks are fine for when we're putting things together and testing things, but we need to switch over to actual Python files, so we're going to switch across to VS Code over here.

This is our project directory on the left here. We have the code and the labs in here, which we've already put together, we have our data, which is Meditations, and I also put together this requirements text file. In here are all the libraries that we need to pip install. So we have FastAPI, Haystack (which we already installed), and Uvicorn. FastAPI and Uvicorn are the two that I want to focus on: FastAPI, obviously, we're using to build our API, and Uvicorn we're going to use to spin up the server that our API will be hosted on. So that's our requirements.txt.

I'm going to create a new folder in here now, and I'm just going to call it qa service. I mean, we can change these later on, I'm not really sure what to call them all. So we're now in code. Okay, and then in here we'll create our first Python file, and this is just going to be our API, so we'll call it api.py.

Now we'll start putting together our actual API. So let's switch across to the window, and the first thing we need to do is import FastAPI, so we do from fastapi import FastAPI. Then we want to initialize our API here, so we just call it app = FastAPI(). Just need to fix that there, and that initializes our API.
And then whenever we want to create an HTTP method, all we need to do is this: we do app, and then in here we write the method that we'd like to add to our API. The only one that I think we're going to need, at least at the moment, is a GET method, so we'll add that. We also need to add the path to this method, and in this case we're going to be querying our Q&A model, so we'll just create the query path. So that's pretty good. Then we do async, and here we have our function, so this is just going to be the query function. We want this query function to at some point accept a query, but for now we'll just leave it, and we're going to add that in a moment. For now this will just return hello world, so that we know that it's working.

So let's spin this up. I'm opening the terminal in this folder. At the moment we're at the top level of this, so we'll go down a few levels: we want to go into the code directory, and then in the code directory it's qa service, which we made just a moment ago. From inside here we initialize our Uvicorn service, which will host our API. For that we just write uvicorn, and then we want our file name, which is api, and then whatever we've called our FastAPI instance in our code, which is app up here. Then we'll use the reload flag as well, which means that whenever we edit our code we don't need to re-initialize the API instance in order for changes to load.

Okay, so now we can see that here we have our instance, so I'm going to copy that and open it in my browser over here. Enter that, and we see this "detail: Not Found". That's because we just went straight to the index, so let me just add the query endpoint, and there we get hello world. So this is the endpoint that we're going to use for our FastAPI. We know that's working, and now let's add in the rest of our code so that we can begin querying our Q&A stack.

If we just check what we did in our Jupyter notebook, all we actually need to do is basically copy loads of this across. So let me just create a new cell here, and I'm going to take all this and we're just going to copy it across, and honestly this should really be all we need. Okay, so we'll copy this and take it into our code here. Before we initialize our API, we're going to initialize the rest of our code. So we have our document store, we initialize that, and it accesses the aurelius index, and then we have our retriever here, so we import them as well.

As well as the retriever, we also have the reader model, which is the next step. First, obviously, we need to import that, and we do from haystack.reader.farm import FARMReader. Then, when we put all these together, we're also going to use something called a pipeline, and to use that we just want to put from haystack.pipeline import ExtractiveQAPipeline. The reason it's extractive QA is because it's question answering and we're extracting the answer from the text rather than generating it with a model, so that's why we're using the extractive QA pipeline.

Now that should be everything that we need to import. Then all we need to do here is initialize our reader model, which is FARMReader, and in here we want to include our model_name_or_path, and that's going to be equal to deepset (let me do this on another line), so deepset/bert-base-cased-squad2. I don't know if we covered this already, I don't think we did. This is a pretty standard Q&A model that I think a lot of people use, so it's a pretty safe one to start with.

After we've initialized these three parts of our stack, we need to initialize our pipeline object. For that we do pipeline, and then we're taking our ExtractiveQAPipeline, and then we have our reader, which is going to be equal to our reader, and we also need to pass the retriever, which is equal to our retriever. So that should be everything that we need to do in order to initialize our model.

So now what we want to do is query our data. We do pipeline.run, and then in here I'm just going to put a random query, so the one we used before, which was "What did your grandfather teach you?". Okay, we just pass that as the query, and this is going to return a dictionary, so then we can just return it straight away.

Now let's see if our API is still running. I think it should be okay. Yep, so it's just starting up again now. Okay, looks good. Let's try and open it up over here; we should get loads of text. Yeah, so that's cool. So now we're getting the query, "What did your grandfather teach you?", and this is pretty hard to see, but we get this list, and this goes all the way through here, and we get all of these different answers. The top-rated answer for "What did your grandfather teach you?" is "act and speak". Okay, in sleep scenes, "act and speak", it's all "like children from their parents". Okay, well, it was interesting. So it's not amazing.

So instead of using the DPR retriever, let's try swapping it out for a BM25 retriever, which is basically just an algorithmic retriever, and it should actually run faster as well. That's ElasticsearchRetriever, and all we do is swap those around, and I think the only parameter we need is the document store. Yeah. Okay, let's let that start up again. Okay, just waiting. Let's try this one, see how it compares, and then we'll go with one of these, I think. So let's go and just refresh the page again.

Okay, yeah, this is a lot better, which is weird. I kind of thought the other one would do better, but that's fine; maybe we need to fine-tune it a bit more than what we have. What we have here is "What did your grandfather teach you?" and we get "good morals and the government of my temper", which is, I think, pretty much exactly what I wanted it to return. So that's good. I see here there's a newline character as well, so I think we need to format the text before we put it in, to remove those. But that's cool.

So that's really good, it's working, but we're not actually making our own queries, so we need to add that. To do that, all we need to do is add q in here, and this is our query string, and in here we just make that equal to query.

Now, rather than opening in the browser, I'm going to open Insomnia here. Insomnia is just an HTTP client, so it allows us to send requests really easily and formats everything in a nice way. I should have opened this before, I'm not sure why I didn't, I kind of forgot about it. So localhost:8000, we're going to the query endpoint, then in our query we have the q parameter, and here I want to say "What did your grandfather teach you?". Okay, then we send this, and we should get a response on the right. Cool, and this is a lot better, so it's good.

Let's try something else. I think a lot of the time he just talks about the universe, so "What is the universe?", and send that. It takes a while to run at the moment. Cool, so we get a query and then we get answers: "a well-arranged universe"; "the universe loves to make whatever it is about to be", interesting; "that which knows beginning and end, and knows the reason which pervades all substance and through all time by fixed periods (revolutions) administers the universe". Yeah, it's pretty deep. That's cool.

So I think that's everything; I don't want to include anything else. So that's pretty cool. Now if we have a look at this again, we've just done this entire section here, and the API. I mean, we'll probably make changes in the future, but that's kind of all done, so we can cross that off. The next bit is going to be probably the most difficult bit, at least for me, which is going to be the front end in Angular, so that should be pretty interesting. Let's see how that goes. But for now I think that's it, so I'll see you again in the next one.

Original Description

▶️ Stoic Q&A App Playlist: https://www.youtube.com/playlist?list=PLIUOU7oqGTLixb-CatMxNCO-mJioMmZEB

In this video we work through building an extractive Q&A stack using Haystack, and embedding it within a FastAPI instance in Python. We use the BERT transformer for our reader model, alongside Elasticsearch and the BM25 retriever algorithm.

🤖 70% Discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5

🕹️ Free AI-Powered Code Refactoring with Sourcery: https://sourcery.ai/?utm_source=YouTub&utm_campaign=JBriggs&utm_medium=aff

This video teaches how to build an extractive Q&A stack using Haystack and FastAPI in Python. It covers wrapping a retriever and a BERT reader model in an extractive QA pipeline, exposing that pipeline through a FastAPI GET endpoint, and serving it with Uvicorn.

Key Takeaways

Import FastAPI and initialize the API
Create a GET method for querying the QA model
Add the path to the GET method
Spin up the Uvicorn server to host the API
Test the API endpoint by navigating to it in a browser
Initialize the document store with the aurelius index
Initialize the reader model with FARMReader
Initialize the pipeline with ExtractiveQAPipeline
Query data with pipeline
Swap DPR retriever for BM25 retriever
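
To illustrate the last steps from the client side, here is a small standard-library sketch that builds the request URL with the q parameter and tidies the answers in a response. The host and port (localhost:8000) come from the video; the answers and answer field names and the helper names build_query_url and clean_answers are assumptions for illustration. Collapsing whitespace addresses the stray newline characters noticed in the returned text.

```python
from typing import List
from urllib.parse import urlencode


def build_query_url(question: str, base: str = "http://localhost:8000/query") -> str:
    # urlencode escapes spaces and punctuation so the question is a valid query string.
    return f"{base}?{urlencode({'q': question})}"


def clean_answers(response: dict) -> List[str]:
    # Collapse newlines and repeated spaces inside each answer string.
    return [" ".join(a["answer"].split()) for a in response.get("answers", [])]


url = build_query_url("What is the universe?")
# Hand-written sample response; the field names are assumed, not confirmed.
sample = {
    "query": "What is the universe?",
    "answers": [{"answer": "a well-arranged\nuniverse"}],
}
print(url)
print(clean_answers(sample))
```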

💡 Wrapping a Haystack pipeline in FastAPI gives a simple extractive Q&A endpoint, and the choice of retriever matters: in this demo, swapping the DPR retriever for BM25 returned noticeably better answers.
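
For a rough sense of what an algorithmic retriever such as BM25 computes, here is a pure-Python sketch of Okapi BM25 scoring. It is not how Elasticsearch or Haystack implement retrieval; the whitespace tokenizer, the function name bm25_scores, and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and the sample passages are phrases quoted in the transcript.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    # Naive lowercase whitespace tokenization; a real search engine does much more.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents containing the term.
            df = sum(1 for t in tokenized if term in t)
            # Rarer terms get a higher weight (non-negative IDF variant).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            f = tf[term]
            # Term frequency saturates via k1 and is normalized by length via b.
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores


docs = [
    "good morals and the government of my temper",
    "the universe loves to make whatever it is about to be",
    "act and speak",
]
scores = bm25_scores("what is the universe", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because the score depends only on term counts and document lengths, no neural model is involved at retrieval time, which is consistent with the remark in the video that BM25 should run faster than DPR.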
