Understanding Basic Vector Search With KNN | Vector Databases for Beginners | Part 12

Data Science DoJo · Beginner ·🔍 RAG & Vector Search ·4mo ago
Skills: RAG Basics90%

Key Takeaways

This video covers the basics of vector search using the k-Nearest Neighbors (k-NN) algorithm, including how queries and documents are embedded as vectors, measuring similarity through distance, and finding relevant documents through k-NN.

Full Transcript

So the main algorithm I'll start with is KN&N or K nearest neighbors. And this is when you're calculating the distance between a query vector and a document vector for every query and document pair. So, we covered this in the last webinar, but when you turn a query or a document or anything into a vector, you're basically encoding its meaning into this multi-dimensional space. From that sort of string of numbers that represents its meaning, you can do calculations, mathematical calculations that can then represent how similar or not similar two vectors are. And so we can use this in vector search to basically find the most similar documents to a query. So what sort of happens behind the scenes in this is that you will store all your document embeddings. You'll embed them with an embedding model and store all of them in a database and then you'll convert your query when you get it to a vector embedding. And then you can calculate a similarity score for all your documents to the query by measuring the distance between every document in the query. And then basically you'll just return the documents with the highest score. So we can represent vectors in sort of a three-dimensional space. They're multi- many more dimensions than three dimensions usually. But you can see here like if our query was kitten, the most similar vectors would be like cat and dog and wolf. The vectors that are closer in distance, more different vectors like fruits would be farther apart. So closer distance represents higher similarity. And in this way we can do vector search by measuring distance between vectors. There are several different distance metrics that you can use in vector search and I'm not going to go into detail in a lot of them. We have a nice blog post on this if you're interested in details. But what I want to get from this is it's math, right? You're taking a vector and you're doing math things to it. And in this way, you're representing similarity with a score with a math score. Um, and that's basically it. So, keyword search is going to return results that have the exact keyword match, right? So if you're searching for cola, it's going to return all the Coca-Cola products. Vector search, on the other hand, is going to return results with related meaning. So if you search from cola, maybe it'll also return Pepsi or Fanta or things like this. Similarity through meaning, not from exact keywords.

Original Description

Now that we’ve seen the limits of traditional keyword search, let’s look at how vector search changes the game. In this part, we explore the foundation of semantic retrieval — the k-Nearest Neighbors (k-NN) algorithm. In this section, we cover: - How queries and documents are embedded as vectors in multi-dimensional space - What it means to measure similarity through distance - How k-NN helps find the most relevant documents to a query - The difference between exact keyword matches and semantic similarity - Why vector search captures meaning instead of just matching words At its core, vector search is math — but it’s math that understands meaning. By measuring distance between embeddings, we move beyond keywords and into semantic understanding — the foundation of modern search. #VectorSearch #KNN #SemanticRetrieval #Embeddings #SimilaritySearch #AIAlgorithms #MachineLearning #DeepLearningBasics #AIExplained #VectorDatabases #SearchEngineering #InformationRetrieval #SemanticSearch #TechEducation #AIForBeginners Learn data science, AI, and machine learning through our hands-on training programs: https://www.youtube.com/@Datasciencedojo/courses Check our community webinars in this playlist: https://www.youtube.com/playlist?list=PL8eNk_zTBST-EBv2LDSW9Wx_V4Gy5OPFT Check our latest Future of Data and AI Conference: https://www.youtube.com/playlist?list=PL8eNk_zTBST9Wkc6-bczfbClBbSKnT2nI Subscribe to our newsletter for data science content & infographics: https://datasciencedojo.com/newsletter/ Love podcasts? Check out our Future of Data and AI Podcast with industry-expert guests: https://www.youtube.com/playlist?list=PL8eNk_zTBST_jMlmiokwBVfS_BqbAt0z2
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Data Science Dojo · Data Science Dojo · 0 of 60

← Previous Next →
1 Feature Engineering and Predictive Modeling | Data Analytics with R and Azure ML | Community Webinar
Feature Engineering and Predictive Modeling | Data Analytics with R and Azure ML | Community Webinar
Data Science Dojo
2 Data Exploration and Visualization | Beginning Azure ML | Part 3
Data Exploration and Visualization | Beginning Azure ML | Part 3
Data Science Dojo
3 Reading External Data Sources | Beginning Azure ML | Part 2
Reading External Data Sources | Beginning Azure ML | Part 2
Data Science Dojo
4 Importing Data, Accessing, & Creating a New Experiment | Beginning Azure ML | Part 1
Importing Data, Accessing, & Creating a New Experiment | Beginning Azure ML | Part 1
Data Science Dojo
5 Casting Columns & Renaming Columns | Beginning Azure ML | Part 4
Casting Columns & Renaming Columns | Beginning Azure ML | Part 4
Data Science Dojo
6 Scrub Missing Values & Project Columns | Beginning Azure ML | Part 5
Scrub Missing Values & Project Columns | Beginning Azure ML | Part 5
Data Science Dojo
7 Feature Engineering & R Script | Beginning Azure ML | Part 6
Feature Engineering & R Script | Beginning Azure ML | Part 6
Data Science Dojo
8 Building Your First Model | Beginning Azure ML |  Part 7
Building Your First Model | Beginning Azure ML | Part 7
Data Science Dojo
9 Run and Fine-Tune Multiple Models | Beginning Azure ML | Part 8
Run and Fine-Tune Multiple Models | Beginning Azure ML | Part 8
Data Science Dojo
10 Deploying Your First Predictive Model As a Web Service | Beginning Azure ML | Part 9
Deploying Your First Predictive Model As a Web Service | Beginning Azure ML | Part 9
Data Science Dojo
11 Using R API to Obtain Predictions From Your Web Service Beginning Azure ML | Part 10
Using R API to Obtain Predictions From Your Web Service Beginning Azure ML | Part 10
Data Science Dojo
12 Using Python API to Obtain Predictions From Your Web Service | Beginning Azure ML | Part 11
Using Python API to Obtain Predictions From Your Web Service | Beginning Azure ML | Part 11
Data Science Dojo
13 Twitter Sentiment Analysis | Natural Language Processing | Community Webinar
Twitter Sentiment Analysis | Natural Language Processing | Community Webinar
Data Science Dojo
14 Listening to the Melody of the Universe (LIGO Gravitational Waves Presentation) | Community Webinar
Listening to the Melody of the Universe (LIGO Gravitational Waves Presentation) | Community Webinar
Data Science Dojo
15 David Wechsler on the Impact of Data Science Bootcamp
David Wechsler on the Impact of Data Science Bootcamp
Data Science Dojo
16 Andrew Choi on the Impact of Data Science Bootcamp
Andrew Choi on the Impact of Data Science Bootcamp
Data Science Dojo
17 Microsoft's Software Engineer Shares Her Experience with Data Science Bootcamp
Microsoft's Software Engineer Shares Her Experience with Data Science Bootcamp
Data Science Dojo
18 Michael DAndrea on the Impact of Data Science Bootcamp
Michael DAndrea on the Impact of Data Science Bootcamp
Data Science Dojo
19 Data Driven Decision-Making with Data Science Bootcamp: Artem Kopelev's Revelation
Data Driven Decision-Making with Data Science Bootcamp: Artem Kopelev's Revelation
Data Science Dojo
20 Learn the Fundamentals of Data Science: Srinivas Rao's Experience with Data Science Bootcamp
Learn the Fundamentals of Data Science: Srinivas Rao's Experience with Data Science Bootcamp
Data Science Dojo
21 Re-Learning Data Science with Data Science Bootcamp: Analyst's Revelation
Re-Learning Data Science with Data Science Bootcamp: Analyst's Revelation
Data Science Dojo
22 Scale R to Big Data with Hadoop & Spark | Community Webinar
Scale R to Big Data with Hadoop & Spark | Community Webinar
Data Science Dojo
23 Enhancing Skills with Data Science Bootcamp: Sharon Lane-Getaz's Revelation
Enhancing Skills with Data Science Bootcamp: Sharon Lane-Getaz's Revelation
Data Science Dojo
24 Ryan DeMartino on the Impact of Data Science Bootcamp
Ryan DeMartino on the Impact of Data Science Bootcamp
Data Science Dojo
25 Software Engineer at Microsoft Reveals About His Experience with Data Science Bootcamp
Software Engineer at Microsoft Reveals About His Experience with Data Science Bootcamp
Data Science Dojo
26 Wade Wimer on the Impact of Data Science Bootcamp
Wade Wimer on the Impact of Data Science Bootcamp
Data Science Dojo
27 Analyzing Data with Data Science Bootcamp: Hannah Richta's Revelation
Analyzing Data with Data Science Bootcamp: Hannah Richta's Revelation
Data Science Dojo
28 Applying Data Science Skills to The Current Role with Bootcamp: Marcos Lacayo's Revelation
Applying Data Science Skills to The Current Role with Bootcamp: Marcos Lacayo's Revelation
Data Science Dojo
29 Lance Milner on the Impact of Data Science Bootcamp
Lance Milner on the Impact of Data Science Bootcamp
Data Science Dojo
30 Deloitte's Data Scientist Revelation: Learning Predictive Analytics with Data Science Bootcamp
Deloitte's Data Scientist Revelation: Learning Predictive Analytics with Data Science Bootcamp
Data Science Dojo
31 Rajesh Patil's Experience at Data Science Bootcamp As an Enterprise Architect
Rajesh Patil's Experience at Data Science Bootcamp As an Enterprise Architect
Data Science Dojo
32 Michael Atlin on the Impact of Data Science Bootcamp
Michael Atlin on the Impact of Data Science Bootcamp
Data Science Dojo
33 Amina Tariq's In-Person Experience at Data Science Bootcamp
Amina Tariq's In-Person Experience at Data Science Bootcamp
Data Science Dojo
34 Ceo's Revelation about Data Science Bootcamp
Ceo's Revelation about Data Science Bootcamp
Data Science Dojo
35 Stephen Miller Describes His Experience at Data Science Dojo's Bootcamp
Stephen Miller Describes His Experience at Data Science Dojo's Bootcamp
Data Science Dojo
36 Kevin Hillaker on the Impact of Data Science Bootcamp
Kevin Hillaker on the Impact of Data Science Bootcamp
Data Science Dojo
37 Marko Topalovic's Experience with Data Science Bootcamp
Marko Topalovic's Experience with Data Science Bootcamp
Data Science Dojo
38 Text Analytics With Python, Cognitive Services & PowerBI | Data Analytics | Community Webinar
Text Analytics With Python, Cognitive Services & PowerBI | Data Analytics | Community Webinar
Data Science Dojo
39 Unisys Manager's Revelation: Visualizing Real Time Data with Data Science Bootcamp
Unisys Manager's Revelation: Visualizing Real Time Data with Data Science Bootcamp
Data Science Dojo
40 Learn Data Mining with Data Science Bootcamp: Ryan LaBrie's Revelation
Learn Data Mining with Data Science Bootcamp: Ryan LaBrie's Revelation
Data Science Dojo
41 Vang Xiong on the Impact of Data Science Bootcamp
Vang Xiong on the Impact of Data Science Bootcamp
Data Science Dojo
42 Data Scientist's Experience at Our Data Science Bootcamp
Data Scientist's Experience at Our Data Science Bootcamp
Data Science Dojo
43 Alejandro Wolf Yadlin on the Impact of Data Science Bootcamp
Alejandro Wolf Yadlin on the Impact of Data Science Bootcamp
Data Science Dojo
44 Introduction To Titanic Kaggle Competition | Part 1
Introduction To Titanic Kaggle Competition | Part 1
Data Science Dojo
45 Learning How to Code in R with Data Science Bootcamp: Priscilla Mannuel's Revelation
Learning How to Code in R with Data Science Bootcamp: Priscilla Mannuel's Revelation
Data Science Dojo
46 Andrew Berman On Why Data Science Bootcamp Is Better Fit for Him
Andrew Berman On Why Data Science Bootcamp Is Better Fit for Him
Data Science Dojo
47 How To Do Titanic Kaggle Competition in R | Part 3.1
How To Do Titanic Kaggle Competition in R | Part 3.1
Data Science Dojo
48 How to do the Titanic Kaggle competition in R | Part 3.1
How to do the Titanic Kaggle competition in R | Part 3.1
Data Science Dojo
49 Delve Deeper into Data Science with Data Science Bootcamp
Delve Deeper into Data Science with Data Science Bootcamp
Data Science Dojo
50 Bank of America Data Scientist Reveals His Experience of Data Science Bootcamp
Bank of America Data Scientist Reveals His Experience of Data Science Bootcamp
Data Science Dojo
51 Shaena Montanari on the Impact of Data Science Bootcamp
Shaena Montanari on the Impact of Data Science Bootcamp
Data Science Dojo
52 Types of Sampling | Introduction to Data Mining | Part 12
Types of Sampling | Introduction to Data Mining | Part 12
Data Science Dojo
53 Sampling for Data Selection | Introduction to Data Mining | Part 11
Sampling for Data Selection | Introduction to Data Mining | Part 11
Data Science Dojo
54 Data Aggregation | Introduction to Data Mining | Part 10
Data Aggregation | Introduction to Data Mining | Part 10
Data Science Dojo
55 Data Cleaning | Introduction to Data Mining | Part 9
Data Cleaning | Introduction to Data Mining | Part 9
Data Science Dojo
56 Missing & Duplicated Data | Introduction to Data Mining | Part 8
Missing & Duplicated Data | Introduction to Data Mining | Part 8
Data Science Dojo
57 Data Noise | Introduction to Data Mining | Part 7
Data Noise | Introduction to Data Mining | Part 7
Data Science Dojo
58 Graph and Ordered Data | Introduction to Data Mining | Part 5
Graph and Ordered Data | Introduction to Data Mining | Part 5
Data Science Dojo
59 Document Data & Transaction Data | Introduction to Data Mining | Part 4
Document Data & Transaction Data | Introduction to Data Mining | Part 4
Data Science Dojo
60 Data Quality | Introduction to Data Mining | Part 6
Data Quality | Introduction to Data Mining | Part 6
Data Science Dojo

This video teaches the foundation of semantic retrieval using the k-NN algorithm, allowing viewers to understand how vector search captures meaning instead of just matching words.

Key Takeaways
  1. Embed queries and documents as vectors in multi-dimensional space
  2. Measure similarity through distance
  3. Use k-NN to find relevant documents
  4. Understand the difference between exact keyword matches and semantic similarity
💡 Vector search captures meaning instead of just matching words by measuring distance between embeddings

Related Reads

📰
RAG Is Not a Feature. It's a System, and These Are the Parts Nobody Demos.
Learn how RAG is a system, not a feature, and understand its key components beyond demos
Dev.to · Marketing wizr
📰
What Is RAG? The AI Technology That Makes ChatGPT Smarter Without Retraining
Learn about RAG, the AI technology that enhances ChatGPT's capabilities without requiring retraining, and why it matters for advancing language models
Medium · RAG
📰
Understanding the Limits of Linear RAG — and Why Agentic Workflows Are Catching On
Learn the limitations of linear RAG pipelines and how agentic workflows are becoming a popular alternative for more efficient and effective AI workflows
Medium · AI
📰
Understanding the Limits of Linear RAG — and Why Agentic Workflows Are Catching On
Learn why linear RAG pipelines have limitations and how Agentic workflows are becoming a preferred alternative in the industry
Medium · Machine Learning
Up next
RRF vs DBSF with Qdrant: Hybrid Retrieval Fusion for RAG in Python
Professor Py: AI Engineering
Watch →