Understanding Basic Vector Search With KNN | Vector Databases for Beginners | Part 12
Key Takeaways
This video covers the basics of vector search using the k-Nearest Neighbors (k-NN) algorithm, including how queries and documents are embedded as vectors, measuring similarity through distance, and finding relevant documents through k-NN.
Full Transcript
So the main algorithm I'll start with is k-NN, or k-nearest neighbors. This is when you calculate the distance between a query vector and a document vector for every query and document pair. We covered this in the last webinar, but when you turn a query or a document or anything else into a vector, you're basically encoding its meaning into a multi-dimensional space. From that string of numbers that represents its meaning, you can do mathematical calculations that tell you how similar or dissimilar two vectors are. And so we can use this in vector search to find the documents most similar to a query.
What happens behind the scenes is that you embed all your documents with an embedding model and store those embeddings in a database. Then, when a query comes in, you convert it to a vector embedding too. You calculate a similarity score between the query and every document by measuring the distance between each document and the query, and then you just return the documents with the highest scores.
We can picture vectors in a three-dimensional space, although they usually have many more dimensions than three. You can see here that if our query was kitten, the most similar vectors would be things like cat, dog, and wolf, the vectors that are closer in distance. More different vectors, like fruits, would be farther away. So closer distance represents higher similarity, and in this way we can do vector search by measuring the distance between vectors.
There are several different distance metrics you can use in vector search, and I'm not going to go into detail on most of them. We have a nice blog post on this if you're interested in the details. But what I want you to get from this is that it's math, right? You're taking a vector and you're doing math things to it.
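To make "doing math things" to a vector a little more concrete, here is a minimal sketch of two common choices, Euclidean distance and cosine similarity. The three-dimensional vectors are hand-made toy values chosen for illustration, not the output of a real embedding model, and the function names are only assumptions of this sketch.

```python
import math


def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: larger means more similar."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
kitten = [0.9, 0.8, 0.1]
cat = [0.8, 0.9, 0.2]
apple = [0.1, 0.2, 0.9]

print(euclidean_distance(kitten, cat), euclidean_distance(kitten, apple))
print(cosine_similarity(kitten, cat), cosine_similarity(kitten, apple))
```

With these toy values, kitten ends up closer to cat than to apple under both measures. Note that the two point in opposite directions: a distance gets smaller as vectors get more similar, while a similarity score gets larger.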
And in this way, you're representing similarity with a score, a math score. And that's basically it. So keyword search is going to return results that contain the exact keyword match, right? If you're searching for cola, it's going to return all the Coca-Cola products. Vector search, on the other hand, is going to return results with related meaning. So if you search for cola, maybe it'll also return Pepsi or Fanta or things like that. Similarity through meaning, not through exact keywords.
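The brute-force k-NN procedure described in this transcript (store the document embeddings, embed the query, score every document against it, return the top results) can be sketched roughly as follows. This is only an illustration under stated assumptions: the vectors are hand-made toy values standing in for real embeddings, cosine similarity is one possible scoring choice among several, and `knn_search` is a hypothetical helper name rather than any library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(query_vector, document_vectors, k):
    """Score every stored document against the query and return the top k."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in document_vectors.items()
    ]
    # Highest similarity first, then keep only the k best matches.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored "document embeddings" (toy 3-dimensional values).
documents = {
    "cat": [0.8, 0.9, 0.2],
    "dog": [0.7, 0.7, 0.3],
    "wolf": [0.6, 0.5, 0.4],
    "apple": [0.1, 0.2, 0.9],
    "banana": [0.2, 0.1, 0.8],
}

# Stands in for the embedding of the query "kitten".
query = [0.9, 0.8, 0.1]

top = knn_search(query, documents, k=3)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

With these toy values the three animals rank above the two fruits, which mirrors the kitten example from the talk. Because every document is compared with the query, the cost of one search grows linearly with the number of stored documents.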
Original Description
Now that we’ve seen the limits of traditional keyword search, let’s look at how vector search changes the game.
In this part, we explore the foundation of semantic retrieval — the k-Nearest Neighbors (k-NN) algorithm.
In this section, we cover:
- How queries and documents are embedded as vectors in multi-dimensional space
- What it means to measure similarity through distance
- How k-NN helps find the most relevant documents to a query
- The difference between exact keyword matches and semantic similarity
- Why vector search captures meaning instead of just matching words
At its core, vector search is math — but it’s math that understands meaning.
By measuring distance between embeddings, we move beyond keywords and into semantic understanding — the foundation of modern search.
Learn data science, AI, and machine learning through our hands-on training programs: https://www.youtube.com/@Datasciencedojo/courses
Check our community webinars in this playlist: https://www.youtube.com/playlist?list=PL8eNk_zTBST-EBv2LDSW9Wx_V4Gy5OPFT
Check our latest Future of Data and AI Conference: https://www.youtube.com/playlist?list=PL8eNk_zTBST9Wkc6-bczfbClBbSKnT2nI
Subscribe to our newsletter for data science content & infographics: https://datasciencedojo.com/newsletter/
Love podcasts? Check out our Future of Data and AI Podcast with industry-expert guests: https://www.youtube.com/playlist?list=PL8eNk_zTBST_jMlmiokwBVfS_BqbAt0z2