Offline vector search with SQLite and EmbeddingGemma

Chrome for Developers · Beginner ·🔍 RAG & Vector Search ·7mo ago

Key Takeaways

The video demonstrates how to build an offline vector search system using SQLite and EmbeddingGemma, so that retrieval-augmented generation (RAG) can run in the browser without a network connection. It covers the use of local models, vector stores, and SQLite extensions for vector support.

Full Transcript

[music] How's it going? My name is Rody and I am a developer relations engineer at Google working on the AI workflows team. Super excited to be here today at the Web AI Summit. Something I'm very passionate about is on-device and local-first types of applications.
And so before we get started, let's talk about vectors and databases. Vectors, as you know, can be generated with both hosted and local models. There are a lot of trade-offs with each one of those, and the right choice will typically depend on the specific needs of your application. Vector stores can grow quite large and often require an API to access. And while that's fine for certain types of applications, it may not be ideal when you have intermittent network connectivity, for example.
One important thing about vectors, if you're not familiar with them, is you have to have the same encoder and decoder wherever you query and update documents. And this is really important because you can't take advantage of a really powerful encoder and then a very lightweight decoder. You have to use the same one, which was kind of frustrating when I was first getting into it. And then another thing that server side can really take advantage of is that they can be so much faster because they have a lot of RAM. They're optimized for NVMe storage.
I've listed a lot of pros on the server side, but why would you even want them on the client? Well, first of all, you can store the vectors just for the user. You never have to worry about running a query and getting some sort of dimensionality for content that's not theirs. You also have the advantage of it already being partitioned for that user on the client.
So as with most things that require trade-offs, usually a hybrid approach is more appropriate. And in this case, we can use the server side to have some nice parallel compute to be able to batch encode a bunch of vectors.
And we can store them inside of Firebase using Firestore vector support, which was added. Of course there are vector databases that already work a lot better for vectors, but one of the nice things about Firestore is it gives us a nice syncing modality that we can use on the client, and we can store everything in a bucket per user. And really it's meant to be a fallback when the model isn't downloaded.
I'm a huge fan of SQLite, and one of the cool things about SQLite is you can load extensions, including vector support. So we can actually pull in those vectors from Firestore into SQLite and then you can query them directly on the client. But here's where the magic really starts to happen. When you go to update models, you can use that local encoder and decoder to incrementally regenerate new documents. This makes it really nice to pull down a massive data set and then, as the user is making changes and edits, you get to keep that up to date without having to do that round trip and always requiring internet.
So, EmbeddingGemma is a super awesome encoder and decoder that we have launched and I really love it. It's about 308 million parameters. It's meant to be run on mobile devices, but just because you can do that doesn't mean you can't use it on the server, which is really awesome, including support for things like Cloud Run, where we make it really easy to launch it with Ollama. So you can have a nice fallback API when the model isn't downloaded yet and you just want to have this kind of ad hoc experience. One of the reasons I like using it is it has 768 dimensions. So it has a very significant quality for the types of tasks that you can throw at it. It's still configurable, and just the whole Gemma family is really awesome. I know a lot of people today have talked about Gemma 3n, and you can totally use that with this, but for this talk it's just going to be on the database side and vector support without LLMs.
Another cool thing about these models is you can use Transformers.js, which was talked about many times today. It allows us to use the CPU and GPU to run inference on these encoder models, and it also supports the 768-dimension space that EmbeddingGemma can use and output for the vectors. Here's a code snippet on how you would get this running with EmbeddingGemma. I'm using the ONNX Runtime version of EmbeddingGemma, the 300 million parameter option. And here we can just create a simple pipeline that uses feature extraction, take that embedder, and give it the correct task type, which can be query or document or others listed in the documentation. Then we just normalize the vectors before we return them, and since we're on the web it's important to return it as a Float32Array because that's what SQLite's also going to expect for the storage, as well as Firestore.
So like I said, Firestore supports vectors, which is awesome. It makes it really easy to sync. When Firestore first loads into your application, it'll pull down the documents that you have queried for that user. And as you make updates, Firestore takes care of all of the work: if you update a single document on the server side, it will just pull down the incremental patches, and updates you make can be sent back up to the server. So you don't have to manage any complex sync logic on your side. But they also launched vector support, which means you can literally add the vector type directly into those documents, keeping it colocated with that user and their collections. So here's just a simple snippet of how you might do that in Firestore using the modular JavaScript SDK.
You can just create a Firestore application using the app that you initialize, and in this case it's an emoji application, and you have the embedding, and then you can add the doc and use the vector type, which you can import as well from the SDK.
So, SQLite: huge fan. There's a really cool project called sqlite-vec. If you're not familiar with it, I definitely suggest you give it a look. It allows us to use low-level KNN queries directly inside of SQLite by extending the syntax. This project has also expanded a lot since the first version. It now has metadata filtering, partitioning, and virtual columns and so much more. But this allows us to create those embeddings directly in SQLite. Now, you can also store the blobs of the float32 values directly inside of regular tables, but one of the cool things about the virtual tables is they're optimized for those queries, so it doesn't have to do a full table scan every time you do a query. Also, SQLite compiles to Wasm and you can add any extensions that you have inside of that. So in this example that I'm going to share on GitHub later, it has sqlite-vec pre-installed, but you can totally add your custom ones as well.
So, here's an example of how you might do that in SQLite. We're importing the official SQLite package here from sqlite.org as well as just pulling down the Wasm module. You can just create it like another table using the vec0 table syntax. And this allows us to have that float[768] dimension syntax. And you would obviously change this for the type of encoder and decoder you're using. But that's it. You just work with it like a normal SQLite database if you're familiar with that. But this is all happening on the client. It can do massive data sets. It's often that you can run millions of queries in just like a second in the browser. So definitely suggest giving it a look. So when it comes to querying, it's also very similar to SQL.
I know this may not be familiar for everyone, but as a mobile developer and someone who likes to build applications, writing SQL queries on the client, knowing it's just the data set, makes it really easy to create the types of views that I want. And in this case, I just query from the emojis embedding table. I join on that foreign key, and then here's where the magic comes in with the MATCH keyword, which is using the vec0 functions as well as the KNN queries with the limit. And then we can order it by the distance and then grab it out and present it to the user later.
So, time for a demo. This is a little bit different take than the other demo from earlier, which was about using EmbeddingGemma for emojis. I want to create a better vector search for emojis. So, I took the entire Unicode data set and I vectorized each of the descriptions with the emojis. So as you're typing, it returns the emojis that are closest in that embedding space based on what your query is, and on each key press it will actually vectorize the query itself. Once this model gets downloaded onto the browser, this can happen completely offline.
So you can obviously expand this to other applications where you have documents that you pull in for your business data, or just specific types of tool calls. For example, you can vectorize a thousand tool definitions and only provide maybe five to the model at any given time. It really opens up and expands the types of use cases that you can build. This code is available on my GitHub. You can check it out at emojis search. I am usually pretty available on GitHub and Twitter and LinkedIn, so definitely feel free to reach out. But thanks so much. [music]
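
The transcript mentions normalizing each vector and returning it as a Float32Array, since that is the form used for storage. The following is a minimal sketch of that step in plain JavaScript, with no model or database involved. The helper names (l2Normalize, toBytes, fromBytes) are hypothetical and are not taken from the speaker's code.

```javascript
// Sketch only: helper names are hypothetical, not taken from the talk's code.

/** Scale a vector to unit length and return it as a Float32Array. */
function l2Normalize(values) {
  const out = Float32Array.from(values);
  let sumSquares = 0;
  for (const v of out) sumSquares += v * v;
  const norm = Math.sqrt(sumSquares);
  if (norm === 0) return out; // leave an all-zero vector unchanged
  for (let i = 0; i < out.length; i++) out[i] /= norm;
  return out;
}

/** View the float32 values as raw bytes, e.g. for binding as a BLOB. */
function toBytes(vector) {
  return new Uint8Array(vector.buffer, vector.byteOffset, vector.byteLength);
}

/** Rebuild a Float32Array from bytes read back from storage. */
function fromBytes(bytes) {
  // Copy first so the result is 4-byte aligned regardless of the source offset.
  const copy = bytes.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// A 2-dimensional toy vector stands in for a 768-dimensional embedding.
const unit = l2Normalize([3, 4]);
const roundTrip = fromBytes(toBytes(unit));
console.log(unit, toBytes(unit).byteLength, roundTrip);
```

Each float32 value occupies 4 bytes, so a 768-dimensional embedding would take 3072 bytes in this representation.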

Original Description

Learn from Rody Davis, Senior Developer Relations Engineer at Google, how to query and embed documents using SQLite and embeddings with EmbeddingGemma and Gemma 3. Create an offline RAG system that runs in the browser.
Resources:
GitHub → https://goo.gle/4p2b3b1
See more Web AI talks → https://goo.gle/web-ai
Subscribe to Chrome for Developers → https://goo.gle/ChromeDevs
Event: Web AI Summit 2025
Speaker: Rody Davis
Products Mentioned: AI for the web, Gemma 3
#ChromeforDevelopers #WebAI

Learn how to build an offline vector search system using SQLite and EmbeddingGemma, enabling RAG to run in the browser offline. This system allows for local model usage, vector storage, and query execution directly on the client.

Key Takeaways
  1. Load SQLite extensions for vector support
  2. Use EmbeddingGemma for vector generation
  3. Create an emoji embedding table
  4. Query the table using SQL-like syntax
  5. Join on foreign key
  6. Use the MATCH keyword with vec0 functions and KNN queries
  7. Order results by distance
  8. Present results to user
💡 By using SQLite and EmbeddingGemma, developers can create offline vector search systems that run in the browser, enabling RAG capabilities without requiring an API or internet connection.
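
The table and query steps above could look roughly like the sketch below. It rests on several assumptions: the table and column names (emojis, emoji_embeddings, glyph) are hypothetical, the SQL strings follow sqlite-vec's vec0 syntax but are not executed here, and the query uses sqlite-vec's `k = 5` constraint where the talk mentions a limit. The knn function is a brute-force stand-in in plain JavaScript that shows what such a query returns; it is not how sqlite-vec works internally.

```javascript
// Sketch only: table and column names are hypothetical, and the SQL strings
// are shown for shape. They are not executed in this example.
const CREATE_TABLES_SQL = `
  CREATE TABLE emojis (
    id INTEGER PRIMARY KEY,
    glyph TEXT NOT NULL,
    description TEXT NOT NULL
  );
  CREATE VIRTUAL TABLE emoji_embeddings USING vec0(
    emoji_id integer primary key,
    embedding float[768]
  );
`;

// The single ? placeholder takes the query vector as float32 bytes.
const KNN_SQL = `
  WITH knn AS (
    SELECT emoji_id, distance
    FROM emoji_embeddings
    WHERE embedding MATCH ? AND k = 5
  )
  SELECT emojis.glyph, emojis.description, knn.distance
  FROM knn
  JOIN emojis ON emojis.id = knn.emoji_id
  ORDER BY knn.distance;
`;

/** Euclidean (L2) distance between two equal-length vectors. */
function l2Distance(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

/** Brute-force stand-in for the KNN query: k nearest rows, closest first. */
function knn(rows, queryVector, k) {
  return rows
    .map((row) => ({ ...row, distance: l2Distance(row.embedding, queryVector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Toy 2-dimensional vectors stand in for real 768-dimensional embeddings.
const rows = [
  { glyph: "😀", description: "grinning face", embedding: new Float32Array([1, 0]) },
  { glyph: "🐶", description: "dog face", embedding: new Float32Array([0, 1]) },
  { glyph: "😺", description: "grinning cat", embedding: new Float32Array([0.8, 0.6]) },
];
const nearest = knn(rows, new Float32Array([0.9, 0.1]), 2);
console.log(nearest.map((r) => r.glyph));
```

The brute-force version scans every row on each query, which is the full table scan the talk says the virtual table is meant to avoid. It is included only to make the ordering by distance concrete.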
