Word2Vec & Friends with Bruno Gonçalves -#48

The TWIML AI Podcast with Sam Charrington · Beginner ·🔍 RAG & Vector Search ·8y ago

Key Takeaways

This video features an interview with Bruno Gonçalves, a Moore-Sloan Data Science Fellow at NYU, discussing Word2Vec and its applications in natural language processing. The conversation covers the basics of Word2Vec, its architecture, and its uses in capturing semantic relationships between words. Bruno also touches on other topics such as graph embeddings and the challenges of training high-quality word embeddings.
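
As a concrete illustration of the context-window idea that underlies both Word2Vec architectures discussed in the interview, here is a minimal sketch of how (center word, context word) training pairs could be generated for skip-gram. The function name `skipgram_pairs` and the example sentence are assumptions made for illustration; this is not the reference implementation, which also applies subsampling and other optimizations.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token in a token list.

    `window` is the number of words considered on each side of the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # clip the window at the start
        hi = min(len(tokens), i + window + 1)   # clip the window at the end
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical example sentence, split on whitespace.
sentence = "you know a word by the company it keeps".split()
pairs = skipgram_pairs(sentence, window=2)

# Context words seen for the center word "word".
word_context = [ctx for center, ctx in pairs if center == "word"]
```

Skip-gram would use each pair as (input, target); continuous bag of words would instead group the context words as the input and use the center word as the target.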

Full Transcript

[Music] Hello and welcome to another episode of TWIML Talk, the podcast where I interview interesting people doing interesting things in machine learning and artificial intelligence. I'm your host, Sam Charrington.

A big thanks to everyone who joined us last week for our second TWIML Online Meetup, led by Nikola Kučerová. We discussed the paper "Learning Long-Term Dependencies with Gradient Descent Is Difficult" by Yoshua Bengio and company. I had a great time and learned a ton. For those who weren't able to attend live, we have the video posted for you on twimlai.com/meetup, and if you're interested in joining us next month, please head on over to that site to get signed up. We're also accepting presenters, so feel free to shoot me a note with your ideas.

Next up, the time has come for the Artificial Intelligence Conference, brought to you by O'Reilly and Intel Nervana. I'll be in San Francisco the entire week next week, and we have a ton of great interviews lined up, so this should be another awesome series. If you haven't had a chance to check out our series from the New York event, you can find it at twimlai.com/oreillyainy. Also, if you'll be at the conference, please send me a shout-out on Twitter or via email. I'd love to connect. See you then.

Okay, about our show this week. I'm bringing you an interview with Bruno Gonçalves, a Moore-Sloan Data Science Fellow at NYU. As you'll hear in the interview, Bruno is a longtime listener of the podcast. We were able to connect at the NY AI conference back in June, after I noted on a previous show that I was interested in learning more about word2vec. Bruno graciously agreed to come on the show and walk us through an overview of word embeddings, word2vec, and a bunch of related ideas. He provided a great overview of not only word2vec but related natural language processing concepts such as skip-gram, continuous bag of words, node2vec, TF-IDF, and much more. By the time you hear this it'll be too late to catch it live, but Bruno is doing a
half-day tutorial on word2vec and friends on Monday at the AI conference. If you'd like to see it, I'm sure it'll be available via the O'Reilly Safari website. And now, on to the show.

All right, everyone. So I am here on location at the O'Reilly AI conference, and I have the pleasure of being here with Bruno Gonçalves, who is apparently a longtime listener of the podcast and reached out to me after hearing me note that I wanted to learn more about word2vec and embeddings, and made it happen. So we're here to talk about that. Welcome, Bruno.

Thank you. It's a pleasure to be here. I've been listening for a very long time, and I'm very happy to be here and be able to participate.

Awesome, awesome. I was just asking you, you mentioned that you've been listening since before the interview format, and I'm still stuck on the transition from the news format to the interview format. How was that for you? What's your take on the before and after?

I like the new interview format, actually. This might be a personal bias, but when it was news, a lot of the news I'd already seen or listened to somewhere else, so it wasn't as new. This way, since you're interviewing people that I don't necessarily listen to or know personally, it's good to have the fresh view from them.

That's great, that's great. Like I said, I guess I still am hung up on the fact that I switched the format, but I've been told by people things along the same lines: that the news is essentially commoditized, and it's hard to beat out TechCrunch or whatever people count on for their news, but folks really seem to be enjoying the interview format. So again, it's great to have you on for it, instead of talking about the news. So tell us a little bit about your background and what you're up to.

So, I'm originally a physicist, but I've always been involved in optimization problems,
originally in spin glass optimization.

Spin glasses?

Spin glasses. It actually has interesting connections with neural networks and how they work, but it's basically a disordered magnetic system, to put it very, very simply. But the mathematical structure is very rich, and it lends itself to exploring different types of optimization problems. From there I evolved very quickly, and this was because I noticed that in spin glasses a lot of the behavior you see has to do with the way that the different elements are connected. I moved very quickly towards networks, and studying the connections between components of a system more deeply. And when you're working with networks, and complex networks in general, what you do a lot is basically data mining. You're crawling websites, you're parsing Apache logs to look for connections between URLs, for example. You're basically doing, in a sense, applied graph theory, but applied to real data. So the transition towards data came very early and very naturally, even before people were talking about data science. This was back in maybe 2006 or 2007. And then more recently I've been moving towards more data science intensive aspects, and hence my interest in word2vec and that type of algorithms.

Okay. I think the best way to do this is to just jump in and have you walk us through word2vec, but also the foundational thinking that led to word2vec and embeddings. As I mentioned the other day, it's an area that I've been meaning to dive into, and I'm glad you're here to tell us about it.

Excellent. So the original idea, as with many things in this field, comes from outside the field. It comes from linguistics, and it was expressed in a very general way by James Firth, if I'm not mistaken, around 1953. Basically, what he said is: you know the meaning of a word by the company it keeps. So to understand what a word means,
you can look at the context, and the context gives clues about what the word says, what it intends to say. Word2vec tries to take advantage of this by saying that the embedding of a word is defined by the context in which it appears. So words that appear in the same contexts are more semantically equivalent, so they will have vectors that are close to each other, while words that tend to appear in very different contexts will be very different and have very different vectors, or vector representations in this case.

And so how do you get to the word vectors? My impression is that it's all relative to a specific corpus, right? We haven't done word2vec on everything, like some grand unification of word2vec. Is that the right way to think about it? And then you're doing some math on that corpus to get you to the vectors. Can you talk a little bit about that process?

Yes. So basically what it does, it's a very simple neural network. It has just one hidden layer, and the activation function is linear, so it just passes through. It's very simple. The actual word embeddings are basically the weights on this hidden layer. Depending on which model, it can be skip-gram or it can be continuous bag of words, it can be the weights leading into the hidden layer or it can be the weights heading out of the hidden layer, but in practice it's mathematically...

And so skip-gram refers to one of those, and continuous bag of words refers to the other?

Yes. So the way you look at what Firth was saying, that you know a word by the company it keeps, is: if you have the word, you can kind of guess the context in which it appears, right? Or if you have the context, you can kind of guess what word would fit in the middle. So skip-gram and continuous bag of words basically look at these two approaches. In one case, your input is the word and you're trying to guess in the
output layer what the context is. In the other case, you have the context words as input and you're trying to guess what word might go in the middle. And here, just to clarify, the context is usually defined by a window of words before that word and words after that word. So if you're looking at word i, your context would be, for instance, i minus 1, i minus 2, and i plus 1, i plus 2. Two words before, two words after, or five words before, five words after.

Okay. And so the window is what you use to compute the vector representation for a given word?

Yes. So that window will give you what the context is, and the context will define what the word is. Now, like you were saying, people haven't done word2vec on all the text in the world. Every time you run this, you get different vectors. Also, if you run it twice on the same corpus, the vectors will be different. The reason for this is that you're initializing all the weights, or the vectors, randomly, and then you're adjusting them, so you will end up with something different. However, if you do it well enough, and it converges, and you get something that's reasonable, the vectors will be different but they'll basically be a rotation. So you can always take vectors trained on two corpuses and align them so that they match, because what the word2vec algorithm and similar algorithms are trying to do is learn the relations between vectors. You're basically trying to define the differences of vectors, not the actual vectors.

Right. So there's some distance metric or something?

If you rotate, the distances remain the same, so they're still valid.

Okay. And it's because the distances are preserved? It's the distance that preserves the semantic meaning?

That gives you the semantic relations, right. And the idea is very simple: if two words appear in the same context, they have to be defined by similar vectors. And this also
means that if the relation between this pair of words and the relation between that pair of words is similar, the difference between them in terms of vectors will also be similar. This is why you can do word arithmetic and say the vector for Italy minus the vector for Rome has to be equal to the vector for France minus the vector for Paris. So if you have three of these, you can calculate the other one. And this is basically one of the ways they use to measure how good the embeddings are. They use these so-called analogies: Paris is to France as Rome is to what? And it gives you Italy, if you look for the vector that's closest to that difference.

Okay, interesting. So two questions come up for me. One is, the context is defined by this narrow window. You said it's usually two words before and two words after?

That's an example, with two words. In the official implementation I think the default is set to five, but it depends. You can vary it, it's an input parameter. And also, in practice, if you want to go into the details, when you're pre-processing the corpus you will sometimes remove some words because they're too common, kind of like what you do with stop words. And that effectively changes the size of the window, because you do this before you calculate the context. So it's almost as if sometimes your window is a little bit bigger, because you removed the word, so you're catching more information.

Okay. So the question is, do folks experiment with bigger windows, or is it possible to do this on the entire corpus? I guess in my mind what I'm hearing is that the word relationships are only relative to this very small window. Wouldn't the vectors capture more information if we were somehow creating them based on bigger windows or the entire corpus? Is that the right way to think about it?

Yes and no. If you make the window too big, you're basically
including information that's not relevant for the word. You can have a very long sentence, and the word at the end of the sentence doesn't necessarily have anything to do with the word at the beginning of the sentence. So you're trying to keep nearby words that help you define what that word means: adjectives, verbs, things that are directly related to what the concept is, or what the word is in particular.

Right. I guess I'm thinking about it in the context of, like, running TF-IDF on a Wikipedia article. If I'm looking at a Wikipedia article that's talking about neural networks, and CNNs, and RNNs, and artificial neural networks, these are all related terms and I'd want to capture that relatedness, but they may be far from one another spatially in my document. I still want to capture that context. Can we not use word2vec for that?

You can. Basically, what you're doing is you're scanning through the entire document, and the embedding that you learn for each word is learned with all the contexts it appears in, so it's not just one. So, take the example of artificial neural networks. "Artificial" will appear next to "neural" in one context, but then maybe a few paragraphs later it will appear next to "intelligence", or to "approach". That means that "artificial" will be defined by all of these contexts, and will look very different from a word like "convolutional", which maybe always appears next to "neural network" and doesn't appear in other contexts. So it does take the whole information of the corpus into account.

But if, for example, I'm running it on a Wikipedia article on neural networks, and there's one section on convolutional neural networks, another section on RNNs, and another section on LSTMs, since those individual terms are separated by these sections, will it capture those relationships?

Yes. The "neural network" part, yes. So one
thing, maybe to make it even clearer: when I say corpus in this case, I mean all of Wikipedia. I don't mean one page of Wikipedia. So these are very large corpora.

Got it.

And the reason why you use a very large corpus is because you want to learn what the meaning of the word is, in a sense. This is what you're trying to capture. It's not necessarily what this word means in this document.

Got it. So it's a more generic thing.

And this is why people have actually started publishing high-quality embeddings. Google has published some, Facebook has published some, that are trained on billions of words, or rather on corpora that are billions of words long, because they're trying to capture the semantic meaning. So you can use these vectors, for instance, and one very simple application is query disambiguation. If you know the word, you know what the vector is, and you can look around that word to see what words are related to it, and maybe you can show results that use those other words. You can use the vectors, and this translation invariance, this distance preservation, as a way of mapping the word, for instance, to a verb version of the word, or a noun version of the word, or a past tense version. So you can use all of these relations, in a sense all of this linear algebra, as a way of getting more knowledge about the text.

Okay, interesting. So the other question I had was the length of the vector. How do you know what the right dimensionality is for this vector?

As far as I know, there is no well-defined way of measuring what the optimal size is. In practice, people tend to use dimensions between about a hundred and five hundred, which is relatively small for corpora, or rather dictionaries, where the number of individual words is of the order of hundreds of thousands. So in a sense, one of the things you're also doing is very much dimensionality reduction, mapping from this very large, high-dimensional
space, in which each dimension is a word, into this very small space, in a way that preserves the meaning, the semantic value, of the words.

Okay. Do you recall the show that I did with Francisco Webber? He was from Cortical.io, and he talked about neural representations.

I actually remember that. I wanted to look into that more carefully. He mentioned that this is actually correlated with this type of vector representation, because it seems that each word is actually represented in different parts of the brain at the same time. So it's something I'm curious to look more deeply into. I tried looking at their website, but I couldn't get the technical details.

Okay. I was curious if you had looked into that at all, if you were familiar with that model, and any thoughts you might have on how that compares.

I found some things online, but it seemed to be more marketing oriented, so there weren't the scientific articles behind them.

Okay. So embeddings have been around since the fifties, and word2vec now is... did you say the 1950s?

The idea, this idea that the meaning of the word comes from the context, yes. This is called the distributional hypothesis in linguistics. Word embeddings themselves have been around for maybe 10 years, maybe a little bit more, in practice, though lacking wide adoption. They became very popular and got a lot of attention with word2vec, when the paper was published in 2013 by Tomas Mikolov, if I'm not mispronouncing his name horribly.

Okay. And so we're a few years into word2vec, and now there are a bunch of other 2vecs, right? Have you looked into any of those as well?

Yes. There are some that are very interesting. There's one, for instance, that is dna2vec, and it tries to find embeddings for sequences of DNA and see how that might relate to protein structure and genome organization, which I find fascinating. There is one by Jure Leskovec at
Stanford, it's called node2vec, and it tries to use the...

Node, like in a graph?

Like in a graph. It's basically trying to find embeddings for nodes in graphs, as a way of measuring relations between nodes. And the way it does it, since it doesn't have a sequence of words, it doesn't have a sequence of nodes, is that it basically runs a random walk process on the network, and that generates the sequence. Then, based on that sequence, you can treat each node as if it was a word. It appears next to other nodes because they're somehow in its neighborhood, and from there you can define an embedding that is able to capture some of the structure of the network.

I guess I would have thought that a graph is its own kind of representation of all this information.

It is and it isn't, in a sense. The point is that it's not necessarily easy, and this is a well-known problem, to compare graphs. If I tell you the nodes here are words, and now I give you another graph where the nodes are people, you can't just match the labels directly. It's very hard to see if the structure of the graph is the same, given all the permutations and all those things. If I'm not mistaken, I think it's an NP-complete problem. So people have been trying this idea of graph embeddings for a while, where you try to map the graph into some set of points that does not depend on the details of the graph, or the labels that you give to nodes, or any permutations you do, or anything like that.

Hmm, interesting.

So I've been fascinated by how versatile this idea of word2vec is.

Are there others? I'm at a loss for names, but I know I've seen like a dozen of these different 2vecs lately. It sounds like any time you have a space with some complex structure, this is a tool that you can use to allow you to compare both individual points within that space and to other spaces.
Is that a general way to think about it?

It's more that whenever you have a sequence of tokens, let's say, you can use it as a way of finding a representation of these tokens that you can then use to find the relationships in this sequence. So it works very well for text, because you have sequences of words. In the case of dna2vec it works very well because you have a sequence of nucleotides in DNA. For node2vec you have a sequence of nodes that is generated by this random walk process on the graph, and they actually try different definitions of the random walk, different rules, and so that's somehow able to capture different aspects of those nodes, of the network structure.

Hmm, interesting. And the relationships between the tokens in the sequences are defined by the corpus. I was wondering if you can do, like, transaction2vec, and do word embedding on a sequence of transactions, and use that for fraud detection or something like that.

I actually saw something like this yesterday, something like stock2vec, that somebody published on Medium. It's on my reading list, but I haven't read it yet. I just saw the title.

But notionally there's something in there somewhere.

Yes. In principle, every time you have a sequence you can do something like this.

I've also been getting more into learning about blockchain and how that all works, and I'm wondering now if there are some applications there as well.

Possibly. I have not seen anything with blockchain using this. I looked in detail at the blockchain a couple of years ago, when I was interested in these transaction networks. It might be possible to do something, but I haven't seen it yet. So, interesting. Maybe an idea for some of your listeners.

Yeah. What are some of the most interesting applications you've seen of
word2vec and friends?

I'm fascinated by how much power these word vectors have. Because they are capturing these semantic relations, you can use them for disambiguation, query expansion, analogies like the original metric that they used. So basically they're a very powerful tool to map text into a vector representation, a numerical representation, that is then very powerful in how you can manipulate it. Because, as everybody knows, computers have a hard time understanding text, but they're very good at understanding vectors. So here you're mapping text to vectors, so you're making computers' lives much easier. People have used this for machine translation. You compute vector embeddings for two different languages, then you align them, and the relationships have to be the same, mostly. Then you can use them basically to match: you find the word in the embedding in one language, you find the most similar vector in the other language, and you have the translation.

That's also an idea that came up in the Francisco Webber conversation, so yeah, we both need to dig into that one and try to figure out what they're doing that's different from regular embeddings.

And then there's also a variation, GloVe, Global Vectors, that tries to do a different formulation. Another very interesting application, which is actually the reason why I started getting interested in word2vec in particular, is this paper I saw which is tracking linguistic change over time. Using word2vec, they trained word vectors on Google N-grams data for different decades, then they aligned the vectors, and then they looked for the words that are changing over time.

Oh, that's interesting.

You can basically track how the meaning of a word is evolving.

Hmm. Is there a metric for measuring
the, I guess, dispersion of the vectors in a given embedding? I'm wondering, in an example where you train an embedding on this n-gram data, can you look at something analogous to the standard deviation of the words, or the spread?

Not really, I don't think. I mean, if you look at the entire set of vectors by itself, I don't think so, because what you're doing is starting with a bunch of random vectors, then slightly adjusting them so that specific pairs have specific relations between them, and specific groups have specific relations between them, so that they're more or less at the same distance from each other. But all of these groups and all of these words can be arbitrarily rotated with respect to each other. So I suspect that if you just get even very high quality embeddings and you just start measuring distances between them, it will look random, because you're putting in too much noise. The distance between, I don't know, "bed" and "blue" is immaterial.

Right. But then more generally, there's not a set of statistics or metrics that are relevant at the aggregate level for the embedding?

Not that I know of. Usually what people do is they look for specific relations between a few words, so they do this analogy problem. That's how you check that what you're capturing is what you think it is, and that you're actually capturing the semantic meaning of the words and the semantic relations between words.

Why do we end up using neural nets to do embeddings? It seems like pretty basic math that you could do more deterministically.

Yes. I think that's one of the motivations for people to invest in GloVe. GloVe is an abbreviation for Global Vectors, and this is, I want to say, coming from Stanford. Basically, they try to do it as a deterministic optimization,
defining an optimization process by specifically saying: I want vectors where the relationships are given in this specific way. I think one of the motivations for using neural networks for this is that you now have very powerful tools for optimizing and training neural networks at large scale, so you can use all of that machinery. Because when you actually look at the mathematics and the network structure it's using, it's actually very simple. It's an extremely simple neural network, where it's mostly vector multiplications and then a softmax at the end, with some exponentials. So it's nothing particularly sophisticated, if you look at networks like the ones for ImageNet, or AlexNet, or something that has dozens of layers and is very complex. You have all the technology developed for these very complex things, and it's being applied to something that's very simple, so it makes it very efficient.

So, a single layer, and the matrix multiplications are just applying your weights coming in and coming out. And then remind us what softmax is.

Softmax is just a systematic way to calculate what the maximum value of the vector is, in a very simplified way, while at the same time turning a vector of numbers into something that's normalized. The only thing you do is, for each element of the vector, you take the exponential of that value, and then you divide by the sum of all the exponentials. That's it. That's softmax. And what that means is that any value that is slightly larger than the others will become much larger, so it's easier to capture the maximum value. And then of course there are many optimizations on top of that, but that's the general idea. Because you don't necessarily want to calculate all the exponentials for the entire vector, because these vectors are basically one-hot encodings of the words in the corpus, so this may be a hundred thousand or a million
words. You don't necessarily want to have to do that at every iteration. So then there's hierarchical softmax, there are all sorts of computational tricks to try and optimize this. But the concept is very simple, and then of course there are optimizations that people apply to it to make it more efficient and more robust.

Okay. And so what's the relationship between the one-hot encoded, 10,000-dimensional vectors and the 100-to-500-dimensional vectors?

Like I was saying before, computers have a hard time understanding text but are very good at understanding numbers. So what you do in the beginning, as a first approximation, is map words to numbers. You just have a dictionary: this is word number one, this is word number two, this is word number 100. And that's your one-hot encoding. It's just the way you represent the words so that you can then manipulate them numerically, so that you can calculate these vectors.

So I'm imagining that's your very first step. Then your input layer, well, it's just one layer, but you're sending in the inputs to the hidden layer, and that is the one-hot encoded...

So basically, you have the input values here, and, let's say in the case of skip-gram, this is just the center word that you have. If your dictionary is ten thousand words, this would be a one-hot encoded 10,000-dimensional vector. This gets fed into the hidden layer in the center, which has the dimension of the embedding, say 300. And then from this hidden layer you're trying to calculate the context, and the context, depending on the window size, can be, let's say, 10 times the size of the dictionary. So it would be ten times ten thousand, or a hundred thousand, because you try to predict ten words: five words before, five words after.

Okay. So then you've got your dimensionality-10,000 vector coming in, your network is the dimensionality of your
hidden layer, and then the output side is 100,000?

Yes, it's like 10 times. If your window size is 5 and you're trying to predict all the context from the single word, then you're trying to go from ten thousand to three hundred to a hundred thousand.

But you're not really using the hundred thousand. You're just using that as kind of an optimization to get at the weights for the 300, and that's what you use.

Yes. So basically you can actually think of word2vec as an unsupervised learning problem, because you're feeding it the inputs and the outputs so that you can learn something about the system. In practice you're never really trying to predict the output, in a sense. I guess you could use that if you were trying to generate text, but that's unusual. You're basically trying to see what the network is learning from the text that you're giving it. You're giving it the word and the context and you say: okay, figure this out, figure out what the right representation is that from this input can generate this output. And then in the end what you're interested in is actually this internal representation, these word embeddings, the word vectors.

Awesome, awesome. This has been super helpful. I feel like I finally understand them.

It's easier to explain with pictures and drawings. I can send you the link to my slides.

Actually, I was just going to ask: if someone wants to learn more, what's the best way for them to learn more?

I mean, all of these papers are online. And I'm preparing this tutorial for O'Reilly AI in San Francisco in September. I will post all the slides and all the code on my GitHub. Right now there are already the slides for a shortened version that I presented a couple of weeks ago, so I can share that with you, and then eventually, after O'Reilly AI, I will update it with all the newest stuff.

Okay, great. Yeah, so send
that over and we'll get that in the show notes xn yeah great thanks so much for stopping by thank you for inviting it's a pleasure awesome all right everyone that's our show for today thank you so much for listening and of course for your ongoing feedback and support for the notes for this episode head on over to Twilio comm slash talks last 48 if you liked this episode or have been a listener for a while and haven't done so yet please please please take a moment to jump on over to iTunes and leave us a five star review we love reading these and it lets others know that the podcast is worth tuning into one last note you've probably heard me mention Strange Loop a great technical conference held each year right here in st. Louis we're a bit over a week away from that conference so I encourage you to check them out so these strange lucam I'll be there let me know if you'll be there too for more info on any of these events check out the show notes thanks again for listening and catch you next time you
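The setup described in the conversation above (a word-to-number dictionary, a one-hot input, a 300-dimensional hidden layer, and ten context words per input word) can be sketched roughly as follows. This is only an illustration: the toy sentence and the helper names `build_vocab`, `one_hot` and `skipgram_pairs` are assumptions made for this sketch and do not come from the interview or from any word2vec library.

```python
# Illustrative sketch only: the toy corpus and all helper names are hypothetical.

def build_vocab(tokens):
    """Map each distinct word to an integer index (the 'dictionary')."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def skipgram_pairs(tokens, window):
    """Collect (input word, context word) pairs, looking up to `window`
    words before and after each position."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = build_vocab(tokens)
fox_vector = one_hot(vocab["fox"], len(vocab))
pairs = skipgram_pairs(tokens, window=2)

# Layer sizes quoted in the conversation: 10,000-word dictionary,
# 300-dimensional embedding, five words before and five words after.
vocab_size, embedding_dim, window = 10_000, 300, 5
context_words = 2 * window                 # ten predicted words
output_units = context_words * vocab_size  # "ten thousand to three hundred to a hundred thousand"
input_to_hidden_weights = vocab_size * embedding_dim  # one embedding row per word
```

The `input_to_hidden_weights` figure counts the entries of the matrix whose rows end up being the word vectors, which is the part of the network the conversation says you actually keep.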

Original Description

This week I'm bringing you an interview with Bruno Gonçalves, a Moore-Sloan Data Science Fellow at NYU. As you'll hear in the interview, Bruno is a longtime listener of the podcast. We were able to connect at the NY AI conference back in June after I noted on a previous show that I was interested in learning more about word2vec. Bruno graciously agreed to come on the show and walk us through an overview of word embeddings, word2vec and related ideas. He provides a great overview of not only word2vec but also related NLP concepts such as Skip Gram, Continuous Bag of Words, Node2Vec and TFIDF. Notes for this show can be found at twimlai.com/talk/48.

This video teaches the basics of Word2Vec and its applications in natural language processing. The viewer will learn how to build word embeddings using Word2Vec and how to use them for text analysis. The video also covers more advanced topics such as graph embeddings and the challenges of training high-quality word embeddings.

Key Takeaways
  1. Install the necessary libraries for Word2Vec
  2. Prepare a corpus of text for training
  3. Train a Word2Vec model on the corpus
  4. Use the trained model to build word embeddings
  5. Evaluate the quality of the word embeddings
💡 Word2Vec is a powerful tool for capturing semantic relationships between words, but training high-quality word embeddings can be challenging and requires careful consideration of the corpus and model architecture.
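
Steps 2 to 5 of the list above can be sketched with the Python standard library alone, so nothing needs to be installed. This is a toy, assumption-laden illustration rather than the word2vec reference implementation: it uses a plain full softmax instead of hierarchical softmax or other speed-ups, and the corpus, the hyperparameters and the names `train_skipgram` and `cosine` are all made up for this sketch.

```python
import math
import random


def train_skipgram(pairs, vocab_size, dim=8, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram trainer with a full softmax output (hypothetical helper).

    Returns the input-to-hidden weights (one embedding row per word) and the
    average cross-entropy loss recorded for each epoch.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[center]  # hidden layer: the embedding row of the input word
            scores = [sum(a * b for a, b in zip(h, w_out[k])) for k in range(vocab_size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total += -math.log(probs[context])
            grad_h = [0.0] * dim
            for k in range(vocab_size):
                # Gradient of the loss w.r.t. score k: probability minus one-hot target.
                err = probs[k] - (1.0 if k == context else 0.0)
                for d in range(dim):
                    grad_h[d] += err * w_out[k][d]  # uses the weight before its update
                    w_out[k][d] -= lr * err * h[d]
            for d in range(dim):
                h[d] -= lr * grad_h[d]  # h aliases w_in[center], so this updates the embedding
        losses.append(total / len(pairs))
    return w_in, losses


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Step 2: prepare a (tiny, made-up) corpus and turn it into training pairs.
corpus = "the king rules the land the queen rules the land".split()
vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}
ids = [vocab[word] for word in corpus]
window = 2
pairs = [(ids[i], ids[j])
         for i in range(len(ids))
         for j in range(max(0, i - window), min(len(ids), i + window + 1))
         if j != i]

# Steps 3 and 4: train, then read the embeddings off the hidden-layer weights.
embeddings, losses = train_skipgram(pairs, len(vocab))

# Step 5: a first sanity check on quality is comparing words by cosine similarity.
similarity = cosine(embeddings[vocab["king"]], embeddings[vocab["queen"]])
```

On a corpus this small the similarity value says little; the point is only to show where the embeddings come from and how they would be compared. For real text, an established library with the optimizations mentioned in the interview is the more sensible choice.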
