Symbolic and Subsymbolic Natural Language Processing with Jonathan Mugan - #49
Key Takeaways
This video features an interview with Jonathan Mugan, co-founder and CEO of Deep Grammar, discussing symbolic and subsymbolic natural language processing, including attention mechanisms, sequence-to-sequence models, and ontological approaches like WordNet, FrameNet, and SUMO.
Full Transcript
[Music] Hello and welcome to another episode of TWiML Talk, the podcast where I interview interesting people doing interesting things in machine learning and artificial intelligence. I'm your host, Sam Charrington. This past week I spent some time in San Francisco at the Artificial Intelligence Conference by O'Reilly and Intel Nervana. I had a ton of fun and got a bunch of great interviews from some amazing people doing awesome work in ML and AI. I got to talk to folks like Gunnar Carlsson of Ayasdi and Stanford, who's applying topological models to machine learning; Ion Stoica of UC Berkeley, whose RISELab is building Ray, a distributed computing platform for reinforcement learning; and Mo Patel and Laura Frolich of Think Big Analytics, who shared a bunch of great use case stories with me. I'm super excited about my interviews from the conference and I'm looking forward to sharing them with you, so make sure you check back with us on October 9th to catch the full series. In the meantime, I've got a great interview for this week. Like last week's interview with Bruno Gonçalves on word2vec and friends, this week's interview was recorded at the last O'Reilly AI conference, back in New York in June. Also like last week's show, this week's is focused on natural language processing, and I think you'll enjoy it. I'm joined by Jonathan Mugan, co-founder and CEO of Deep Grammar, a company that's building a grammar checker using deep learning and what they call deep symbolic processing. This interview is a great complement to my conversation with Bruno, and we cover a variety of topics from both the subsymbolic and symbolic schools of NLP, such as attention mechanisms, sequence-to-sequence models, and ontological approaches like WordNet synsets, FrameNet, and SUMO. I'm looking forward to your feedback on this show. Jump over to the show notes at twimlai.com/talk/49 to let me know what you liked and learned. Finally, before we dive into the show, the details for the upcoming TWiML
online meetup have been set. On October 18th at 3:00 p.m. Pacific time, we'll discuss the paper "Visual Attribute Transfer through Deep Image Analogy" by Jing Liao and others from Microsoft Research. The discussion will be led by Duncan Stothers. If you've missed the last two meetups, or haven't yet joined the group, please visit twimlai.com/meetup for more information. There you'll find video recaps of the last two meetups along with a link to the paper we'll be reviewing next month. If you'd like to present your favorite paper, we'd love to have you do it: just shoot us an email at team@twimlai.com to get the ball rolling. And now on to the show. All right everyone, I am here at the O'Reilly AI conference with Jonathan Mugan, founder and CEO of Deep Grammar. Jonathan, welcome to the podcast. Oh, thanks for having me. I'm excited to get into this conversation. So you're speaking later today at the conference? I am. I'm looking forward to having you walk us through your presentation, but why don't we start with you telling us a little bit about your background and how you got into AI? Yeah, sure. I started off in psychology. I got my undergraduate degree in psychology because I wanted to understand the human mind, and as my adviser used to say, the interesting parts weren't scientific and the scientific parts weren't quite interesting. I love that. Yeah, we didn't have a firm grasp on concrete principles that we could use to really understand what was going on. So I became a little disillusioned and got my MBA, and a company called PricewaterhouseCoopers trained me up in computer programming and started me off in the consulting world. I joined the ranks, and as I was programming I thought: you have to tell a computer exactly what to do, and this might be the kind of rigor we need if we're going to do psychology. And of course AI is a mix of psychology and computer science, so it
seemed natural. So I decided to go back and get my PhD. But your undergraduate degree was in psychology. How did you do that? Well, I went back in 2003 to get my master's first, because with a psychology undergraduate degree I couldn't get into a PhD program straight away. I had to take calculus and all that kind of stuff with the little kids. So I got my master's at UT Dallas, then got into the PhD program at UT Austin and started working with Ben Kuipers. And what was the focus of your research there? My focus was developmental robotics: how can you get a robot to learn about the world in the same way a child does? The idea is that the robot just wakes up, or is born, with some knowledge built in but not much, and it has to build everything up from the beginning. The robot pushes objects around and learns relationships between its hand and the object. It learns how to form actions and how to build up perceptions like "my hand is to the left of the block," which is actually significant, because that determines when I can hit the block to the right. So it built things up that way. I graduated in 2010, and at that time AI was not hot like it is now. It's amazing, the change. So I got a postdoc at Carnegie Mellon and studied under Norman Sadeh, at the intersection of computer science and human-computer interaction. The question we studied was this: if you have a device that broadcasts your location to family and friends, when exactly do you want to share your location? At the time, being able to share your location was somewhat novel, and there were a lot of privacy issues around it then, as of course there still are now. So the device would learn, through interaction with you, when you want to share, based on who's asking, where you are, and the time of day. Interesting. My wife is actually a Texas gal, and when I went to Pittsburgh she didn't
come with me. Oh wow. Yeah, it was quite a deal: I flew back home every weekend. Eventually we said, you know, we're not going to Pittsburgh, and I got a job in Austin at a small company called 21CT doing defense contracting work for the Department of Defense, mostly data mining. At that job I pushed into natural language processing, because one problem I found with developmental robotics is that it was really hard to get funding unless you're at a university, since it's so far off. We have robots in factories all over the place, but we don't want them staring at their navels and wondering about life, so most of the funding and most of the push is toward robots that can do actual concrete things right now. I'm more interested in the fundamental concepts below that, the concepts that enable a child to just grab the world and understand it. And I saw that language might be a good in-between. Language is very important right now and very useful: chatbots are a thing, language interfaces are important, and computers have to read tons of documents. So language might be a way that I can both feed my family and study the stuff I care about. Along the way I founded Deep Grammar, co-founded with Cristian Storm. That ties into my talk today, because my talk is about how we can go from natural language processing down to these fundamental concepts and into real understanding. Right now natural language processing is kind of sad, because we're just at the surface. I was amazed when I first got into the field: we tokenize the words, maybe we parse, but we just have this string of tokens, and then we do stuff with the tokens. The computer has no idea what these tokens mean; we just look for patterns in them. So in my talk I start with TF-IDF, where you take a document and convert it into a
vector. If your vocabulary is 50,000 words, it's a 50,000-element vector, and you lose the word ordering, so "man bites dog" is the same as "dog bites man." And TF-IDF is term frequency, inverse document frequency? Right. The term frequency means that if "aardvark" shows up twice, you get a 2 in the aardvark column, and then you scale that by the inverse document frequency, which reflects how often "aardvark" shows up across the documents in your corpus. So the less frequently a word shows up in the corpus, the more important it is in the context of the document. That's the idea? That's right, and that scaling helps the vector discriminate that document better. It's interesting that you start off by talking about how NLP isn't based on much inherent structure, because in previous conversations I've had, the general understanding I've come to is that this lack of structure, meaning a statistical approach as opposed to a linguistic one, has been the source of much of the advancement in NLP over the last few years. Do you disagree with that? Generally, it's definitely true that we've been able to do a lot of cool stuff. In my talk I describe two paths: the symbolic path, and the subsymbolic path, which is the deep learning everybody's doing now. With deep learning we're able to generalize across tokens. Here's a problem we had before: "I got into my car and went to the store" versus "I got into my truck and went to the supermarket" look like very different sentences in TF-IDF, and you had to go in manually and say that truck and car are pretty similar, and that store and supermarket are pretty similar. You can do that for a few things, but you just can't think of all the possibilities, and deep learning is really great for that. With word2vec, everything's a vector.
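To make the TF-IDF representation described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Deep Grammar's code. It assumes naive whitespace tokenization, raw counts as the term frequency, and one common smoothed IDF variant, log((1 + N) / (1 + df)) + 1. Other variants exist, and the helper names are hypothetical. The sketch shows the two properties from the conversation: word order is discarded, so "man bites dog" and "dog bites man" get identical vectors, and a word that appears in fewer documents gets a larger weight per occurrence.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Bag-of-words TF-IDF: one column per vocabulary word; word order is discarded."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (one common variant, an assumption here): rarer words get larger weights.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw counts: "aardvark" twice -> 2 in the aardvark column
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["man bites dog", "dog bites man", "the aardvark saw the aardvark"]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # identical bag-of-words vectors, so similarity is (numerically) 1
```

Because every vocabulary word gets its own column, "car" and "truck" share no dimension in this representation. Sentences that differ only by those words look less alike than they should, which is the limitation that learned word vectors address.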
And it turns out that car and vehicle are going to be very similar, as are car and truck, and supermarket and store. So instead of doing TF-IDF, you could even just average the word vectors, or you can run an RNN where the last state is the meaning of the sentence, and you're able to capture similarity across sentences in a way you can't do as well with symbolic methods. But you still don't have any understanding there. When you do word2vec, you're learning a vector for a word based on the words that typically appear around it. The algorithm goes through your whole corpus, word by word. For each word, it takes that word's vector and pushes the vectors of the words around it closer to it, and pushes the vectors of words that aren't nearby away. Then it moves to the next word, and it keeps doing that over and over until it has converged. That's great, but it only captures what people say. Most of the knowledge needed to understand language is so obvious that we never mention it, so that kind of knowledge just doesn't show up in word vectors. And even when you get this vector at the end, it's still not clear what to do with it. Consider that some of the biggest, or at least most exciting, advances have been in machine translation, and the machine still has no idea; it's just spitting out tokens. It encodes the input with the encoder RNN, and then the decoder spits out the next word based on the previous state, the previous word, and, if it has attention, all of the previous encodings in the encoder. But it's just a softmax putting out tokens. It doesn't have any understanding of what it's doing, which is to some degree why it's applicable in so many different domains. You can create a parse tree with it, and you can even encode a picture into a vector using a CNN and then run the decoder, which is how you get the really exciting captioning work.
But there's still no understanding there; you end up with this vector. So now we're on the subsymbolic path, but what can you do? The next thing people started doing was attention. When they added attention to the encoder-decoder method, the model, when it's about to generate a word in the translation, looks at all of the previous words in the sentence, or rather their encoded representations, the hidden-state representations of the sequence. In effect, it's looking at facts about the world to figure out which ones are relevant for generating the next word. So people said, well, what if I just feed it a story? I can feed in a story like: Tom went to the store, Tom came home, Tom picked up a jar, Tom went to the airport. Now the question is, where is the jar? If you feed it enough of these stories, it's pretty amazing: the computer can answer that it's at the airport. Presuming that you never put down the jar and just carry it with you for life. That's really cool, but you have to generate these stories automatically, because the network needs so many of them to find the statistical patterns underneath. And the mechanism that's enabling this is attention. Can we double-click on that and talk about how it's implemented? I've heard attention come up a bunch of times, but I haven't dug into it in any detail, and I'm wondering how it manifests itself in some of these deep networks. Yeah. Google just came out with this new Tensor2Tensor library, and I'm thinking of how they do attention there. You have a query and a set of keys and values, and what you're doing is, for some
query, you look at all of the keys to find which are most similar to it, and the similarity between the query and each key becomes the weight for that key's value, so you're taking a weighted average of the values. Is that implemented in a neural network, or is there an external structure like a database or a key-value store? No, it's a neural network. When I say key, query, and value, these are all vectors. Okay, got it. In the sequence-to-sequence model, the keys and values may be one and the same, and your query is your current state when you're trying to generate. Say you have "I went to the store" and you're translating it into Spanish, "yo fui al supermercado." When you're about to generate "supermercado," you look back at the encoded representations along the way for "I went to the," you take your decoder state just before it generates "supermercado," and you compare how similar those are. Then you take the weighted average of those values, and that value comes in where you would normally generate "supermercado" and is taken into account. It's just another vector, alongside the vector for the previous state of your decoder and the vector for the previous word you generated. For each of those vectors you have a matrix that you multiply it by, then you add those results up, put that through the softmax, and that's your output. So a neural network at each point is very often a multiplication of a matrix by a vector, with some nonlinearity applied to the result. Okay, so attention basically means you're storing up these vectors from the past, referencing them, and including them in your calculation. That's right. And so what
they're doing in story question answering is encoding the parts of the story as vectors. When they want to answer a question, they go back over the parts of the story and figure out which parts are most relevant to that question, do a little computation on top of that, and that's where the answer comes from. Interesting. And are there limitations on the amount of memory you're able to refer back to? Generally there aren't limitations on the amount of memory, but you're generally taking a weighted average. You do that because with hard attention you can't do backpropagation as well, so you take a weighted average, and things get watered down a little bit. I don't think that's a huge problem, though. The bigger problem is that it's just a very simple mechanism, and you can only do so much with it. And I think that's where you were going before I interrupted you to push into this: attention is starting to approximate things that look and feel like meaning, but it's still not quite there. Yeah, you're going back over your previous experiences, saying "oh, this one's relevant," and pulling it in, which is cool, but the robot doesn't have any previous experiences. So these story question answering systems are really cool, but there's no built-in knowledge. When we answer questions about stories, we bring a whole lifetime of knowledge, and these systems all start from scratch. I think the next step on the subsymbolic path is systems that interact with the objects and relationships in our world. Imagine a little robot that can pick up objects and move them around. Then it knows what a bottle is, because a bottle is partially the hand fixture it needs in order to pick it up, a bottle is partially that if
it knocks it off a table it breaks, and a bottle is partially that if it turns it on its side, water comes out. All these things are part of the definition of a bottle. So when it's pulling up memory, it's not just pulling up parts of a story; it's pulling up huge banks of things it has experienced before, and from that it can make inferences that it wouldn't be able to make otherwise. Now, we don't quite know how to do these advanced inferences based on experience, beyond the basic models we have now, like sequence-to-sequence, CNNs, and a few others, but it's going to be exciting to see. One of the things I really enjoy about deep learning is that every time a new configuration comes out, a new one that goes into the zoo, I think, oh cool, we're getting a little closer. I envision that in the brain there are thousands upon thousands of different configurations of neurons, at least to some approximation. One of them might be a sequence-to-sequence model and another might be a CNN, but there are hundreds more that we haven't discovered, and it'll be cool as we get better and better with each new one. So much of what we've covered sounds like a lead-up to a specific area of research or interest of yours that promises to help address this issue. Where do we go from the subsymbolic? Another way to ask this: is it your observation that a more symbolic approach is the answer to the ills of the subsymbolic approach, or do you think the path forward is still subsymbolic, but extended to incorporate more understanding? I don't know. In my talk I cover both approaches as if they're separate, and there's been surprisingly little overlap between them. Well, we've mostly talked about the subsymbolic, so should we take a few minutes to talk about the symbolic side and what's
happening there? Sure, we can do that. We were talking about the TF-IDF vector, which throws out word order, but you can do a lot with it. You can say this document is similar to that other document, and you can even feed it into a machine learning classifier to do sentiment analysis or document classification, which is pretty neat. Since I mentioned sentiment analysis: the next step there, which gets a little closer to actual meaning, is a sentiment dictionary. Often you'll have a dictionary that says the word "terrible" has a negative sentiment and the word "good" has a positive sentiment, plus some simple mechanism that says that for "not terrible" you have to invert the word's sentiment. That can get you pretty far, and now, for each symbol, you're going into your dictionary and assigning it some very simple meaning. So you could consider that one first step toward assigning meaning. But there's also a whole set of representations that people have built. When you build a representation, you're taking symbols and creating relationships between them. Presumably, once you set up this symbol system, you can map what people say onto the symbols, and then decide what the computer should do based on whichever symbols got lit up. Say you're a tire company and you want to watch Twitter to see who you should try to sell tires to. You could set up a symbol system that says a car has tires, a truck has tires, and a Toyota is a kind of car, so anybody who mentions a Toyota links to tires. That helps when you take a sentence and ask, "can I find tires linked anywhere in here?" even though the sentence never mentions tires explicitly. Of course, this is kind of brittle. There's been a lot of work in setting up these
symbol systems. The most famous is probably WordNet, which has a set of synsets. A synset is a group of words that all mean the same thing, so it's like a meaning. "Vehicle" might be one synset, and under vehicle, car has its own synset containing "motor car" and "car." You could have car in all different languages, but it means car. Then you set up relationships between these things, like "car is a kind of vehicle," and sports car would be a subset of car. WordNet is really popular and really good; it gives a sense of a definition for most words. A word can also be in two different synsets, so "bank" would be in the synset for a riverbank and also in the one for a bank where you deposit money. So that's one. Another one is FrameNet, which builds up a bigger situation: WordNet is about individual words, and FrameNet is about situations. One example is the Commerce_buy frame, which means somebody buys something from someone else. That frame is triggered by a set of keywords like "bought," "purchased," and "sold." When the frame is triggered, an implementation that uses FrameNet tries to fill in the roles. In "Bob bought a car from Tom," the buyer is Bob, the seller is Tom, and the thing purchased is a car. You've converted the sentence into a frame with roles, and that's machine-understandable. It's nice because you move up from individual words to the meaning of situations. But FrameNet doesn't go very deep. It doesn't have the fundamental things going on, like forces. Well, there's a little bit of that, but it doesn't have the things a very young child knows, and it turns out that in AI that's the hardest part. You know, we started off thinking that chess was the pinnacle of intelligence,
and now it turns out that picking up a bottle of water is really hard. They say that if we could get all the things a kid knows by three or four into a computer, that would be amazing, and that's what I would really love to try to do. If we're going to build that with a symbol system, we have to go deeper, and one symbol system that does go deeper is SUMO. SUMO is a full ontology, meaning it goes all the way down. Take cooking: what does the word "cooking" mean? I don't remember the exact chain, but cooking is a process, which is a thing, which is an entity; it takes it all the way down. That's really useful. What we need to do now is figure out how to tie things like SUMO into WordNet, and there are already some linkages, since SUMO does tie into WordNet. More broadly, we need to figure out how to bring all these different representations together, because if we're staying in symbolic land, what we want to do next is build a causal model of how the world works. Josh Tenenbaum was talking about that here at this conference yesterday. An entity needs to understand that when it pushes a table, all the things on top of the table are going to move. If you try to put that in logic it's hard; you need a model you can just read it off of. In some sense FrameNet is like that: if there's a frame where Bob sold a car to Tom and you ask who has the car afterward, it's Tom. You could read that right off the frame, or associate it directly with the frame. So what we need to do is build deep causal models that go all the way down to things called image schemas. Image schemas are the language-independent concepts we use to understand everything in the world, an idea from people like Lakoff and Johnson and
that group, and Mandler, who's a developmental psychologist. One image schema is containment: when you have a bottle of water, the water is contained in the bottle, which means that if you move the bottle, the water goes along with it. Another is support: the bottle is on the table. You need these concepts before you can understand language, because language understanding is built on all of this, and when we talk to each other we never say these things. Sure. One joke I like to tell: imagine a romance novel where there's a table between two lovers and the man pushes the table aside. The novel would never say, "and as he pushed the table aside, all the objects on the table moved, because they were supported by it, and the table made a scraping sound across the floor and left gashes in it." We make a lot of assumptions when we talk. We do. If we didn't, we'd never get anything done. Right, and so what we need to do is build up causal models of the world onto which we can map the symbols we define. I can't decide if it would be fun or extremely boring and tedious to run these models in reverse and generate that romance novel. It's like how books have different editions: there's the large-print edition, and this would be the computer edition. Yeah. So those are basically the two paths. The question is how to get from where NLP is now, working only at the level of symbols, either subsymbolically, where we learn vectors for individual tokens, or symbolically, where we just write down what they are, to a computer that can really understand because it understands the fundamentals. It's hard to say where to go from here, because one problem is that you need a commercially viable need for the simplest possible common-sense knowledge. We all have chatbots, they're everywhere now, but they already require
way too much knowledge to be good. If you stay within a particular domain, you basically just hard-code everything. You can also have chatbots learned with sequence-to-sequence models, but that's just gibberish back and forth; it's really no different from ELIZA. To have a chatbot that's actually general, one that can answer your questions when you ask it things off script, we're going to need these fundamental concepts. In fact, one of my dreams is to build a chatbot for children. You get it at age three or four, it lives in your mom's phone, it teaches you concepts about the world, it's also your friend, and it learns about you. The cool thing about it being a teacher is that if it teaches you, then it knows what you know, so when it explains other things to you, it can explain them in terms you already understand. It can also make things interesting. Say it knows your favorite animal is a giraffe; when it's teaching math it can say, "If you have six giraffes and you buy two more, how many do you have?" That's the kind of thing engaged parents do, and it would be cool to do that in an app. My dream is that you give that to children, and your developers are feverishly working behind the scenes making the technology better and better, so that when the child gets older, the app just turns into the child's operating system. The child uses this app to interface with their whole world, and since the app has been with the child since the beginning, it really knows the child; it can be the ultimate in customization. As an adult, when it's guiding me through fixing my dishwasher, it knows that I know nothing, and it literally has to tell me "lefty-loosey, righty-tighty." On Wikipedia or the web, they just have no idea what you know. Yeah. And even when you're old, if your
faculties start to go and you're standing in the kitchen and can't remember how to make coffee, the app can see through the cameras in the room and say, "Hey, you make coffee at this time of day. The filters are in the cupboard over there; that's the first step," and guide you through it. Maybe we could stay independent longer. What a great vision; that would be awesome. Yeah. So how much of that is Deep Grammar trying to take on? What Deep Grammar is trying to take on is this: when I write, I make a lot of really dumb mistakes. It's just the human in me; I think one thing and my hands output something different. I've always been amazed that grammar checkers couldn't catch that. Spell checkers came along and they were amazing. I don't know how many people remember the days before spell checkers, but they were a huge advance. I always told my teachers I wasn't going to need to know how to spell, and it turns out I was right. I haven't been right about many things, but I was right about that. Grammar checkers, though, were in Word for a long time and would just miss obviously wrong stuff, and it really bugged me. I always thought machine learning would be the way to go, so a few years ago I started working with n-grams, which are sequences of tokens. If you think about it for five minutes, with n-grams of three or four tokens you take, say, "I went to the," which is four words, and you have a distribution over the next word. If you write a word that isn't in that distribution, or doesn't have much weight in it, you may have made a mistake. If that word is also similar to a word that should have high weight, you probably did. "Donkey" has low weight after "I went to the" but isn't similar to "store," so that alone isn't strong evidence. But if you write "I went to the stored," that's a mistake, and it should be
obvious: "stored" is... it's very similar to "store," and "stored" is gonna have a very low probability. Mmm. The problem with n-grams is that you can't capture that similarity. Like we talked about before with "I went to the store": "I went," "I drove," "I meandered," "I walked," all those are very different to an n-gram probability model. So in order to train such a thing, you would have to have seen all of these. Mm-hm. And you'd have to store the vocabulary size to the fourth power to hold all these probabilities. And so when I started working in deep learning, I said, "Oh, sequence-to-sequence is the way to go for this." You encode this thing and then you decode it, and you get the power of deep learning, that power we talked about before: similar words are gonna have similar vectors, and similar sentences are gonna have vectors similar to other similar sentences. You know, the first thing is, okay, I've got to write a patent on this; this is gonna be how we do grammar checking. And that's how we got started. Yeah. So what we do is we encode it and then we decode, and if the thing we decode is different from what you wrote, then there's a problem. Mm-hmm. Especially if what you wrote is different, but it's similar to something that would have high probability. Okay. And then we also asked, how do you capture that similarity? The easiest, most typical thing you can do is use the Levenshtein distance, which is edit distance on the letters. Mmm. You can do similarity with that, but there are also a bunch of other little similarity things we do that take advantage of a lot of the knowledge acquired over the years in grammar. So yeah, we have a kind of sophisticated similarity measure. And then in addition to sequence-to-sequence, we throw the kitchen sink of deep learning at it, a bunch of different CNNs and stuff, and so we've got it pretty good now. Sometimes it still fails in a way that's disturbing, but if you make
a mistake like the wrong version of "there" or "to," it's really good at that; it catches it better than anything. Oh, wow. So there are a lot of different markets. The biggest market, as you might imagine, is English as a second language. Mm-hmm. We get people all the time emailing me, "Please get this thing going, please, please, please." And English as a second language is particularly challenging, because sometimes when you're not familiar with a language, what you write is so far from correct that the machine just doesn't know where to start. Yeah. So that's a particular challenge. And then sometimes there's an even bigger, more fundamental challenge: the whole sentence has to be rewritten. Mm-hmm. And the only way to do that is to understand what the person said, and we've been talking this whole interview about how computers just can't do that, right? Right. So that's gonna be a problem for a long time to come. Mmm. But we can finally catch these dumb mistakes, the ones I look at and can't believe the grammar checker didn't catch. Now it can catch those. Mmm. So that's really exciting. Do you offer this as a service for folks? Like, I use a service for writing called Grammarly, which you may be familiar with, that does a decent job for some things. Some aspects of their implementation are kind of bad; I don't know, the UI, the user experience, is kind of wonky. But I can imagine that as a go-to-market model, and I can imagine more of a platform-ish approach where you're offering APIs to developers to build things around. How are you guys going at it? Yeah, we are in the process of trying to decide where exactly we're gonna focus, because Grammarly is now really big. They've got a lot of smart people working there. Mm-hmm. And it's gonna be hard to go head-to-head with them. I think we've got some ideas that are really good, and I think we do some things better, but, you know, they're just hiring like crazy. And, well, you could start by plugging into any of the
editing apps on the Mac, which they don't support, or... Yeah, I don't think they plug into Google Docs or anything like that. Yes, there are some holes there. Now, what kind of moat that gives you... And also, Grammarly is pretty expensive, something like ten dollars a month, and there are a lot of people around the world who really need this, but ten dollars a month is a lot of money. So, you know, we can have a service that's really good, better than the things that were around before, but maybe without all the bells and whistles of Grammarly, especially if we can fix those things that are really hard for a computer to catch. Looking at what Grammarly has done, it looks like they spent a lot of time implementing a lot of rules, like grammar rules, but we can catch the subtle things that only pure learning can find. And that's what a lot of people need, because when you're speaking English as a second language, you don't have the ear for the language that we do. Yeah. And the ESL market is really interesting. Languages are a hobby of mine, and one of the apps that I've been using recently that I really enjoy is this app called Tandem, which is basically a global language-learning community. There have been a bunch of these, but it's the best implemented by far. You basically go on this app, you tell it what languages you're learning, and it'll match you with people who speak those languages natively and are trying to learn the languages that you know. You'll get in these conversations with folks and, depending on their level... I think the interesting conversations are when folks are beyond the "Hey, I'm going to Google Translate everything I want to say" stage, right? Because you can spot that failure mode really quickly. Yeah. But then there are folks that know enough English that they're just typing what they think is right, and sometimes it's a little hard
to decipher, but most of the time you can kind of get what they're trying to say; they're just saying it wrong. And if your stuff was plugged into this process, as a kind of side channel, you know, a trainer or coach or something like that... You know, the big challenge for language learners is decreasing the cycle time of learning and iteration and accelerating the process. Something like that could be really interesting. Yeah, no, I hadn't thought of that, like a coach: you know, you said this, and maybe change it to this other thing. Mmm. That's a good idea. Another place is video transcription. Mmm. It's big now, and a lot of times you have to pay a human to do it. Or it's done automatically, but you still need to pay a human to make sure it's done right, because you get this text out and sometimes it doesn't quite hear. Mm-hm. And so that's very much like a grammar correction problem, so we could do that. But yeah, we're trying to see exactly what niche, you know: should we go make it cheap, or make it an API, or do transcription? And there are some publishers that have reached out to us. They say, "Look, we send out all these books and we have to pay people to go in and read each one." Mm-hmm. "And so if we use you guys, we can have them do more books per person, because they would have less tedious stuff to do; you catch this stuff." So that's another option. We're kind of standing at the crossroads right now, trying to figure out what we're gonna do with it. Does the technology give you the ability to address stylistic issues, as opposed to correctness? It kind of does both at the same time, but it doesn't help you rewrite things. It's basically gonna help you write the way it was trained. We started out training on Wikipedia, but then it wanted to fix everything to be very Wikipedia. Yeah. Well, you
know, the thought that came to me was the artistic style transfer stuff, where you take this picture and make it Picasso-esque. Yeah. Like, I'd love to take my writing and put it in the form of some other author, right? That would be cool. Now, it doesn't work as well in language as it does in vision, because in language you're making a set of discrete decisions. Mm-hm. In vision you have pixels, which are much more amenable to small gradients. Hmm. And that's why they've had such huge success with vision, and language is harder. They're starting to get some work in that area: some of the new stuff is applying GANs to sequence-to-sequence models. What you do is, instead of using the cost of generating each token while you're training the decoder, you use some other measure of the sentence as a whole, and in a GAN it would be the probability, from some discriminator function, that the sentence was generated by the computer or by a human, right? And then you have to backprop, or well, you have to get that answer back into the system so it can learn, and that's usually done with reinforcement learning, right? And that's not very efficient for language. It kind of works, and there have been a lot of advancements, but it's still got a ways to go. Okay. I just finished a report on industrial applications of AI, and it ended up being like 30 pages, and I'd love to put that through, like, Hemingway. That would be awesome. Yeah, we'll get there. I can assure you, the sentences, I think, would be a lot shorter after Hemingway. Yeah, he was a short-sentence kind of guy. Yeah, that would be great. And there is some of that: if you train the system on Hemingway, it's gonna want to generate tokens that are Hemingway's. Mm-hmm. So you feed your sentence in, and it's gonna translate it to be shorter and something about a fish, probably. Mmm-hmm. Nice. Awesome. Well, what's the best way for folks to kind of keep tabs on what
you're up to, and, you know, follow along as you guys iterate on this model and figure stuff out? Yeah, so we have a website, deepgrammar.com. On that website you can try it out: type in a sentence. It only does one sentence at a time right now, just because we have a cheap server up on Amazon. Mhm. And then you can join our mailing list, and I tweet my life out on Twitter at @jmugan. Okay, awesome. Well, thanks so much, it was great chatting with you. Oh, thanks, it's been fun. [Music] All right everyone, that's our show for today. Thank you so much for listening, and of course for your ongoing feedback and support. For more information on Jonathan and the topics we covered in this episode, head on over to twimlai.com/talk/49. If you liked this episode, or you've been a listener for a while and haven't yet done so, please take a moment to jump on over to Apple Podcasts or your favorite podcast app and leave us that five-star review. We love to read these, and it lets others know that the podcast is worth tuning in to. If you've already done this, then thank you so much; we greatly appreciate it. One last note: you've probably heard me mention Strange Loop, a great technical conference held each year right here in St. Louis. I'll be attending later this week, and I encourage you to check it out. Also, the following week, on October 3rd and 4th, I'll be at the Gartner Symposium/ITxpo in Orlando, where I'll be on a panel on how to get started with AI. If you plan on being there, send me a shout. Thanks once again for listening, and catch you next time.
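The n-gram approach Jonathan describes starting from (a next-word distribution after a context like "I went to the", plus Levenshtein edit distance to decide whether a low-probability word is a near miss for a high-probability one) can be sketched as below. This is not Deep Grammar's method, which he says uses sequence-to-sequence models, CNNs, and a more sophisticated similarity measure. It is a toy under stated assumptions: a tiny hand-made corpus, 4-grams (three words of context), and arbitrary thresholds `low`, `high`, and `max_edit`; the function names `levenshtein`, `train_ngrams`, and `check` are hypothetical.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance on the letters: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


def train_ngrams(sentences, n=4):
    """Map each (n-1)-word context to a Counter over the word that follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    return counts


def check(sentence, counts, n=4, low=0.05, high=0.3, max_edit=2):
    """Flag words that are unlikely in context but a few edits away from a likely word.

    Returns a list of (position, written_word, suggested_word).
    Thresholds are illustrative assumptions, not tuned values.
    """
    tokens = sentence.lower().split()
    flags = []
    for i in range(n - 1, len(tokens)):
        dist = counts.get(tuple(tokens[i - n + 1:i]))
        if not dist:
            continue  # unseen context: a pure n-gram model has nothing to say
        total = sum(dist.values())
        word = tokens[i]
        if dist[word] / total >= low:
            continue  # the written word is plausible here
        for candidate, c in dist.most_common():
            if c / total < high:
                break  # remaining candidates are not likely enough
            if levenshtein(word, candidate) <= max_edit:
                flags.append((i, word, candidate))
                break
    return flags


# Toy training data (assumed for illustration).
corpus = [
    "I went to the store yesterday",
    "I went to the store after work",
    "I went to the park on Sunday",
    "She went to the store again",
]
counts = train_ngrams(corpus)
print(check("I went to the stored", counts))  # [(4, 'stored', 'store')]
print(check("I went to the donkey", counts))  # []
```

"Donkey" is unlikely after "went to the" but is not within two edits of "store", so it is not flagged, matching the distinction drawn in the conversation. The sketch also shows the sparsity problem: "I drove to the stored" goes unflagged because the context "drove to the" never appeared in training. A full 4-gram table over an assumed 10,000-word vocabulary would have 10,000^4 = 10^16 possible entries, which is why learned vectors (similar words get similar vectors) are attractive.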
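The GAN-on-sequence-to-sequence training Jonathan mentions, where a whole-sentence discriminator score replaces the per-token cost and reinforcement learning carries that score back into the decoder, can be written as a score-function (REINFORCE) gradient. The notation is an assumption, not from the conversation: $G_\theta$ is the decoder (generator) with parameters $\theta$, $Y = (y_1, \dots, y_T)$ is a sampled sentence, $y^*$ is a reference sentence, $D_\phi(Y)$ is the discriminator's probability that $Y$ was written by a human, and $b$ is an optional constant baseline. This is a simplified, sentence-level form in the spirit of SeqGAN-style methods (which additionally use per-token rollouts), not a specific published recipe.

```latex
% Usual token-level training cost (teacher forcing on a reference sentence y^*):
\mathcal{L}_{\mathrm{MLE}}(\theta) = -\sum_{t=1}^{T} \log G_\theta\left(y^*_t \mid y^*_{1:t-1}\right)

% Sentence-level objective: make sampled sentences look human to D_\phi
J(\theta) = \mathbb{E}_{Y \sim G_\theta}\left[ D_\phi(Y) \right]

% Sampling discrete tokens blocks backprop from D_\phi into \theta, so the
% likelihood-ratio identity moves the gradient onto \log G_\theta instead:
\nabla_\theta J(\theta) = \mathbb{E}_{Y \sim G_\theta}\left[ \left(D_\phi(Y) - b\right) \sum_{t=1}^{T} \nabla_\theta \log G_\theta\left(y_t \mid y_{1:t-1}\right) \right]
```

Subtracting $b$ leaves the gradient unbiased because $\mathbb{E}_{Y \sim G_\theta}[\nabla_\theta \log G_\theta(Y)] = 0$. In practice the expectation is estimated from a few sampled sentences, and the high variance of that estimate is one commonly cited reason the approach is less efficient for language than the smooth pixel gradients available in vision.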
Original Description
Like last week’s interview with Bruno Gonçalves, this week’s interview was also recorded at the last O’Reilly AI Conference, back in New York in June. Also like last week’s show, this week’s is focused on Natural Language Processing, and I think you’ll enjoy it. I’m joined by Jonathan Mugan, co-founder and CEO of Deep Grammar, a company that is building a grammar checker using deep learning and what they call deep symbolic processing.
This interview is a great complement to my conversation with Bruno, and we cover a variety of topics from both the sub-symbolic and symbolic schools of NLP, such as attention mechanisms and sequence-to-sequence models, and ontological approaches like WordNet, synsets, FrameNet, and SUMO.
You can find the notes for this show at twimlai.com/talk/49.
Subscribe!
iTunes ➙ https://itunes.apple.com/us/podcast/this-week-in-machine-learning/id1116303051?mt=2
Soundcloud ➙ https://soundcloud.com/twiml
Google Play ➙ http://bit.ly/2lrWlJZ
Stitcher ➙ http://www.stitcher.com/s?fid=92079&refid=stpr
RSS ➙ https://twimlai.com/feed
Let’s Connect!
Twimlai.com ➙ https://twimlai.com/contact
Twitter ➙ https://twitter.com/twimlai
Facebook ➙ https://Facebook.com/Twimlai
Medium ➙ https://medium.com/this-week-in-machine-learning-ai