The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Latent Space · Advanced · 📰 AI News & Updates · 1y ago

Skills: LLM Foundations 80% · LLM Engineering 70% · Fine-tuning LLMs 60%

Pod: https://www.latent.space/p/yitay It’s easy to get desensitized to new models topping leaderboards every other week. However, the top of the LMSYS leaderboard has typically been the exclusive domain of very large, very well-funded model labs like OpenAI, Anthropic, Google, and Meta. OpenAI had about 600 people at the time of GPT-4, and Google Gemini had 950 co-authors. This is why Reka Core made waves in May: it not only debuted at #7 on the leaderboard, but did so with all-new GPU infrastructure and 20 employees, with no more than 5 people on pre-training and a relatively puny $60m in funding.

What You'll Learn

The video discusses the 10,000x Yolo Researcher Metagame with Yi Tay of Reka, covering topics such as language research, model architecture research, efficient Transformers, and Long Range Arena, along with models and benchmarks like PaLM 2, UL2, Flan, Reka Flash, Reka Core, Reka Edge, Vibe-Eval, T5, and BERT.

Full Transcript

Welcome to Latent Space. This has been a long time coming, but I'm so excited to have you here.

Yeah, thanks for inviting me. I'm excited to be here and chat about a lot of stuff.

So you're an interesting person to research and introduce. You are now Chief Scientist of Reka, which is a super interesting model lab. But before that you were at Google Brain. You were architecture co-lead on PaLM 2, you were the inventor of UL2, you were a core contributor on Flan, you were a member of the Bard core team, and you also did some work on generative retrieval. That's a very, very illustrious three-year career at Google Brain.

Yeah, thanks, thanks.

And then since then, Reka. You joined in March 2023 and announced a $58 million Series A in June 2023. I don't know if you know whether the post-money or pre-money valuation is public. Crunchbase says it's 200-something million.

I did not know.

So you don't even have to leak it, it's on the internet. Reka's stated goals were to work on universal intelligence, including general-purpose multimodal and multilingual agents, self-improving AI, and model efficiency. In February you released Reka Flash, in April you released Reka Core and Edge, and then most recently you released Vibe-Eval. Is that a good summary of the last six years? We can go deeper into the specific papers.

Six? No, it's not. Five... four years.

Four years, yeah. Oh my God. Okay, we're talking about AI.

Yeah, I was wondering if I had stepped into a time machine or something.

Okay, so can we just talk about your transition? You did your PhD, and we can talk about your PhD, and then your transition into Brain and research and all that. I saw you do some work on recommender systems, I saw you do some work on quaternions. What was that?

Let's forget about that.

Just describe your path into modern LLMs, because you didn't start there.

Yeah, sure. I think the world also didn't start
there, right? I joined Google at the end of 2019, and the world looked really different at that time. I think that was around the time the first GPT, GPT-1 or something, was released by OpenAI. So ML research and NLP research looked very different at that time. I identify as a language researcher. I don't like to use the word NLP; Jason would kill me if I used the word NLP. But I was more of a model architecture kind of researcher, and when I joined Google I continued on this as a model architecture researcher. I worked a lot on efficient Transformers.

That was your first viral paper.

Yeah, yeah. And I worked on Long Range Arena. I spent quite a lot of time looking at whether we could do without attention; there was the Synthesizer paper back in 2020. I think that was my early days at Google. At that point in time, Transformer research was mainly about WMT, machine translation, perplexity, and stuff like that. Few-shot learning and few-shot in-context learning came about only when GPT-3 came out and beyond. So at that time the meta, I would say, looked very different. A lot of the work was focused on fine-tuning things like T5 or BERT or something like that. A lot of research, not only mine but around me and even in the broader community, was on those kinds of things. In hindsight I feel that is actually pretty useful to think about today, because a lot of people came into AI right after
ChatGPT came out, right? So they saw AI as kind of... I think there are a lot of benefits to understanding how Transformers work. I've broken this thing apart so many times, and these things actually help to improve intuition. It's not totally disconnected. I think a lot of things are still relevant today; it's just that the scale has gotten much larger, and the paradigm has shifted a little bit from single-task fine-tuning to generally do-everything, universal foundation models.

Foundation models.

Right. I think it's just a slight change in paradigm, but fundamentally I don't think the underlying principles of research have really changed that much, except for compute and data.

So basically algorithms stayed put, and then compute and data scaled.

I have some thoughts about this. Back then, a lot of the academic research... I think people have talked about this, Sasha Rush has talked about this, and other people have talked about this: the conferences were always organized by applications. They were always organized by, oh, question answering, this kind of thing, and even in 2019 it was always like this. I think there's a bit of a transpose going on. Things become universal, and then it becomes: okay, there's a data workstream, there's a model architecture workstream, and then people work on improving a universal model and general-purpose algorithms to improve this model, rather than finding domain-specific tricks. Even in 2019 I think I had already been focusing on work where you could improve on general
architecture. At that time it was maybe LSTMs, in 2017 or something, and then you'd try it on 10 different tasks, that kind of thing. But a lot of the research community was focused more on, you know, how do I get that extra 2% on question answering, or sentiment analysis. I think there was this phase in 2017 and 2018 where this SOTA work was still very fashionable in academia and conferences. And then the big thing about the ChatGPT moment of 2022, the thing that changed drastically, is that it made all this work kind of obsolete.

So November 2022, you're saying, the ChatGPT launch? Because I feel like if you were in the research community, this was coming.

Yeah, exactly, the ChatGPT launch. That's what I'm saying: in the big labs, people had already been moving toward general; even T5 was already general-purpose. But there's a bit of a time lag. Places like Google and Meta and OpenAI would be working on things three years ahead of everybody else, and academia would still be working on these task-specific things. I think the forcing function was the ChatGPT moment. It was coming, it was coming; it was just the last straw, and then it's finally like...

Yeah, now it's serious.

Yeah, now the thing has completely changed. I don't know how it turned from my background to talking about the meta.

I think that you navigate the meta very well, and part of my goal here is to also isolate how you think about the meta for other people to reflect on, because I think obviously you do it very well.

Oh,
thanks.

So I'm looking at your papers published. Somewhere around 2021 you had a hard cut to UL2 and PaLM, and you did UL2, PaLM, emergent abilities, DSI, and recitation-augmented generation all in the same year. Did you change teams? Did you have a research focus? When did you become...

Oh, so you're saying that my research became emergent?

It was very obvious.

No, I don't think I'm a person who is super great at foreseeing a trend two years ahead and then specifically planning for that. I think I moved smoothly along with the field. I never actually had a time where I said I'm going to pivot myself into this. I never really thought about it this way. At every step I just optimized for what I found to be most impactful and most promising, and then it happened gradually. It's also a lot of influence from talking to people. At the time I had some close collaborations with Jason and other people. At Google you can work with anybody you want, basically. So partly it's the environment shifting, and I think the environment shifted very quickly, but I was also always polling the environment. I think it's always good to have an open mind and move along with the field rather than say, okay, this is my research area, I'm going to get stuck in it for two years. I just moved along to find things that interested me, and naturally that turned out to be the things that were most impactful at that time. I
mean, if you put it that way, okay, in retrospect I kind of did well, but I never actually saw it as intentional. I didn't do anything really intentional except doing what I found interesting.

Cool. Well, we'll just talk about the main work at Google Brain and then we'll move on to Reka. So out of UL2, PaLM, and emergent abilities, which of these came first?

Was it Flan? I can't really remember.

Talk about UL2, then.

Okay, so UL2 and DSI, the Differentiable Search Index: I was working on them in December of 2021. At Google there were projects that were big efforts, where a researcher would be part of the effort, and this would be kind of top-down-ish to some extent. And then there was also bottom-up research that one could do. I can't speak for the Google of now, for sure, but at least at that time. UL2 and DSI were works that I kind of tinkered with over the December break, when nobody was around, and I was just working on them. With PaLM there's also a differentiation, because there's PaLM 1 and there's PaLM 2. On PaLM 2 I was actually the co-lead of one of the workstreams, but on PaLM 1 I was more of a contributor. So now I have to think back: okay, what's the timeline, which came first?

Oh yeah, you don't have to.

No, no, it's fine. In general there were kind of three categories of work. One is broader efforts, maybe org-level efforts. Then there are some, like UL2 and DSI, that were my own projects, where I used the compute that I had and then I just
played with it.

You accidentally left UL2 running for...

Yeah, yeah, that was in the paper. It was really fun. And then there was also a third category, which was efforts that my good friends were driving and I contributed to. Flan was one of them. Maybe I would like to just say this publicly, because I talk a lot about Flan...

You're Flan shill number one.

Yeah, but the first author is actually Hyung Won, who is great, and then another guy. I was a core contributor, but because I'm a little more visible, I kind of accidentally took a little bit more credit for that. I was a core contributor, but I was not the lead.

The lead authors are obvious.

Yeah, so sometimes I accidentally get... But in general, the third category was projects driven by my friends. Emergence was also like that. Emergent abilities... no, actually, that paper was supposed to be only me and Jason on the paper, and I actually became friends with Jason from that paper. That led to this streak of, I don't know, 10 papers or something together with Jason, and now we're super good friends. The ultimate bromance. But the emergence paper also belonged to the bottom-up kind of thing. Yeah, fun times.

Okay. So maybe I'll pick on PaLM 2 and emergence, because I really want to make sure I tell those stories; those are important stories. PaLM 2, I think, is a career story: you effectively became a co-lead on the second version of a very high-profile, company-wide effort. How did that happen? I think people would like to know what's the
sort of career strategy there.

To be clear, I was one of the co-leads, but there were a lot of co-leads, so I don't want to take too much credit for that. My involvement with PaLM 2 came after UL2 was working well and gaining some visibility within Google.

Just to document this for the record: was UL2 the largest model that Google had released at the time? 20B?

Yeah, I think so. That was the largest.

And it was a personal project?

It was a personal project, yeah.

Isn't that unusual? How can it be one person's decision to suddenly release something that effectively changed the trajectory of how it worked?

I mean, 20B is not that much larger than 11B, the 11B T5. Actually, at the time there was a 13B mT5. UL2 is an encoder-decoder 20B model. When we got it approved, it was released as kind of the big brother of T5: okay, we updated T5 with a new objective and trained this new model at 20B, and it uses the same pre-training dataset and everything.

Pure C4.

Yeah, that was the easiest, because there was precedent. But yeah, there was some architecture...

Like the mixture of denoisers.

Yeah. So back to PaLM 2: I think my involvement with PaLM 2 came from the work to add UL2 to PaLM 2. From the top-down point of view, the leads were decided in a top-down manner. There was not much fighting or any major thing. It was a mixture of bottom-up and top-down, a half-half situation. From the top it was, okay, these are the people who are the most visible in contributing to this workstream, and then, okay, how about Yi and
this other guy be in charge of this modeling workstream, something like that. So I think it just happened that way, organically. That was how I came to co-lead the modeling workstream of PaLM 2.

I think in retrospect you understand now that this is very valuable experience, and I think today it would be much more competitive to get the job that you got, whereas two years ago you didn't have to try that hard to get it. Or you kind of lucked into it with UL2, and then it just compounded from the initial good decision. Do you agree with that?

I think it's very hard to counterfactually analyze these types of things. I think it's definitely true that there are more people working on generative AI now, and if you are in a big company it's way harder to navigate these types of things. I wouldn't say that there was nobody wanting to work on this at the time. In fact, there were...

But you were the obvious choice.

There were definitely fewer people. But how do I put it? I would say that maybe it's slightly harder now, but it's also not like it was easy at the time.

Yeah. I imagine it's sensitive, but in my mind this is now the most valuable on-the-job training in the world, and so people want to know how to get it. This is why I'm trying to figure it out.

It may not be... I agree, but individually we also cannot take somebody else's experience and then try to replicate it, because everybody's circumstances, their initialization point, the thing is
kind of also different. I think this is not only true for LLMs but in general, because a lot of times it's, oh, you did this in this position, and then because of this... It's very hard to trace all this down and find the causal path. So yeah, I think with everything in life there's some luck involved.

Yeah, there is. Emergent abilities: a very influential paper, subsequently contested by the Mirage paper.

Oh yeah, yeah.

So before we get to the Mirage paper, was there a story behind emergent abilities? I'm sure it's Jason's thesis. Just tell us more about the behind the scenes. Was there a discussion that led to it?

Okay, I have to be really clear on this one: the idea, the inception of it, was mostly Jason. I helped out to shape up the paper a little bit, get some stakeholders involved, and so on. I was discussing it quite a bit with Jason, but the idea itself was Jason's. So actually, when the Mirage thing and everything came out, I was just hot-taking for the sake of hot-taking. I believe in emergence, I'll just go on record and say that, but I was not feeling very strongly, because I think that... I can't speak for Jason, but I would imagine that he would maybe be personally offended. Well, I know Jason is a person who takes feedback very well. He's not offended by harsh feedback, and he rebuts well online as well. But I would imagine he would be the one most affected by criticisms of emergence. I was believing in it, but I have to say, I mean, that's why
he's the first author and I'm the second. That was mostly Jason's thesis. I have to really say that Jason has really good ideas, and I was more in a support role for that paper.

Sure. Cool. There's lots more to discuss there, but you believe in emergence; that's enough for me to work with.

I also think that the Mirage paper is mostly... I don't even remember who wrote it.

Rylan Schaeffer. I covered him on my NeurIPS podcast.

Okay, okay.

He's a very good speaker, and the paper was well done. It's just that people drew the wrong conclusions from the paper, because he had a very good title.

Do you believe in emergence?

Of course.

Okay, high five.

I mean, how can you read any paper, read the progress of LLMs, and not believe in emergence? It's so stupid. Just because you can reparameterize some benchmarks and evals and make them linear doesn't mean emergence is completely gone. And even in the Mirage paper, they acknowledged that there were some metrics that were true, genuine emergence according to them. I think it was something like 25-ish percent, in the ballpark; that's not the exact number. So I was like, okay, fine, there are some benchmarks you disagree with, but on the whole there is emergence. Now we're just talking about the magnitude.

Yeah, for sure. I don't think the authors of the paper had bad intentions. I mean, we should just assume people don't have bad intentions, right? They were definitely just doing their thing. I think I was more annoyed by the NeurIPS best paper. I mean, okay, best paper awards, just take them with a grain of salt, right? But there were people coming at me like, oh, you should care about this
because it's the NeurIPS best paper. And I'm like, do best paper awards mean anything, actually? It doesn't mean anything, right? I think that was more where my angst was coming from. I don't even remember who the authors of that paper are.

I'm sure they're doing well for themselves. We don't have to dwell too much on that. Okay, a couple more things on Google and then we can go to Reka. Quoc Le was your manager?

Yeah. I had another manager called Don. I had two managers during my time.

So I'm just basically going to ask for quick hits: what did you learn from Quoc, what did you learn from Jason, what did you learn from Hyung Won? Your mental embeddings of who they are, who they represent to you, how they advised you, and all that.

Okay, very interesting. So Quoc as a manager was more like a friend. Quoc is a very researchy person, more of an intuition person. I learned a lot from him, but it's not very explicit. There was nothing concrete; it was more over time, and it was a very implicit, soft kind of feeling. We would brainstorm a lot about research. There was this U-PaLM paper that didn't get as much attention as I feel it deserves, and that was one of the works that I discussed with Quoc quite a bit. At that time we were releasing the Flan 2 stuff and everything. I think Quoc has a lot of good sense about what makes a work a good hit, publicly a good hit, and
a lot of research sense about what makes research cool. So I think he has good intuition as a researcher, and I learned quite a bit from that. I'd also say that Jason probably learned quite a bit from Quoc, and this also influenced his taste. So it was not only me getting influenced: Jason was getting influenced, and then Jason influenced me. Overall, what I learned from Quoc is probably more intuition and research taste. We would chat about AGI sometimes, the singularity and stuff like this. He is nice to talk to as a friend-manager kind of figure. He's very much a researcher, more than a corporate manager.

I totally expect that.

It was fun.

Since you mentioned AGI: we actually don't cover AGI on this podcast, mostly because it's very hard to be precise or make falsifiable claims. Do you perceive differences in the way that AI researchers discuss AGI compared to the regular population?

I don't think that we were making any progress in quantifying it.

Okay, I'll skip that question.

There was a lot of fun chatter around it, but it was not exactly...

Yeah. Jason Wei: what did you learn from him? What is your distillation of the Jason?

Okay, Jason is very interesting. In my career I learned two or three major things from Jason. I'm going to talk about the more casual, more fun stuff first. Jason was more spicy on Twitter
first, before me. There was an era where I was a goody two-shoes. I only had my main account, and my only tweets would be, like, new paper alerts. And then Jason was starting to post hot takes, and I just thought to myself, oh damn. There were takes where I was like, Jason, you should not post this, you're going to get cancelled. And he was fine. He always made it through the storm and everything. I looked at him and I was like, okay, maybe it's not that bad after all to just be like that, right? Which is very interesting, because Jason is much younger than me. And the other thing is our alt accounts. We created them around the same time, and the interesting story behind it was that Jason's alt account and my alt account have our own original identity. It was not like an anime character that nobody knows who it is. We have our identity on it. And I asked Jason, why do you want to... why don't you just make it pseudonymous? And he told me this thing, which was quite true: okay, you can post a take that's spicy and hot, but if you cannot stand by the opinion, then you should not have the opinion in the first place.

Wow.

Right. I thought that was profound. There are times where I post something and it's spicy, and it gets a little bit of backlash, and if I kind of agree that, okay, this is bad, then I will retract it. But if I can stand by the opinion, then I will just stand by it, because that's the point of making it. It should be said, because I can put my name behind it. So this is part of the first bucket: how he influenced my online persona a
little bit. And then it turns out that now the hippo is so much more spicy than the koala. The koala is just hibernating somewhere; it's not even around. I mean, Jason is also more constrained, because he has an actual employer, and he has to be a little...

Oh my God. The worst thing about Twitter is that any time anyone from OpenAI tweets anything, they're like, did you see this researcher from OpenAI said something, blah. They read tea leaves that are not there, and it makes you very cautious to tweet anything. So it kills the golden goose, is what I say.

Right. There was one tweet, at the time when people were speculating about the gpt2-chatbot, where Jason just posted something on his main account, something like, I'm excited about new experiments being run. Just random. And then people screenshotted that and posted it. I hate that. Now I think his alt account is mostly personal stuff. I think he would stay away from work things; it's a non-work thing.

So the golden goose has been killed, because people on Twitter cannot control themselves from drawing random conclusions from all these hints and all that.

Yeah, it's okay. But this is like lore. It's not canon, it's lore. I think the second thing I learned from Jason is more about, from my own career, the importance of marketing and PR. Jason is actually super good at it. I mean, the emergence work: how many blog posts he wrote about emergent abilities and how many talks
he's given about emergence: a lot, you know. Just the other day I was at this WebConf keynote, and he was giving a keynote again about emergent abilities, and it's been two years, right? So I think one big success of his is that he does the work and he thinks a lot about marketing the work itself. In the early parts of my career, my early days at Google, I was putting out a lot of work, but I didn't put a lot of effort into thinking about how the work was going to be received. I'd just be like, here's a paper, here's a paper, here's a paper. But Jason would be like, I'm going to write this paper and I'm going to market the hell out of it. So I think I learned a lot from that. Every single first-author paper that Jason wrote in the last year has like 1,000 citations in one year.

Oh my God.

I mean, not every one, but most of the ones he leads. So his hit rate is very high; his impact density is very high. It's pretty interesting. Jason is way younger than me, technically so-called more junior, but I kind of see him as a peer and I learn a lot from him. Some people are just talented in different ways. I looked at how he markets his own work and markets himself, actually, and I think that's something that I could learn from.

If someone is starting from zero, with no Twitter presence, what is the second best thing to do?

If you don't have a Twitter presence? As a researcher?

For marketing, yeah.

I think the most obvious thing to do, say hypothetically you're a researcher in a place
without visibility, and then you have no personal visibility: the first goal is always to try to find a mentor or co-author that is within this circle, and then you start from there, right? And then you get people who have a visible following to retweet. So you work with them. The big goal is not about... I mean, this is probably a career mistake from my early days, instead of focusing on... so people say, okay, if you do good work... it's more of, okay, say I see this visible researcher from DeepMind, right? How can I collaborate with this person and do something that they feel is cool, so I can win their respect and they would be willing to co-author with me? Because the exercise itself is also about how to... you're not trying to please reviewers or anything. If you can find one semi-visible person, it doesn't even have to be a famous person, just someone semi-visible with, like, a few tens of, not tens of, thousands of followers and a good research reputation, and then you collaborate with this person, then when you post the work you are a co-author of this person, and you get the person to vouch for you or just retweet. Over time this would... it could be from internships, it could be from, you know, just DMs. I think people are nicer than... some people seem scary, but if you DM them, they're actually willing to collaborate. Actually, I was scared of you, and when I DM'd you, you turned out a lot nicer than I feared. So thank you for being nice. Yeah. Okay, okay, I'm sorry. For good advice. No, no, no, I mean, obviously we didn't know each other before, and now I think we're getting a
bit more friendly. Cool, that's really great advice for people. I just want to leave that out there. For others who follow the career advice that I give, the title topic of this is "pick up what others put down," and specifically pick up what your mentors put down. Mentors, the high-visibility mentors, always have more work to do than they personally have time for, and if you can show that you're a good collaborator with them, they will lift you up accordingly. That's a pretty good formula for career growth. Should I ask about Hyung Won? I don't know how close you are. Oh, we're still good friends. So again, one thing that you learn from Hyung Won: Hyung Won is a great engineer and he's very systematic in the way he thinks. Without going into too much detail, I spent a lot of time talking to Hyung Won, even after we were both at different places, about very interesting ways to think about life. You know, he would even think about things like... okay, I should not diverge too much. But from Hyung Won himself I learned a lot about his way of thinking, more very interesting perspectives on life rather than research. But Hyung Won is a great engineer, and the one thing that scares me about Hyung Won is that he doesn't have multiple monitors. He just codes with one small screen, and he does everything very hyper-optimized. And this is one of those U-curves where it's one screen, one screen, and then many screens. So I think that scares me. I think that was at NeurIPS 2022; we were doing some work in New Orleans, and he would be coding perfectly fine with this, you know, 13-inch MacBook with one terminal, and he keeps telling us, okay,
it's more optimal to use the keyboard than to move your head, because if you can switch your screen fast enough, it's faster than moving your head to different screens and stuff. I did not actually distill that, because it's too painful to do, but, I mean, he's very interesting in that he belongs to one of those hardcore people with one monitor. Maybe this is a relevant question to just close out the Google side: what do you think is a good programmer for AI research? You mean, like, setup? No, not setup, not even lifestyle; it's more about skills. What should people have? What do you interview for, maybe, right? What do you see the high performers do differently than the less high performers? I mean, okay, generally, I think for AI researchers, being a strong IC is probably the thing that I feel is important. I think there's a certain level of sacrifice to be an AI engineer or AI researcher, especially if you're training LLMs, because you cannot really be detached. Your jobs could die on a Saturday at 4:00 a.m., right? And there are people who would just leave it dead until Monday morning, but there will be people who will crawl out of bed at 4:00 a.m.
to restart the job or to check the, you know, TensorBoard or something like that, right? I think a lot of being a successful AI researcher is about how far you're willing to go. And it needs to come naturally, because if you don't have this kind of inductive bias, if you're not that kind of person, and you force yourself to do this, you become miserable, right? I want to say passion is the entire thing, but it's more just a kind of personality. Maybe it's that if there's a bug at 3:00 a.m. on a Saturday night or something, you couldn't go back to sleep unless you... This is very unhealthy, by the way. People should not do this for a long time. But I think these kinds of things actually allow people to make progress faster. But it's unhealthy, so I'm also not even sure... okay, that's on the record: I don't recommend this kind of lifestyle. But I think a lot of people, okay, not a lot, not everybody, but I just think this kind of attitude is important to make progress. I mean, you cannot be checking out on Friday, Saturday, Sunday and working 9 to 5 if you want to make progress. Or some people are just so good at detaching, like, okay, 8:00 p.m.,
I'm not going to... my job can die and the chips can stay idle for the whole night, but I want to watch Netflix, right? You cannot. I think there's a level... it's like a sport, right? You cannot win an Olympic gold if you want to have super, ultra good work-life balance, right? Yeah. So, I mean, I just think this is kind of like passion, intensity, dedication. Intensity. Yeah, intensity, right. But I think you also need to know how to regulate and make sure that people don't die from this type of... yeah, not die per se, but actually burn out from this type of thing. Yeah. So those are really good personal qualities. Just technical-qualities-wise, how much of the stack should people know? Okay, so that was the question. No, no, no, but that was important as well, right? It's just harder to interview for, because you really just see it on the job. You know, I think stack is not that... Like, should I know CUDA kernels? I don't know CUDA kernels. Exactly, right? Okay, good. For all you listening out there, you don't have to feel like an imposter. No, but you need to be willing to learn if you have to, I think. Well, you haven't had to so far. Yeah, I haven't had to so far, right. So if I sling PyTorch, okay, great. Do I know distributed systems? What is the stack that you recommend for people that gets you a well-rounded, end-to-end researcher? I don't think there's any specific thing. In fact, I will try to be as agnostic as possible. I don't really say, okay, you need to learn JAX, you need to learn this. By the time you finish learning, there's a new framework out anyway. So it's more of being able to continuously
learn and update. I don't think there's a single stack or a single workflow. Yeah, I don't think there's a single one. Yeah, got it. Cool. Well, that leads us to Reka. What's the founding story? Okay, so I met some of my other co-founders while we were collaborating at DeepMind; I was at Brain and they were at DeepMind. I see myself as... I'm not a startup person. I identify, even today, as a scientist and researcher more than a startup person, right? I think my co-founder Dani started this story, right? And Reka was in the works from late 2022; I finally left in 2023. Dani kept asking me; he wanted to do something, did I want to go with him and do it? And it took a while for me. I was kind of the last co-founder to... Was the plan always for you to leave at some point and join him? No, no. He was just convincing you to... It was like six months, in fact, I think more than a six-month period. And I always had this at the back of my mind since, what, August. I said no; actually, I didn't want to do it in the first place. But I think eventually, in March, I felt that, okay, it's time for me to experience something new. So from my side, my leap of faith was more of: I want to experience something new. Okay, I've wrapped up this PaLM 2 work at Google, and then it was more of, okay, let me experience this new life and see where we can go with this. So I think that was mainly, from my perspective, that was
the story. And, I mean, I personally don't have a lot of... okay, the funny thing was that many, many years ago, before my PhD, I wanted to do a startup, actually, at that point. And then over time I realized that I was better off as a researcher, and I just forgot about the startup thing. And it's quite funny that today I ended up doing a startup, right? But even until now, as I said, I still identify more as a researcher and scientist. So I think this is mainly... it's a very realistic, down-to-earth, grounded founding story, nothing too fancy. No, no, no, nothing fancy. Yeah. Well, I mean, when you left Brain, you already had a high profile coming out of Brain. You could have gone to any startup out there; they all would have wanted you, right? Yeah, okay, okay. So why did you choose this one, basically? Is it just because of pre-existing relationships? Because it wasn't obvious to me. You know, a lot of your other co-workers went to OpenAI, others went to... if you're at FAIR, you went to Mistral, you know, that kind of stuff, right? Reka was not on the map. I think for me it was a decision between staying at Google and co-founding something. It was more the experience of being a co-founder that attracted me, right? And wanting to experience that. I wouldn't have left for Inflection or something like that. I mean, Inflection is gone. RIP. They're still alive; they're selling themselves as a model foundry or something. I don't know, they're a services company now. Yeah. No, but I also think that, for
example, if you were to join another... it would be a very Big Tech experience again, right? I don't know, I felt like the experience I'm getting now is very complementary to the experience I had at Google, right? But if I were to join something else, I would have just stayed at Google, to be honest, because to me it was very clearly just two decisions. I was talking to a bunch of other startups, but I didn't really have the intention to go. I was happy at Google, actually, to be honest. I'm sure they have a lot of things to keep you happy. I was happy, yeah, actually. So you described yourself as GPU poor, but you also had $60 million to play with. You got a whole bunch of GPUs; I think you disclosed the number somewhere, but I don't remember exactly. And you had a good training run for Flash and Core and Edge. How would you tell the story? People can read the technical report, but what was that overall experience like? And I should also point people to the blog post that you wrote. Damn. So there were a lot of interesting things that happened along the way. I think I left around early April, end of March, April and everything, right? But most of our compute actually came in December. Yeah, and there were delays. Yeah, so H100s, there were major delays, right? So we were sitting around, right? Because you don't own the compute, you're renting. Yeah, yeah. So we were sitting around for a long period of time. We had 500 A100s, because we had made a commitment, and they were constantly being delayed, I think because of H100 supply, demand, whatever
reasons. And it was also very hard to get a lot of compute in one place, right? And then we were locked in, and we had to wait for the compute to come, right? So I think it was very painful, because even when the compute came, it was mostly broken most of the time, and it was broken to a very bad extent. Before I left Google, even in the early stage, I was very optimistic: okay, this compute translates to this amount of FLOPs, this is the model, right? But I never expected the reliability to be so poor that it just threw off all the calculations. And then we had to work, like, 10 times harder just to make the thing go smoothly. So I would say that it was bearable pain, I think the pain was bearable, but it was just way, way more than expected. I think you addressed this in your post, but the temptation would have been just to run everything on TPUs, which is the stack that you already know very well, that works very well. No, no. So TPUs outside Google and TPUs inside Google are probably very different things, I think. Oh, how come? Okay, firstly, infrastructure: there weren't a lot of good codebases outside Google, right? And the codebase that I was most familiar with was T5X; it was JAX-based. By the time we wanted to consider it, it had really been deprecated for, like, nine months, right? And then TPUs... I mean, we weren't sure about... the availability of TPUs was also not great. Oh, my perception was that it was a lot better; people just have the learning curve. Yeah, but at that point in time we had our infra set up, we were already training models, and it would
be so much cost to switch to TPUs. So I think, the experience of TPUs inside versus outside Google... I have not actually run a single TPU job outside Google, by the way. But just looking through the documentation, from what I see outside, and from how much I think people inside Google don't care about what people outside Google think, I kind of feel like... I don't think we considered it. I mean, not like forever not considering it, but at that point in time the obvious choice was: let's stick to GPUs and PyTorch. I mean, it's not as if the chips we ordered were not there; they were there, they were just not in the

The video discusses the 10,000x Yolo Researcher Metagame with Yi Tay of Reka, covering topics such as language research, model architecture research, and efficient Transformers. Yi Tay shares his experiences working on various projects, including PaLM 2 and Reka Flash, and emphasizes the importance of collaboration, research focus, and constant learning.

Key Takeaways

Build language models, drawing on work like PaLM 2 and UL2
Develop efficient Transformers using techniques like retrieval augmented generation
Fine-tune language models, drawing on work like FLAN and Reka Flash
Evaluate long-context model performance with benchmarks like Long Range Arena
Collaborate with other researchers to improve model architecture and accuracy

💡 The importance of collaboration, research focus, and constant learning in achieving success in language research and model architecture research.
