How To Hire AI Engineers (ft. James Brady and Adam Wiggins of Elicit)

Latent Space · Beginner · 📰 AI News & Updates · 2y ago

Key Takeaways

The video discusses how to hire AI engineers, featuring James Brady and Adam Wiggins of Elicit. It covers AI engineering, language models, and conventional software engineering, focusing on the skills and mindset AI engineers need, including curiosity, enthusiasm, and a fault-first mindset, as well as the importance of defensive coding, system design, and a product perspective, the use of tools such as Python, TypeScript, Kubernetes, and OpenAPI, and the need for fu

Full Transcript

Okay, so welcome to the Latent Space podcast. This is another remote episode that we're recording. Actually, this is the first one we're doing around a guest post, and I'm very honored to have two of the authors of the post with me, James and Adam from Elicit. Welcome, James. Welcome, Adam.
Thank you, great to be here.
Hey there.
Okay, so I think I'll do this in order. James, you're the primary author. You are head of engineering at Elicit, you were also VP Eng at Teespring and Spring, and you have a long history in engineering. How did you find your way into something like Elicit, where you're basically a traditional VP of Engineering or VP of Technology type person moving into more of an AI role?
Yeah, that's right. It definitely was something of a sideways move, if not a left turn. As you said, I'd been doing VP of Technology and CTO type stuff for around 15 years, and I noticed there was this crazy explosion of capability and interesting stuff happening within AI and ML, language models, that kind of thing. I guess this was in 2019 or so, and I decided that I needed to get involved; this is a generational shift. I spent maybe a year trying to get up to speed on the state of the art: reading papers, reading books, practicing things. I was actually going to found a startup in the space of interpretability and transparency, and through that I met Andreas, who has obviously been on the podcast before. I asked him to be an adviser for my startup, and he countered with, "Maybe you'd like to come and run the engineering team at Elicit," which it turns out was a much better idea, and I quickly changed direction. So I think some of what we're going to talk about today is what the work of building applications with AI and ML actually looks like.
A lot of it looks, smells, and feels much more like conventional software engineering, with a few key differences, than really deep ML work, and I think that's one of the reasons I was able to transfer skills over from one place to the other.
Yeah, I definitely agree with that. I often say that AI engineering is about 90% software engineering, with 10% of really strong, really differentiated AI engineering, and obviously that number might change over time. I also want to welcome Adam onto my podcast, because you welcomed me onto yours two years ago, and that was wonderful.
Glad to hear it. That was a fun episode.
You famously founded Heroku, you just wrapped up a few years working on Muse, and now you describe yourself as an internal journalist working on Elicit.
Yeah, I'm in a bit of a wandering phase, taking this time between ventures to see what's out there in the world. Some of my wandering took me to the Elicit team, and I found they were some of the folks doing the most interesting, really deep work on taking the capabilities of language models and applying them to what I feel are really important problems: in this case science, literature search, that sort of thing. It fits into my general interest in tools and productivity software. I think of it as a tool for thought in many ways, but obviously also a tool for science, and if we can accelerate the discovery of new medicines and things like that, that's just so powerful. To me it's also an opportunity to learn at the feet of some real masters in this space, people who have been working on it since before it was cool, if you want to put it that way. So the last couple of months have been a crash course for me, and that's part of why I sometimes describe myself as an internal journalist.
I'm helping to write some posts, including supporting James on this article we're doing for Latent Space, where I'm bringing my writing skills to bear on their very deep domain expertise in language models and in applying them to the real world, and surfacing that in a way that's accessible and legible. The great benefit to me is that I get to learn this stuff in a way I don't think I would just by tinkering with my own side projects.
Yeah, totally. I forgot to mention that you also run Ink & Switch, which is, in my mind, one of the leading research labs in the tools-for-thought and productivity space, or maybe the future of programming, a little bit of that as well. I think you guys definitely started the local-first wave, and you just held the first conference. I don't know if you were personally involved.
Yeah, I was one of the co-organizers, along with a few other folks, of Local-First Conf here in Berlin. A huge success from my point of view. Local-first is obviously a whole other topic we can talk about another day. I think there actually is a lot of handshake emoji between language models and the local-first data model, and that was part of the topic of the conference here, but that's a topic for another day.
Not necessarily. If I can grab your thoughts at the end on local-first and AI, we can talk about that. I selected Justine Tunney as one of my keynotes, on working on llamafile at Mozilla, because I think a lot of people are interested in that stuff. But we can focus on the headline topic, just to not bury the lede, which is how to hire AI engineers.
This is something I've been looking for a credible source on for months. People keep asking me for my opinions, and I don't feel qualified to give one, given that I only have so much engineering experience, and it's not like I've defined a hiring process I'm super happy with, even though I've worked with a number of AI engineers. So I'll just leave it open to you, James: how did you go about defining your hiring roles?
Yeah, so I think the first thing to say is that we've effectively been hiring for this kind of role since before you coined the term and tried to build an understanding of what it was. That's not a bad thing: it was a concept that was coming to the fore and effectively needed a name, which is what you gave it. The reason I mention that is that it was something we backed into, if you will. We didn't sit down and come up with a brand-new role from scratch, as if this were a completely novel set of responsibilities and skills. However, it is a particular blend of skills, attitudes, curiosities, and interests, which I think makes sense to bundle together. In the post, the three things we say are most important for a highly effective AI engineer are: first, conventional software engineering skills, which is a given but definitely worth mentioning; second, curiosity and enthusiasm for machine learning, and maybe in particular language models, which is certainly true in our case; and third, a fault-first mindset, being able to build systems that can handle things going wrong. I think the middle point, the curiosity about ML and language models, is fairly self-evident. They're going to be working with these models and prompting them.
Dealing with the responses from these models is clearly relevant. The last point, though, maybe takes the most explaining: the fault-first mindset and the ability to build resilient systems. The reason it's so important is that, compared to normal APIs (think of something like the Stripe API or a conventional search API), the latency when you're working with language models is wild. You can get 10x variation. I was looking at the stats before the podcast, and we often see a 10x variation in P90 latency over the course of half an hour or an hour when we're prompting these models, which is way higher than with a more conventionally backed API. The actual content of the responses is naturally unpredictable as well. They come back in different formats: maybe you're expecting JSON and it's not quite JSON, and you have to handle that. The semantics of the messages are unpredictable too, which is a good thing; it's one of the things you're looking for from these language models. But it all adds up to needing to build a resilient, reliable, solid-feeling system on top of what is, certainly currently, a fundamentally shaky foundation. The models do not behave the way you would like them to, and structuring the code around them so that it gives the user a warm, reassuring, snappy, solid feeling is really what we're driving for.
Yeah, I think so. Go ahead.
Go ahead, you can jump in.
What really struck me as we dug into the content for this article was that third point.
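James describes asking a model for JSON and getting back something that is "not quite JSON". A minimal sketch of one defensive approach in Python, assuming the JSON object may be wrapped in a Markdown fence or surrounded by prose; `parse_model_json`, `ModelOutputError`, and the required-keys check are hypothetical names for illustration, not Elicit's code:

```python
import json
import re
from typing import Any, Dict, Set

# Matches an optional ```json ... ``` fence around the payload.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


class ModelOutputError(ValueError):
    """Raised when model output cannot be coerced into the expected shape."""


def parse_model_json(raw: str, required_keys: Set[str]) -> Dict[str, Any]:
    """Best-effort extraction of a JSON object from free-form model output."""
    # Candidates in order of preference: fenced blocks, outermost {...}, raw text.
    candidates = [m.group(1) for m in _FENCE.finditer(raw)]
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start : end + 1])
    candidates.append(raw)

    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict):
            continue
        # Parsed, but still validate the shape before trusting it.
        missing = required_keys - data.keys()
        if missing:
            raise ModelOutputError(f"missing keys: {sorted(missing)}")
        return data
    raise ModelOutputError("no parseable JSON object in model output")
```

Raising a dedicated error type rather than returning `None` leaves the caller free to retry, fall back, or show an error, which is the choice the rest of the conversation keeps returning to.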
The language model is this chaotic medium, this dragon, this wild horse you're riding and trying to guide in a direction that's going to be useful and reliable to users. So much of software engineering is about making things not only high-performance and snappy but stable, reliable, and predictable, which is literally the opposite of what you get from language models. And yet the output is so useful, and indeed some of their creativity, if you want to call it that, is precisely their value. So you need to work with this medium. The nuance, the thing that came out of Elicit's experience that I found so interesting, is that quite a lot of working with it draws on distributed systems engineering. But the AI engineers, as we're defining or labeling them on the Elicit team, are really application developers. You're building things for end users; you're thinking, "I need to populate this interface with some response to user input that's useful for the task they're trying to do." Yet the medium you're working with means you need to apply some chaos engineering and distributed systems engineering, and people with those skills typically aren't the application-level developers with the product mindset; they're deeper in the guts of a system. Those skills and that knowledge do exist throughout the engineering discipline, but putting them together in one person feels like a unique thing. Working with the folks on the Elicit team who have those skills, I'm quite struck by that blend. I haven't really seen it before in my 30-year career in technology.
Yeah, that's fascinating. I like the reference to chaos engineering.
I have some appreciation for it. When you had me on your podcast, I was still working at Temporal, and that was a nice framework: if you live within Temporal's boundaries, you can pretend all those faults don't exist and code in a very fault-tolerant way. What are your solutions around this, actually? You're emphasizing the mindset, but maybe naming some technologies would help. I'm not saying you have to adopt these technologies; they're just quick vectors into what you're talking about. "Distributed systems" is such a big, chunky phrase. Are we talking Kubernetes? I suspect we're not; we're talking about something else.
Yeah, that's right. It's more at the application level than the infrastructure level, at least the way it works for us. There's nothing radically novel here; it's more a careful application of existing concepts. The tools we reach for to handle these slightly chaotic objects Adam was just talking about are retries, fallbacks, timeouts, and careful error handling. The standard stuff, really. We also rely heavily on parallelization, because these language models are not innately very snappy and there's a lot of I/O going back and forth. When I was in the earlier stages of my career, these were the difficult parts that more senior software engineers were better at: careful error handling, concurrency, fallbacks, distributed systems, eventual consistency, all that kind of stuff. As Adam was saying, someone working deep in the guts of a really high-scale distributed backend would probably naturally have these skills.
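The retries, timeouts, and parallelization James lists can be sketched with Python's standard `asyncio` library. This is a minimal illustration under stated assumptions, not Elicit's implementation: `TransientError` is a hypothetical stand-in for whatever retryable errors a real model SDK raises, and the default attempt count, timeout, and concurrency cap are arbitrary.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (rate limits, 5xx responses)."""


async def call_with_retries(
    make_call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout_s: float = 30.0,
    base_delay_s: float = 0.5,
) -> T:
    """Await make_call() with a per-attempt timeout, retrying transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, TransientError):
            if attempt == attempts:
                raise  # out of attempts: let the caller decide (fallback or error)
            # Exponential backoff with full jitter so parallel callers don't retry in lockstep.
            await asyncio.sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


async def run_all(
    calls: List[Callable[[], Awaitable[T]]], max_concurrency: int = 8
) -> List[T]:
    """Run many model calls concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await call_with_retries(call)

    # gather preserves input order, so results line up with the calls.
    return list(await asyncio.gather(*(guarded(c) for c in calls)))
```

The per-attempt timeout matters because of the latency variance James describes: without it, one slow call can hold up an entire fan-out.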
But you'll run into these problems on day one if you're building an ML-powered app, even if it doesn't have massive scale. There are maybe two related things I'd mention that we do. The first is that we're big fans of strong typing. We share the types all the way from the backend Python code to the frontend in TypeScript. We'd probably do this anyway, but it really helps one reason about the shapes of the data going back and forth, and that's really important when you can't rely on what comes back: you're going to have to coerce the data you get back from the ML if you want it to be structured. The second, related thing is that we use checked exceptions inside our Python codebase, which means we can use the type system to make sure we're properly handling all the various things that could go wrong, all the different exceptions that could be raised. Checked exceptions aren't particularly popular; there aren't many people who are big fans of them. But for our particular use case, to really make sure we haven't just forgotten to handle a particular type of error, we've found it useful to force us to think about all the different edge cases that can come up.
Yeah, that's fascinating. Just a quick note on technology: how do you share types from Python to TypeScript? Do you use GraphQL, or something else?
We don't use GraphQL. We've got the types defined in Python; that's the source of truth. We go via the OpenAPI spec, and there's a tool you can use to generate TypeScript types from those OpenAPI definitions.
Okay, excellent. Sorry for diving into that rabbit hole a little bit.
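Python has no built-in checked exceptions, and the conversation doesn't say how Elicit gets that behavior. One common substitute, shown here as an assumption rather than their method, is to return errors as values in a union type and handle them exhaustively, using an `assert_never` helper so a static checker such as mypy or pyright reports any case you forgot. All class and function names below are hypothetical:

```python
from dataclasses import dataclass
from typing import NoReturn, Union


def assert_never(value: NoReturn) -> NoReturn:
    """Type checkers reject any call where `value` could still have a real type."""
    raise AssertionError(f"unhandled case: {value!r}")


@dataclass
class Extraction:
    answer: str


@dataclass
class RateLimited:
    retry_after_s: float


@dataclass
class MalformedOutput:
    raw: str


@dataclass
class ContentFiltered:
    reason: str


# The "throws" clause lives in the return type instead of in raised exceptions.
ExtractResult = Union[Extraction, RateLimited, MalformedOutput, ContentFiltered]


def render(result: ExtractResult) -> str:
    if isinstance(result, Extraction):
        return result.answer
    elif isinstance(result, RateLimited):
        return f"Busy, retrying in {result.retry_after_s:.0f}s"
    elif isinstance(result, MalformedOutput):
        return "Could not read the model's answer"
    elif isinstance(result, ContentFiltered):
        return f"Blocked: {result.reason}"
    else:
        # Delete any branch above and mypy/pyright report an error on this call.
        assert_never(result)
```

This is not literally checked exceptions, but it gives the property James describes: every failure mode appears in a signature, and forgetting to handle one is a type error rather than a production surprise.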
I always like to spell out technologies for people to sink their teeth into. One thing I'll mention quickly is that a lot of what you mentioned is typically not part of the normal interview loop. It's actually really hard to interview for, because it's the stuff you polish out as you go into production. Coding interviews are typically about the happy path. How do you design for, how do you look for, a defensive, fault-first mindset? You can code defensively all day long and not add any functionality to your application.
Yeah, it's a great question, and I think that's exactly true. Normally the interview is about the happy path, and then maybe there's a box-checking exercise at the end where the candidate says, "Of course, in reality I would handle the edge cases," or something like that. Unfortunately, that isn't quite good enough when the happy path is very, very narrow and there's lots of weirdness on either side. Basically, it's a case of foregrounding those concerns through the interview process; there's no magic to it. We talk about this in the post we're putting up on Latent Space, but there are two main technical exercises in our interview process for this role. The first is more coding-focused, and the second is more system design: whiteboarding a potential solution. Without giving too much away, in the coding exercise you do need to think about edge cases and errors. The exercise consists of adding features and fixing bugs inside a codebase, and in both cases, because of the way we've set up the application and the interview, it demands that you think about something other than the happy path. But you're asking exactly the right question about how we do that.
The aim is to get the candidate thinking outside the normal sweet spot of the smooth happy path. In the system design interview it's a little easier to prompt this fault-first mindset, because in that situation it's very easy to say, "Let's imagine this node dies: how does the app still work? Let's imagine this network is going super slow. Let's imagine you run out of capacity in this part of the system you sketched out. How do you handle that?" In neither case are the exercises built specifically around language models and the ways they can go wrong, but we do exercise the same muscles of thinking defensively and foregrounding the edge cases.
Any comment?
Yeah, James, earlier you mentioned retries, and this is something I've seen some interesting internal debates about. First of all, retries can be costly. In addition to having incredibly high variance in response times and being non-deterministic, this medium is actually quite expensive. In many cases doing a retry when you get a failure does make sense, but it has an impact on cost. So I've seen the AI engineers on our team worry about how to give the best user experience while balancing that against what the infrastructure is going to cost the company. That's again an interesting mix: it's a bit of the distributed systems mindset, but it's also a product perspective, thinking about the end-user experience and also the bottom line for the business. You're bringing together a lot of qualities there. There's also the fallback case, which is a related or adjacent one, and there's been a discussion about that internally as well.
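Adam's point that retries cost money suggests capping them. One well-known distributed-systems pattern for this is a retry budget: retries are allowed only while they stay below a fixed fraction of recent requests, so a provider outage can't multiply model spend. A minimal sketch under that assumption; the class name, the 10% ratio, and the 60-second window are illustrative choices, not Elicit's settings:

```python
from __future__ import annotations

import time
from collections import deque
from typing import Optional


class RetryBudget:
    """Allow retries only while they are below `ratio` of requests in a sliding window."""

    def __init__(self, ratio: float = 0.1, window_s: float = 60.0, min_retries: int = 3):
        self.ratio = ratio
        self.window_s = window_s
        self.min_retries = min_retries  # small floor so low-traffic periods can still retry
        self._requests: deque[float] = deque()
        self._retries: deque[float] = deque()

    def _trim(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        for q in (self._requests, self._retries):
            while q and now - q[0] > self.window_s:
                q.popleft()

    def record_request(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._trim(now)
        self._requests.append(now)

    def try_acquire_retry(self, now: Optional[float] = None) -> bool:
        """Return True (and count the retry) if the budget allows one more retry."""
        now = time.monotonic() if now is None else now
        self._trim(now)
        allowed = max(self.min_retries, int(len(self._requests) * self.ratio))
        if len(self._retries) < allowed:
            self._retries.append(now)
            return True
        return False
```

A caller would record each user-facing request and check `try_acquire_retry()` before retrying; when it returns `False`, the request goes to a fallback or an error instead, which is exactly the user-experience versus cost trade-off Adam describes.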
I think it may have been search. Recently, one of the frontline search providers was having some slowness and outages. We had a fallback, but for a while that meant people, especially new users who come in and don't know the difference, were getting worse results for their searches. So then you have this debate: there's what's correct to do from an engineering perspective, but there's also what actually gives the user the best result. Is it better to give them a worse answer to their search, or to give them an error and say, "Sorry, it's not working right at the moment, try later"? Both are obviously suboptimal, but this is the kind of thing you have to grapple with a lot more than with other kinds of mediums.
Yeah, that's a really good example. I think it brings to the fore the two different things you could be optimizing for: uptime and a response at all costs at one end of the spectrum, and at the other end, effectively fragility, but if you get a response, it's the best response we can come up with. Where you want to land certainly depends on the app, obviously on the user, and I think on the feature within the app as well. In the search case you mentioned, in retrospect we probably didn't want the fallback, and we actually just changed that on Monday to show an error message rather than give people a degraded experience. In other situations we could use, for example, a large language model from provider B rather than provider A and get something within a few percentage points of performance.
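James's conclusion is that the right response to a provider failure depends on the feature: search now shows an error rather than degraded results, while other tasks can fall back to a comparable model. A minimal sketch of making that choice explicit per feature; the feature names, the policy table, and the `run_feature` signature are hypothetical:

```python
from enum import Enum
from typing import Callable, Optional


class OnFailure(Enum):
    SHOW_ERROR = "show_error"  # prefer an honest error over a worse answer
    FALL_BACK = "fall_back"    # a comparable backup is good enough


# Hypothetical per-feature policy table illustrating the trade-off, not real config.
POLICY = {
    "search": OnFailure.SHOW_ERROR,
    "data_extraction": OnFailure.FALL_BACK,
}


class FeatureUnavailable(Exception):
    pass


def run_feature(
    feature: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]],
    query: str,
) -> str:
    try:
        return primary(query)
    except Exception as exc:  # in real code, catch only the provider's failure types
        # Unknown features default to showing an error rather than degrading silently.
        policy = POLICY.get(feature, OnFailure.SHOW_ERROR)
        if policy is OnFailure.FALL_BACK and fallback is not None:
            return fallback(query)
        raise FeatureUnavailable(
            f"{feature} is not working right now, please try again later"
        ) from exc
```

Keeping the decision in one table makes it a deliberate, reviewable product choice per feature, and changing search from fallback to error becomes a one-line edit.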
That's just a really different situation. Like any interesting question, the answer is: it depends.
I do hear a lot of people suggesting what I'll call model shadowing as a defensive technique: if OpenAI happens to be down, which happens more often than people think, you fall back to Anthropic or something. How realistic is that? Don't you have to develop completely different prompts for different models? And won't the performance of your application suffer for whatever reason, because the fallback behaves differently or isn't maintained in the same way? People raise this idea of falling back between models, but I don't see it practiced very much.
Yeah, you definitely need a different prompt if you want to stay within a few percentage points of degradation, like I said before, and that certainly comes at a cost. With fallbacks and backups, it's really easy for them to go stale and flake out on you, because they're off the beaten track. In our particular case at Elicit, we do have fallbacks for a number of crucial functions where it's going to be very obvious if something has gone wrong, but we don't have fallbacks in all cases. It really depends on a task-by-task basis throughout the app, so I can't give you a single simple rule of thumb like "in this case do this, in the other do that." It's a little easier now that the APIs for the Anthropic and OpenAI models are more similar than they used to be, so we don't have two totally separate code paths with different wire protocols. But you're right, you do need different prompts if you want similar performance across providers.
I'll also note, just observing as a relative newcomer here, that I was surprised, or impressed, I'm not sure what the word is.
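James notes that a fallback model needs its own prompt to stay within a few percentage points, and that fallbacks go stale when they sit off the beaten track. A minimal sketch of a fallback chain in which each provider carries its own prompt template; `ProviderRoute`, the templates, and the stub clients are hypothetical placeholders, not real OpenAI or Anthropic SDK calls:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ProviderRoute:
    name: str
    prompt_template: str            # tuned separately for each model
    complete: Callable[[str], str]  # stand-in for a real SDK call


class AllProvidersFailed(Exception):
    def __init__(self, errors: List[str]):
        super().__init__("; ".join(errors))
        self.errors = errors


def complete_with_fallback(
    routes: Sequence[ProviderRoute], **fields: str
) -> Tuple[str, str]:
    """Try each provider in order with its own prompt; return (provider name, output)."""
    errors: List[str] = []
    for route in routes:
        prompt = route.prompt_template.format(**fields)
        try:
            return route.name, route.complete(prompt)
        except Exception as exc:  # in real code, catch only the SDK's transient errors
            errors.append(f"{route.name}: {exc}")
    raise AllProvidersFailed(errors)
```

Returning the provider name makes it easy to log how often the fallback path is actually taken; a scheduled job that calls the fallback route directly on a few sample inputs is one way to notice when it has gone stale.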
The blend of different backends the team is using really struck me. The product presents as one single interface, but there are actually several dozen main paths: for example, search versus a certain type of data extraction versus chat with papers. For each of these, the team has worked very hard to pick the right model for the job and craft the prompt, but they're also constantly testing new ones. A new model comes out, either from the big providers or, in some cases, our own models running on essentially our own infrastructure, and sometimes the switch is more about cost or performance. The point is that switching very fluidly and very quickly between them, because this field is moving so fast and there are new ones to choose from all the time, is part of the day-to-day. So it isn't a situation where there's a main model that's been the same for a year and a fallback with cobwebs on it. Which model and which prompt is changing weekly, so I think it's quite reasonable to have a fallback that you can expect might work.
I'm curious, because you've had experience working both at Elicit, which is a smaller operation, and at larger companies. A lot of companies are looking at this with a certain amount of trepidation, because it's very chaotic. When you have one engineering team where everyone knows everyone else's names, they meet constantly in Slack, and everyone knows what's going on, it's easier to sync on technology choices. When you have 100 teams all shipping AI products and all making their own independent tech choices, it can be very hard to control. One solution I'm hearing from the Salesforces and Walmarts of the world is that they're creating their own internal AI gateway.
This is the one model hub that controls all the things and enforces their standards. Is that a feasible thing? Is it something you would want, something you have, or something you're working towards? What are your thoughts on centralizing control, or on an internal AI platform?
Yeah, I think for larger organizations, and organizations doing things that run into HIPAA compliance or other legislative requirements like that, it could make a lot of sense. The TL;DR for something like Elicit is that we're small enough, as you indicated, and we need full control over all the levers available, switching between different models and different prompts and whatnot, as Adam was just saying, so that kind of thing wouldn't work for us. But I've spoken with and advised a couple of companies that are trying to sell into that kind of space, or are at a larger stage, and it does seem to make a lot of sense for them. For example, if you're trying to sell to a large enterprise and they cannot have any data leaving the EU, then you need to be really careful about someone accidentally putting in, say, us-east-1 GPT-4 endpoints. Trying to think of a more specific example there: I'd be interested in understanding better what specific problem they're looking to solve with it, whether it's data security, centralization of billing, or a suite of prompts people can choose from so they don't need to reinvent the wheel again and again. Without understanding the problems and the proposed solutions, I wouldn't be able to say which situations it would be a better or worse fit for. But for Elicit, the secret sauce, if there is a secret sauce, is which models we're using and how we're using them.
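James's EU example is the kind of rule a central AI gateway can enforce in one place, along with the billing centralization he mentions. A minimal sketch, assuming a hypothetical endpoint registry tagged with regions and a per-tenant data-residency setting; the endpoint IDs, region labels, and audit list are illustrative, not real provider identifiers or any company's actual gateway:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    model: str
    region: str  # hypothetical labels such as "eu" or "us"


@dataclass(frozen=True)
class Tenant:
    name: str
    allowed_regions: FrozenSet[str]


class ResidencyViolation(Exception):
    pass


# Hypothetical registry maintained by a platform team.
ENDPOINTS: Dict[str, Endpoint] = {
    "gpt4-us-east": Endpoint(model="gpt-4", region="us"),
    "gpt4-eu": Endpoint(model="gpt-4", region="eu"),
}


def resolve_endpoint(
    tenant: Tenant, endpoint_id: str, audit: Optional[List[Tuple[str, str]]] = None
) -> Endpoint:
    """Refuse, in one central place, to route a tenant's data outside its allowed regions."""
    endpoint = ENDPOINTS[endpoint_id]
    if endpoint.region not in tenant.allowed_regions:
        raise ResidencyViolation(
            f"{tenant.name} may not send data to {endpoint_id} ({endpoint.region})"
        )
    if audit is not None:
        audit.append((tenant.name, endpoint_id))  # single hook for usage logging and billing
    return endpoint
```

The same chokepoint is what James expects to get in Elicit's way: every new model or endpoint has to be registered before a team can experiment with it.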
It's how we're combining them, how we're thinking about the user problem, and how all these pieces come together. You really need all of the affordances available to you to be able to experiment and iterate rapidly, and generally speaking, whenever you put these layers of abstraction, control, and generalization in there, they get in the way. So for us it would not work.
Do you feel like there's always a tendency to reach for standardization and abstractions pretty early in a new technology cycle? There's something comforting there, or you feel like you can see them, or whatever. I feel like there's some of that discussion around LangChain right now. But this is not only so early but also moving so fast that I think it's tough to ask for that; that's not the space we're in. Still, the larger an organization, the more its default is to reach for that. It's a sort of comfort.
Yeah, interesting. I find it interesting that you would say that, being a founder of Heroku, which was one of the first platforms-as-a-service and more or less standardized what that early development experience should look like. I think people are basically feeling the difference between calling various model labs' APIs and having an actual AI platform where all their development needs are thought of for them. I defined this in my AI engineer post as well: the model labs see their job as ending at serving models, and that's about it, but the AI engineer has to fill in a lot of the gaps beyond that.
Yeah, that's true. A huge part of the exercise with Heroku was inspired by Rails, which itself was one of the first frameworks to standardize the CRUD app backed by a SQL database.
People had been building apps like that for many, many years. I had built many of them and made my own templates based on them, and I think others had done the same. Rails came along at the right moment: we had been doing it long enough to see the patterns, and then you can say, "Let's extract those into a framework." That makes it easier to build not only for the experts but also for people who are relatively new, because the best practices, model-view-controller to take one example, are encoded into the framework. Once you see that and experience the power of a framework, it's so comforting: you develop faster, and it's easier to onboard new people because you have these standards and this consistency. Then folks want that for something new that's still evolving. Here I'm thinking, if you fast-forward a little, of when React came on the scene a decade ago or so: okay, we need to do state management, what's that? Then there's a new library every six months, "this is the one, this is the gold standard," and six months later it's deprecated, because of course it's evolving and you need to figure it out. The tacit knowledge and the experience of putting it into practice and seeing what the real needs are is critical. So it's really about finding the right time to say, "Yes, we can generalize, we can make standards and abstractions," whether it's for a company or for an open source library covering a whole class of apps. It's much more of a judgment call, or just a sense of taste or experience, to be able to say, "We're at the right point, we can standardize this." I'm so new to this world compared to you both, but my sense is that it's still the Wild West.
the Wild West. That's what makes it so exciting, and it feels kind of too early for too much in the way of standardized abstractions. Not that it's not interesting to try, but you can't necessarily get there in the same way Rails did until you've got that decade of experience of building different classes of apps with that technology. Yeah, it's interesting to think about what is going to stay more static and what is expected to change over the coming five years, let's say, which, when I think about it through an ML lens, is an incredibly long time, whereas if you just said five years it doesn't seem that long. I think that kind of talks to part of the problem here: things are moving, and moving incredibly quickly. This is my hot take rather than some kind of official, carefully thought-out position, but my hot take would be something like: you'll be able to get to good-quality apps without doing really careful prompt engineering. I don't think that prompt engineering is going to be a durable, differentiating skill that people will hold. I do think that the way that you set up the ML problem, to kind of ask the right questions, if you see what I mean, rather than the specific phrasing of exactly how you're doing chain of thought or few-shot or something in the prompt, the way that you set it up is probably going to remain trickier for longer. And I think some of the operational challenges that we've been talking about, of wild variations in latency and handling... I mean, one way to think about these models: the first lesson that you learn when you're a software engineer is that you need to sanitize user input, right? I think it was the top OWASP security threat for a while. You have to sanitize and validate user input, and we got used to that, and it kind of feels like this is the shell around the app, and then
everything else inside, you're kind of in control of, and you can grasp it and you can debug it, et cetera. And what we've effectively done, through some kind of weird rearguard action, is we've now got these slightly chaotic things, I think more like complex adaptive systems, which are related but a bit different and definitely have some of the same dynamics. We've injected these into the foundations of the app, and you kind of now need to think with this defensive mindset downwards as well as upwards, if you see what I mean. So I think it will take a while for us to truly wrap our heads around that. Also, with these kinds of problems, you have to handle things being unreliable and slow sometimes, and whatever else, even if it doesn't happen very often. There isn't some kind of industry-wide accepted way of handling that at massive scale. There are definitely patterns and anti-patterns and tools and whatnot, but it's not like this is a solved problem, so I would expect that it's not going to go down easily as a solvable problem at the ML scale either. Yeah, excellent. In the terminology I've used in stuff I've written in the past, I describe this inversion of architecture as sort of LLM at the core versus code at the core. We're very used to code at the core; actually, we can scale that very well. When we build LLM-core apps, we have to realize that the central part of our app that's orchestrating things is actually prone to, you know, prompt injections and non-determinism and all that good stuff. I did want to move the conversation a little bit from the defensive side of things to the more offensive, or you know, the fun side of things, the capabilities side of things, because that is the other part of the job description that we kind of skimmed over. So I'll repeat what you said earlier: you want people to have a genuine curiosity and enthusiasm for the
capabilities of language models. We're recording this the day after Anthropic dropped Claude 3.5, and I was wondering, you know, maybe this is a good exercise: how do people have curiosity and enthusiasm for the capabilities of language models when, for example, the research paper for Claude 3.5 is four pages? There's not much. Yeah, well, maybe that's not a bad thing actually, in this particular case. So yeah, if you really want to know exactly how the sausage was made, that hasn't been possible for a few years now, in fact, for these new models. But from our perspective, when we're building Elicit, what we primarily care about is: what can these models do? How do they perform on the tasks that we already have set up and the evaluations we have in mind? And then, on a slightly more expansive note, what kinds of new capabilities do they seem to have that we can elicit, no pun intended, from the models? For example, there are very obvious ones like multimodality: you know, there wasn't that, and then there was that. Or it could be something a bit more subtle, like it seems to be getting better at reasoning, or at metacognition, or at marking its own work and giving calibrated confidence estimates, things like this. Yeah, there's plenty to be excited about there. It's just that, rightly or wrongly, there's been this shift over the last few years to not give all the details. But from an application development perspective, every time there's a new model release there's a flurry of activity in our Slack, and we try to figure out what it can do and what it can't do, run our evaluation frameworks, and yeah, it's always an exciting, happy day. Yeah, from my perspective, what I'm seeing from the folks on the team is, first of all, just awareness of the new stuff that's coming out, so that's, you know, an enthusiasm for the space and following along, and
then being able to very quickly (partially that's having the slack to do this) map that to: okay, what does this do for our specific case? And the simple version of that is: let's run the evaluation framework, of which Elicit has quite a comprehensive one. I'm actually working on an article on that right now, which I'm very excited about, because it's a very interesting world of things. But basically you can just try the new model in the evaluations framework. Run it; it has a whole slew of benchmarks, which include not just accuracy and confidence but also things like performance, cost and so on, and all of these things may trade off against each other. Maybe it's actually very slightly worse, but it's way faster and way cheaper, so this might be a net win, for example. Or it's way more accurate, but that comes at a cost: it's slower and more expensive. And so now you need to think about those trade-offs. And so to me, coming back to the qualities of an AI engineer, especially when you're trying to hire for them, it is very much an application developer with a product mindset: what are our users or our customers trying to do? What problem do they need solved, or what does our product solve for them? And how do the capabilities of a particular model potentially solve that better for them than what exists today? And by the way, what exists today is becoming an increasingly gigantic cornucopia of things, right? And so you say, okay, this new model has these capabilities; the simple version is to plug it into our evaluations, look at the results, and see if it seems better for a straight swap-out. But when you talk about, for example, multimodal capability, then you say, okay, wait a minute, actually maybe there's a new feature or a whole new way we could be using it, not just a simple model swap-out but actually a different thing we could do that we couldn't do before,
that would have been too slow or too inaccurate or something like that, and now we do have the capability to do so. I think of that as being a kind of core skill. I don't yet even know if I want to call it a skill; maybe it's even an attitude or a perspective, which is a desire to both be excited about the new technology, you know, the new models and things as they come along, but also to hold in mind: what does our product do, who is our user, and how can we connect the capabilities of this technology to how we're helping people in whatever it is our product does? Yeah, I'm just looking at one of our internal Slack channels where we talk about things like new model releases, and it is notable, looking through these, the kinds of things that people are excited about. It's not, I don't know, "the context window is much larger" or "look at how many parameters it has" or something like this. It's always framed in terms of: maybe this could be applied to that part of Elicit, or maybe this would open up this new possibility for Elicit. And as Adam was saying, yeah, I don't think it's really a novel or separate skill. It's the kind of attitude I would like all engineers at a company at our stage to have, actually, and maybe more generally even: not just getting nerd-sniped by some technology number or fancy metric, but asking how this is actually going to be applicable to the thing which matters in the end. How is this going to help users? How is this going to help move things forward strategically? That kind of thing. Yeah, applying what you know, I think, is the key here. Getting hands-on as well. I would recommend a few resources for people listening along. The first is Elicit's ML reading list, which I found so delightful after talking with Andreas about it. It looks like that's part of your onboarding. We've actually set up an asynchronous paper club inside of my
Discord for people following along with that reading list. I love that you separate things out into tiers one, two and three, and that gives people a factored-cognition way of looking into the corpus, right? Like, yes, the corpus of things to know is growing, and the water is slowly rising as far as what the bar for a competent AI engineer is. But I think having some structured thought as to what the big ones are that everyone must know is key. It's something I haven't really defined for people, and I'm glad that Elicit actually has something out there that people can refer to. Yeah, but you know, I wouldn't necessarily make it required for the job interview, maybe. But it'd be interesting to see what would be a red flag if some AI engineer did not know it. I don't know where we would stoop to call something required knowledge, you know, or you're not part of the cool kids club. But there increasingly is something like that, right? Like, not knowing what context is, is a black mark, in my opinion. Yeah, I think it does connect back to what we were saying before about this genuine curiosity. Well, maybe it's actually that combined with something else which is really important, which is a self-starting bias towards action, kind of a mindset, which again everybody needs. Exactly, yeah, everyone needs that. So if you put those two together, if I'm truly curious about this and I'm going to figure out how to make things happen, then you end up with people reading reading lists, reading papers, doing side projects, this kind of thing. So it isn't something that we explicitly include. We don't have an ML-focused interview for the AI engineer role at all, actually; it doesn't really seem helpful. The skills which we are checking for, as I mentioned before, are this fault-first mindset and conventional
software engineering, that kind of thing. It's points one and three on the list that we talked about. In terms of checking for ML curiosity and how familiar they are with these concepts, that's more through talking interviews and culture-fit types of things. We want them to have a take on what Elicit is doing, certainly as they progress through the interview process. They don't need to be completely up to date on everything we've ever done on day zero, although, you know, that's always nice when it happens, but for them to really engage with it, ask interesting questions, and be kind of bought into our view on how we want ML to proceed, I think that is really important, and that would reveal that they have this kind of interest, this ML curiosity. There's a second aspect to that, I don't know if now's the right time to talk about it, which is that I do think an ML-first approach to building software is something of a different mindset. I could describe that a bit now if that seems good. Yeah, up to you. So yeah, I think when I joined Elicit this was the biggest adjustment that I had to make personally. As I said before, I'd been effectively building conventional software for 15 years or so (well, for longer actually, but professionally for 15 years) and had a lot of pattern matching built into my brain, kind of muscle memory for: if you see this kind of problem, then you do that kind of thing. And I had to unlearn quite a lot of that when joining Elicit, because we truly are ML-first and try to use ML to the fullest. And part of what that means is this relinquishing of control, almost. At some point you are calling into this fairly opaque black-box thing, hoping it does the right thing, and dealing with the stuff that it sends back to you. And that's very different from interacting with, again, APIs and databases, that kind of thing. You can't just keep on
debugging; at some point you hit this obscure wall. And I think the second part to this is that the pattern I was used to is that the external parts of the app are where most of the messiness is, not necessarily in terms of code but in terms of degrees of freedom, almost. The user can and will do anything at any point; they'll put all sorts of wonky stuff inside of text inputs, and they'll click buttons you didn't expect them to click, and all this kind of thing. But by the time you're down into your SQL queries, for example, as long as you've done your input validation, things are pretty well defined. And that, as we said before, is not really the case when you're working with language models. There is this kind of intrinsic uncertainty when you get down to the kernel, down to the core, and even beyond that. All that stuff is somewhat defensive, and these are things to be wary of to some degree. The flip side of that, the really positive part of taking an ML-first mindset when you're building applications, is that once you get comfortable taking your hands off the wheel at a certain point, relinquishing control, letting go, really unexpected, powerful things can happen if you lean on the capabilities of the model without trying to overly constrain and slice and dice problems to the point where you're not really wringing out the most capability from the model that you might. So I was trying to think of examples of this earlier, and one that came to mind was from really early, just after I joined Elicit: we were working on something where we wanted to generate text and include citations embedded within it, so you'd have a claim and then, you know, square brackets one in superscript, something like this. And every fiber in my being was screaming that we should have some way of forcing this to happen, or structured
output, such that we could guarantee that this citation was always going to be present later on, you know, that the indication of a footnote would actually match up with the footnote itself. And I kind of went into this symbolic, "I need full control" mindset. And it was notable that Andreas, who's our CEO and again has been on the podcast, was the opposite. He was just kind of, "give it a couple of examples, it'll probably be fine, and then we can figure it out with a regular expression at the end." It really did not sit well with me, to be honest. I was like, but it could say anything; it could literally say anything, and I don't know about just using a regex to handle this; this is an important feature of the app. But you know, that was my first and starkest introduction to this ML-first mindset, I suppose, which Andreas has been cultivating for much longer than me, much longer than most. Yeah, there might be some surprises in the stuff you get back from the model, but it's about finding the sweet spot, I suppose. You don't want to give a completely open-ended prompt to the model and expect it to do exactly the right thing: you can ask it too much, and it gets confused and starts repeating itself, or goes around in loops, or just goes off in a random direction or something like this. But you can also over-constrain the model and not really make the most of its capabilities. And I think that is a mindset adjustment that most people who are coming into AI engineering afresh would need to make: giving up control and expecting that there's going to be a little bit of extra pain and defensive stuff on the tail end, but the benefits that you get as a result are really striking. That was brilliant. The ML-first mindset, I think, is something that I struggle with as well, because the errors, when they do happen, are bad. You know, they will
hallucinate, and your systems will not catch it sometimes if you don't have eno
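Two of the defensive habits discussed above (retrying a model call that is sometimes slow or flaky, and Andreas's "give it a couple of examples, then figure it out with a regular expression at the end" approach to citations) can be sketched in Python. This is a minimal illustration under stated assumptions, not Elicit's code: `TransientModelError`, `call_with_retries`, `build_prompt`, `check_citations`, and `strip_invalid_citations` are hypothetical names; the `[n]` marker format mirrors the square-bracket citations James describes; and real SDKs raise their own rate-limit and timeout exceptions, which you would list in place of the stand-ins here. The actual model call is left out because it depends on the client you use.

```python
from __future__ import annotations

import random
import re
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Bracketed numeric citation markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")
# The same marker together with any whitespace in front of it, used when deleting a marker.
CITATION_WITH_SPACE = re.compile(r"\s*\[(\d+)\]")


class TransientModelError(Exception):
    """Hypothetical stand-in for retryable failures such as rate limits or 5xx responses."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (TransientModelError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: let the caller degrade gracefully
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


def build_prompt(question: str, sources: Sequence[str], examples: Sequence[str]) -> str:
    """Few-shot prompt: show a couple of worked examples instead of enforcing a rigid schema."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    shots = "\n\n".join(examples)
    return (
        "Answer using the sources and cite them inline as [n].\n\n"
        f"{shots}\n\nSources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


def check_citations(answer: str, n_sources: int) -> tuple[list[int], list[int]]:
    """Split the citation numbers in the model's answer into (valid, invalid)."""
    valid: list[int] = []
    invalid: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= n_sources:
            valid.append(n)
        else:
            invalid.append(n)
    return valid, invalid


def strip_invalid_citations(answer: str, n_sources: int) -> str:
    """Remove markers that point at no source rather than render a broken footnote."""

    def keep_if_valid(match: re.Match[str]) -> str:
        return match.group(0) if 1 <= int(match.group(1)) <= n_sources else ""

    return CITATION_WITH_SPACE.sub(keep_if_valid, answer)
```

A caller would wrap its own client call, for example `call_with_retries(lambda: my_client_call(prompt))` where `my_client_call` is a placeholder, then run `check_citations` on the returned text and either re-prompt or fall back to `strip_invalid_citations`. Full jitter spreads retries out so that many workers hitting the same rate limit do not retry in lockstep.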

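Adam's description of trying a new model in the evaluation framework, where accuracy, confidence, performance, and cost "may trade off against each other", can be made concrete with a small comparison helper. The sketch below is an assumption-laden illustration rather than Elicit's framework: `EvalResult`, `compare`, and the 1% accuracy tolerance are hypothetical, and expected calibration error (ECE) is used here as one common way to score the "calibrated confidence estimates" James mentions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalResult:
    """Aggregate scores for one model on an eval suite (hypothetical field names)."""
    model: str
    accuracy: float       # fraction of eval items judged correct, 0..1
    cost_usd: float       # total spend for the eval run
    p95_latency_s: float  # 95th-percentile latency per item, in seconds


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between stated confidence and observed accuracy, per confidence bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally many confidences and outcomes, at least one")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Bin i covers [i/n_bins, (i+1)/n_bins); a confidence of exactly 1.0 joins the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / len(confidences) * abs(avg_conf - accuracy)
    return ece


def compare(current: EvalResult, candidate: EvalResult, max_accuracy_drop: float = 0.01) -> str:
    """Classify a candidate model relative to the one currently in use.

    The tolerance is an illustrative assumption; each team has to choose its own.
    """
    acc_delta = candidate.accuracy - current.accuracy
    cheaper_and_faster = (
        candidate.cost_usd <= current.cost_usd
        and candidate.p95_latency_s <= current.p95_latency_s
    )
    if acc_delta >= 0 and cheaper_and_faster:
        return "no worse on any metric: swap"
    if acc_delta >= -max_accuracy_drop and cheaper_and_faster:
        return "slightly worse but cheaper and faster: likely net win"
    if acc_delta > 0:
        return "more accurate but slower or pricier: product trade-off"
    return "keep current model"
```

Returning a label rather than a single blended score reflects the point made in the conversation: the last two cases are product decisions, and whether a more accurate but slower model is worth it depends on what users are trying to do.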
Original Description

One of the top reasons we have hundreds of companies and thousands of AI Engineers joining the World’s Fair next week is, apart from discussing technology and being present for the big launches planned, to hire and be hired! Listeners loved our previous Elicit episode, and we were so glad to welcome 2 more members of Elicit back for a guest post (and bonus podcast) on how they think through hiring. Don’t miss their AI engineer job description and template, which you can use to create your own hiring plan! https://www.latent.space/p/hiring

[00:00:00] Intros
[00:05:25] Defining the Hiring Process
[00:08:42] Defensive AI Engineering as a chaotic medium
[00:10:26] Tech Choices for Defensive AI Engineering
[00:14:04] How do you Interview for Defensive AI Engineering
[00:19:25] Does Model Shadowing Work?
[00:22:29] Is it too early to standardize Tech stacks?
[00:32:02] Capabilities: Offensive AI Engineering
[00:37:24] AI Engineering Required Knowledge
[00:40:13] ML First Mindset
[00:45:13] AI Engineers and Creativity
[00:47:51] Inside of Me There Are Two Wolves
[00:49:58] Sourcing AI Engineers
[00:58:45] Parting Thoughts

Playlist

Uploads from Latent Space · Latent Space · 32 of 60

Ep 18: Petaflops to the People — with George Hotz of tinycorp

FlashAttention-2: Making Transformers 800% faster AND exact

RWKV: Reinventing RNNs for the Transformer Era

Generating your AI Media Empire - with Youssef Rizk of Wondercraft.ai

RAG is a hack - with Jerry Liu of LlamaIndex

The End of Finetuning — with Jeremy Howard of Fast.ai

Why AI Agents Don't Work (yet) - with Kanjun Qiu of Imbue

Powering your Copilot for Data - with Artem Keydunov from Cube.dev

Beating GPT-4 with Open Source Models - with Michael Royzen of Phind

The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis

The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of SourceGraph

The AI-First Graphics Editor - with Suhail Doshi of Playground AI

The Accidental AI Canvas - with Steve Ruiz of tldraw

The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert

The Four Wars of the AI Stack - Dec 2023 Recap

The State of AI in production — with David Hsu of Retool

Building an open AI company - with Ce and Vipul of Together AI

Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate

Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI

Making Transformers Sing - with Mikey Shulman of Suno

A Comprehensive Overview of Large Language Models - Latent Space Paper Club

Why Google failed to make GPT-3 -- with David Luan of Adept

Personal AI Meetup - Bee, BasedHardware, LangChain LangFriend, Deepgram EmilyAI

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit

Breaking down the OG GPT Paper by Alec Radford

High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor

This World Does Not Exist — Joscha Bach, Karan Malhotra, Rob Haisfield (WorldSim, WebSim, Liquid AI)

LLM Asia Paper Club Survey Round

How to train a Million Context LLM — with Mark Huang of Gradient.ai

How AI is Eating Finance - with Mike Conover of Brightwave

How To Hire AI Engineers (ft. James Brady and Adam Wiggins of Elicit)

State of the Art: Training 70B LLMs on 10,000 H100 clusters

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI

[LLM Paper Club] Llama 3.1 Paper: The Llama Family of Models

Synthetic data + tool use for LLM improvements 🦙

RLHF vs SFT to break out of local maxima 📈

The Winds of AI Winter (Q2 Four Wars of the AI Stack Recap)

Segment Anything 2: Memory + Vision = Object Permanence — with Nikhila Ravi and Joseph Nelson

Answer.ai & AI Magic with Jeremy Howard

Is finetuning GPT4o worth it?

Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind

Building AGI with OpenAI's Structured Outputs API

Q* for model distillation 🍓

Finetuning LoRAs on BILLIONS of tokens 🤖

Cursor UX team is CRACKED 💻

Choosing the BEST OpenAI model 🏆

How will OpenAI voice mode change API design?

STEALING OpenAI models data 🥷

[Paper Club] 🍓 On Reasoning: Q-STaR and Friends!

[Paper Club] Writing in the Margins: Chunked Prefill KV Caching for Long Context Retrieval

The Ultimate Guide to Prompting - with Sander Schulhoff from LearnPrompting.org

llm.c's Origin and the Future of LLM Compilers - Andrej Karpathy at CUDA MODE

Prompt Engineer is NOT a job 📝

Prompt Mining LLMs for better prompts ⛏️

The six pillars of few-shot prompting 🔧

Language Agents: From Reasoning to Acting — with Shunyu Yao of OpenAI, Harrison Chase of LangGraph

[Paper Club] Who Validates the Validators? Aligning LLM-Judges with Humans (w/ Eugene Yan)

Can you separate intelligence and knowledge?

The video discusses the skills and mindset required for AI engineers, including curiosity, enthusiasm, and a fault-first mindset. It covers defensive coding, system design, and product perspective; tools such as Python, TypeScript, and Kubernetes; the importance of evaluation frameworks and the trade-offs between accuracy, cost, and performance; the need for a self-starting bias towards action and genuine curiosity; and the use of regular expre

Key Takeaways

Build language models
Apply language models to important problems
Understand the basics of language models
Craft effective prompts
Understand the importance of prompt engineering
Apply prompt engineering to real-world problems
Build and deploy language models
Understand the engineering aspects of language models
Apply language models to real-world problems
Use tools such as Python, TypeScript, and Kubernetes

💡 The video highlights the importance of a fault-first mindset, defensive coding, and system design in AI engineering; the need for a self-starting bias towards action and genuine curiosity; the use of regular expressions to handle citation generation; and the importance of finding the sweet s

More on: LLM Foundations

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Related AI Lessons

Critical thinking in the AI Era

Develop critical thinking skills to navigate the AI era effectively and make informed decisions

Medium · Data Science

Anthropic Just Passed OpenAI Among Business Users. Here’s What That Means for Your Stack.

Anthropic surpasses OpenAI in business user adoption, impacting the AI stack for enterprises

Introducing beLithe: AI Courses Built for Real People, Not Engineers

Learn about beLithe, an AI course platform designed for non-technical individuals, and its mission to make AI accessible to everyone

AI: Energy Taker or Energy Maker

Learn how rising data center energy demands can catalyze a clean energy transition and why it matters for sustainable AI development

Channels Television