Building AI For All: Amjad Masad & Michele Catasta

AI Engineer · Beginner ·🚀 Entrepreneurship & Startups ·2y ago

Key Takeaways

Amjad Masad and Michele Catasta discuss the future of software engineering in the age of AI, introducing Replit's AI-enhanced coding and Model Farm, and showcasing Rapid Code B 1.5, a 3B model trained on 1 trillion tokens of code.

Full Transcript

[Music] excited to be here I agree with uh swix and Ben that it feels like feels like a moment it feels like a historical moment here um my name is amjad I'm the co-founder of repet where we aspire to be the fastest way to get from an idea to a deployed software that you can scale so I'm going to take you back a little bit not like swix to the 6 600 AD but perhaps to the start of computing all right so um very early computers the eniac was the first your in complete uh you know programmable uh vum and machine computer the way you programmed it is like you literally punched cards um not physically but you had a machine that sort of punched these cards these are sort of binary code for the machine to interpret it was really hard there wasn't really a software industry because this was really difficult it automated some tasks that human computers did at the time but it wasn't this it didn't create the software industry yet but then we moved to text from from Punch Cards and um we had First Assembly and then we had compilers and higher level languages such as C and then someone invented JavaScript and it's all been downhill since then uh but tax editors were really or like tax based programming was a minimum a 10x Improvement if not 100 a Improvement in programming so we've had these moments where we've had orders of magnitude improvements in programming before and then you know the IDE became a thing because you know we had large scale software this is a screenshot from like 2017 or 18 when we added LSP uh to every programming environment on repet so anyone with an account can get intelligence and we're really proud about that at the time we're burning a lot of CPU doing sort of inference and you know if you've run typescript server that's like a lot of ram but we were really proud that we're giving everyone in the world tools to create professional grade software about 3 four years ago we started kind of thinking about how AI could change software it started much sooner than that but with the with gpt2 you know you could sort of kind of you know give it some code and kind of completes part of it you're like okay this this thing is actually happening and we we better be part of it and so we started building and we built this uh product called uh Ghost Rider uh which does autocomplete chat and all sorts of things inside the IDE and in just those two years I mean the the pace of progress across the industry the tools basically AI you know was deployed um and a lot of different Engineers were using it the AI enhanced engineer as swix kind of called it everyone is sort of using these tools and so uh we have a world now where a lot of people are gaining huge amount of productivity Improvement I don't think we're at a mold magnitude Improvement yet we're probably in the 50 80 perhaps 100% Improvement for some people but we're still at the start of this and we think that's going to going to be 10x 100x perhaps a th thousand X over the next decade the problem however repas Mission has always been about access our mission is to uh empower the next billion developers and so we really didn't want to create this world where some people have access to Ghost Rider and other people don't have access to it and we started thinking about okay what is it if if you really take into heart everything that they AI engineer conferences about that we're at a moment where software is changing where AI is going to be part of the software stack then you have to really step back a little bit and try to rethink how programming changes so our view is these programming add-ons such as co-pilot and coding Ghost Rider and all these things we're giving them cute names we think that's not the way forward we think that AI needs to be really infused in every programming interaction that you have and it needs to be part of the default experience of rep and I'm sure other products in the future that's why we're announcing today that we're giving AI for our millions of users that are coding on repet and so we think this is going to be the the biggest deployment of um AI enhanced uh coding in the world uh we're going to be burning as much GPU as we're burning CPU so pray for us uh we have people all over the world coding on all sorts of devices we have people coding on Android phones uh and and they're all going to get AI now so they're all going to be AI enhanced Engineers but you know as T showed is it's not just about AI enhanced engineering there's also product so AI being part of the software creation stack makes sense but AI part of the call stack is also where a lot of value is created so uh um uh so that's why we're also we have this new product called Model farm and model Farm basically gives you access to um uh models right into your IDE so all it takes is three lines of code to start doing inference we launched with Google Cloud uh llms but we're adding uh llama uh pretty soon we're adding uh stable diffusion and if you if you're an LM provider and want to work with us and provide this on our platform would love to talk to you but basically um everyone will get uh there's some free tier here everyone will get free access at least until the end of the year uh to model form so you can start doing inference and start building AI based products so uh next up I'm going to bring up uh my colleague the head of aiet Mel kasta to talk about how we train our own AI models and we have uh one more announcement for you coming up [Applause] [Music] [Music] do you that clicker F thank you all right hi everyone so today I'm going to be talking about how we're training llm for coded replit and I will explain why this weird title uh if you been around Twitter I think a bit more than a month ago you you must have read this study from semi analysis and their point was it's meaningless to work on small models training on us you know a limited amount of gpus and that came as a shock to us because we had a very good success story back in May where we started to train our models from scratch and then you know ham Jad and I and the team started to think are we really wasting our time here um I'm going to try to convince you it's actually is not the case so our code completion feature or repet is powered by our own Boke large language model we training open source code both published on GitHub and also developed by the rapit user base it's a very low latency feature so we try to find a different with spot compared to what you might be used with other plugins we try to keep our P95 latency below 250 milliseconds such as the developer experience is almost instantaneous you don't even have to think about it and the code is going to be completed for you at the model size that we were using we have been St of the art across the past few months and let's do a show hands who has heard about our V1 model back in May all right that feels good for a second I feel like an AI star uh jokes aside so we released rapid code V1 3B uh back in may we got a lot of adoption a lot of love and also a lot of contribution and that's one of the key reasons why we decided to give it back uh rapid history has been built on the on the shoulders of giants of all the people contributing to the open source space so we thought we should do exactly the same here we should give back our model and today I'm going to be announcing rapid code B 1.5 3B so the evolution of the model that we we released back in May let's go in detail as amjad was saying so the next 10 minutes we're going to do a technical Deep dive and I'm going to tell you how we built it and why it's so powerful so first of all we follow a slightly different recipe compared to the last time if you recall back in May our Wei W was a llama style uh code model which means we follow a lot of the best recipes that meta pioneered uh now we went you know one level up and we are training up to 300 tokens per parameter so if you have been following a bit history of llms even in you know two years ago most of the models were under Trin pardon me for the for the word it's not exactly you know technically speaking is not correct but the truth is you know uh mid 2022 the chinchilla paper from Deep Mind came out and it was like a bit warning for the old field basically what the paper tells us is that we are under training of our models we should give them way more high quality data and in exchange we could train smaller models so in a sense we're amortizing training time for inference time spending more compute to train a smaller more powerful model and then at inference time the latency will be lower and that's the key inside that we're going to be carrying along you know this keynote today now conver differently from the V1 this time we also double the amount of high quality data so we train it up to 1 trillion tokens of code it's the data mixture is roughly 200 billion tokens across five EPO plus a linear cool down at the end that really allows us to squeeze the best possible performance for the model and Rapid code B 1.5 this time supports 30 programming languages and we also added a mixture coming from Stock Exchange posts that are oriented towards developers so questions about coding questions about software engineering and so forth so this is the basis of our data now let's go and take a look inside at the data set that we used so we started from the stack which is an initiative led by big code it's a group you know 100 the hugin face umbrella uh very uh grateful about the work that these people have been doing basically they have built a big pipeline getting data from GitHub selecting top repositories is cleaning up part of the data and then especially leaving only code that is licensed under permissive licenses such as MIT uh BSD Apachi 2 and so forth out of this mixture we selected 30 top languages and then really the key secret ingredient here is how much time we spent on working on the data you must have been hearing this again and again and every time you go to an llm talk there is a go stage saying you should pay attention about the data quality uh here to tell you exactly the same once again that's probably the most important thing that you could be spending your time on especially because the model I'm talking about today is trained from scratch so this is not a fine-tuning all the models that we release have been trained from the very first token prepared by us so it's extremely important to have high uh data quality so we we took inspiration from the initial quality pipelines built by codex by the pound paper and then we applied way more tics there so we're filtering for code that is being autogenerated minified nonp Parable basically all the code that you wouldn't want your model to recommend back to you because it's not something that you will be writing yourself we also remove toxic content and all this pipeline have be built on spark so I'm trying to encourage you to also think of working on your own models because pretty much a lot of the base components are out there available open source so you you could really build the whole pipeline to train and serve an llm with a lot of Open Source components and as we was saying you have seen this CRA acceleration in the last 9 months if you wanted to do this in 2022 good luck with that uh it feels like we're a decade ahead compared to last year so it's pretty amazing and I didn't even exp in myself the speed to move this fast the other inside that we can of pioneer for the our V1 model and turns out to be very powerful also for this new one so when we released the V1 uh few weeks after coincidentally a very interesting paper has been published called scaling data constraint language models and I highly recommend it it's a it's a great read and it's probably one of the most interesting results in LM in my opinion and this intuition allowed us to basically train the model to completion rather than making trade-offs on the data quality it allowed us to select a small high quality subset of data and then repeat it several times the key funding of this paper is basically in this two plots I'm going to be sharing the slid so you know you can go and check the links and the idea is your loss curve after you repeat data four or five times is going to be comparable to training on a novel data set okay now not only this is very useful because it allow us to work only on high quality data it also allowed us to work with data that is exclusively released under permissive license therefore once again for our 1.5 model we're going to be releasing it open source and it's going to be released with a permiss commercially permissive license so you can use it there we go just shoot us an email when you use it because I'm very curious if you're having a good time so details about the model training we change a few things here and there slightly larger model it's a 3.3 B uh it's 4K context the old one was a 2K we train a new domain specific vocabulary 32k so a small one uh it it helps us to achieve even higher compression on the data uh if youve been reading again about llms you know that in a simplistic from a placing point of view they are data compressors losty data compressors so if your vocabulary allows you to get to pack even more data on fewer tokens then you're basically bringing more signal to the model while you're training and with this new vocabulary we're squeezing you know few percents extra and it's a better vocabulary for code compared to what star coder or code Lam are using today we trained 128 h180 gigs gpus which are as rare as you know gold at this point we have been on the Mosaic platform for a week and to our knowledge this is the first model officially announced to be train on h100s and release open source so we're very excited about it and we follow a list of llm best practices so of course we support flesh attention we have group CER attention which allow us to achieve better inference performance alabi position embedding latest optimizers in the game and that you know is really the reason why at the end you will see very exciting numbers that I don't want to spoil right away so let's start from the Bas model and then there is surprise coming um this is the evaluation P one on human eval for those of you who never heard about it human Val is a benchmark L back in 2021 AI if I Rec correctly the format is the following you have a natural language description of a task in English and then expect the model to generate a self a self-contained python snippet then then is going to be tested with a test harness so you generate code and then you execute it and you see if the if the values uh in output are exactly what you expect now an interesting evolution in the last few months in the field is we were not content on benchmarking exclusively on python so we're also doing that across several different programming languages and this is coming from the multilingual code eval Arness again built by big code and they also maintain a very interesting leaderboard so what they do is they take models across you know several companies and several open source contributors the Run devals themselves and then they compil this very interesting leaderboard so you will find us there I guess uh in a few days so uh from the left column we have star coder 3B which as of yesterday was the state-of-the-art model at the 3B uh parameter size across uh languages and today our we 1.5 is basically optimal across every single language that you see on the list but what gets me excited is not that much know the fact that we are know more powerful than star coder which has been released a few months months ago what got me hyped you know when we were training it is that we're very very close to col Lama 7D so as a reminder col Lama 7B is a Lama 2 model from meta the 7B version which has been trained on two trillion tokens of natural language and then it has an additional pre-training phrase of 500 billion tokens exclusively on code so it's a model that is twice the size it's 2.5x more data way more GPU compute so you see where I'm going you know we're getting big close how do we surpass Cod Lama here is the trick this is the the other model that we've been training in parallel and this is the rle tune version and it means the following we further pre-trained it on 200 billion tokens of code this time coming from our home developers so on ret when you create a public rep hole it's automatically published under IM license so we use this code to further pre-rain our model and we extract again 30 billion tokens of code same languages same data filtering pipeline to retain only the top quality ones we do this three EPO then we do also linear cool down and we are using basically the languages that are predominantly popular for replit users so not the same list as we saw before if you go on replit I would say 95% of the people are mostly writing Python and JavaScript these are the cool cool languages of today another key Insight is power cut off for this model is literally a few weeks ago so if there is a cool new library that everyone is you know writing software 4 in the last month our model is going to be capable of generating code that you know follows that Library so and we're going to keep basically these models up to date so that we can follow the trends and we can make our developers more happy here is the table that I love so we're back to this backto back comparison uh on the very left we have our base model uh we didn't add star coder here for a sake of space and also the mod the base model is already uh topping it on every other language so it didn't make sense now we have K in between and you can see why uh we are on pretty much every language substantially better so we have 36% on the open AI human eval Benchmark as a reminder um I when I was working on B coder for example that was a ped one result that we publish uh in early 2022 and that model was a 530 billion tokens so almost 200x larger than this model and it achieves exactly the same unal pass one performance same Coda Vinci 001 if you go back to the paper is getting exactly 36% so we were pretty much amazed when this happened now why do we go through all this struggle of training our models not only because it's cool you know uh we love to do this stuff but there is a there is a rational behind it so we really want to go as fast as possible with the most powerful small model we could train and the reason is all of our models are actually optimized for inference rather than for being awesome at benchmarks the fact that that happens gives us a lot of Pride and also makes us feel good when we do a Vibe check with the model and it performs as we expect or even better but it turns out that our key result is on a single model with no batching we're generating know above 200 tokens per second and we tune the architecture for Speed in every possible way we're training a small a vocabul as I was saying before we're using flesh attention with a traton kernel we're using the latest gqa so every single aspect is there to make sure that we can go as fast as we can and we optimize basically for the usage on the Tron inference server and acceleration framework such as tensor or tlm there really squeeze you know the last drop for andb the gpus now the other very interesting Insight is we work very hard also to make the model deployment go much faster so if you ever you know had the bad luck to work with kubernetes in your life you know you know thankfully can get you know to get your pod and you download all the dependencies and build it and yada yada you know so the very first time we brought this infrastructure H it took 18 minutes to go you know from clicking until the model was deployed now if you want to you know adapt to the load that the application is receiving 18 minutes you know looks like an eternity like if if there is a traffic Spike good luck with that um so one of our awesome Engineers Bradley you're going to find him at the boots later today brought this number from 18 minutes to just two minutes there is a laundry list of tricks that he used I'm not going to go through them just talk to Brad uh the cool Insight here is the fact now whenever we get more load we can react very quickly and that's how we serve a very large user base so the moment that I'm J announced AI for all literally 10 minutes ago we flip the switch and now code competion is in front of all our users and that's the way we made this happen now I've been asked several times guys why are you releasing your model open source you put so much effort maybe not that's that's an advantage for a company um it turns out that the moment we did it we got a lot of adoption and apart from a lot of log which you know always feels good and it feels good to know to with other people in AI that are using what we build uh we also started to get fine tune versions instruct tune versions of that and we have seen a lot of people using our small model deployed in local say with gml which goes super fast on opal silicon and they built their own custom privacy aware like GitHub calot alternative with rapid V1 so we expect the same to happen with uh B 1.5 in the next few days as we speak also if you go on a phase the model is available uh we're working on the readme come to to Wi mava the boot is The Mastermind behind it so it's going to tell you every single detail on how to make it run in production we're going to be here until tonight so more than happy to play with the model together now in the last minute I've left I want to give you like a teaser of what we're going to be doing in the next weeks so we aligned a few very exciting collaborations the first one is with glaive AI and it's a company that is building synthetic data sets and we're working on on an if version of our models so an instruct F tune version over 2,000 uh 210,000 uh coding instructions we're already seeing very exciting results we want to trle check them and you know follow our twitters and the moment that we're sure that this is performing as we expect is going to be out there and we going to be able to play with it second announcement we're also collaborating with more flabs I think Jesse is here today and he's going to run a session later explaining you exactly what uh this new format does I'm going to give you a teaser and then you know go to Jesse talking he's going to explain you all the details so we are designed parters on the fist format which is fing the syntax Tre you might have heard of feel in the middle this concept where you can take your file split it in a half and then basically if you're WR writing code in between you can tell the llm that the top of the file is your prefix the bottom of the file is your suffix and you give this context to the model so that it knows which part you should F now we found that this format is even more powerful is aware of the abstract syntax 3 underlying the source code we're seeing very uh promising results already and again this will be out you know just a matter of like a few days or weeks last thing we have collaborations with the perplexity AI guys you might have used their Labs so it's a place where the host models incredibly fast and the rapid B 1.5 will appear there and you can start to play with it and get a VI check by tonight thanks [Applause] everyone

Original Description

What is the future of Software Engineering in the age of AI? Amjad Masad, Founder & CEO of Replit, and Michele Catasta, head of AI at Replit, provide their take on this in this opening keynote presentation from the AI Engineer Summit 2023. Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair 00:00 Introduction - Amjad Masad 00:42 Historical perspective 02:22 How AI can change software 04:29 ⚠️📢 Announcing AI for all! 06:33 A tale of Code LLM & GPU-Poor - Michele Catasta 07:29 How Replit's code completion works 08:39 ⚠️📢 Announcing Replit's new model! 13:45 ⚠️📢 Announcing the new model is open source! 14:06 Model training 15:26 Model evaluation 17:31 Model data & training 18:45 Model evaluation 19:51 Model inference 22:10 Why open source? 23:10: Glaive AI Collaboration 23:50 Morph Labs Collaboration 24:42: Perplexity Collaboration
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from AI Engineer · AI Engineer · 6 of 60

1 AI Engineer Summit 2023 — DAY 1 Livestream
AI Engineer Summit 2023 — DAY 1 Livestream
AI Engineer
2 AI Engineer Summit 2023 — DAY 2 Livestream
AI Engineer Summit 2023 — DAY 2 Livestream
AI Engineer
3 Principles for Prompt Engineering - Karina Nguyen (Claude Instant @ Anthropic)
Principles for Prompt Engineering - Karina Nguyen (Claude Instant @ Anthropic)
AI Engineer
4 Announcing the AI Engineer Network: Benjamin Dunphy
Announcing the AI Engineer Network: Benjamin Dunphy
AI Engineer
5 The 1,000x AI Engineer: Swyx
The 1,000x AI Engineer: Swyx
AI Engineer
Building AI For All: Amjad Masad & Michele Catasta
Building AI For All: Amjad Masad & Michele Catasta
AI Engineer
7 The Age of the Agent: Flo Crivello
The Age of the Agent: Flo Crivello
AI Engineer
8 See, Hear, Speak, Draw: Logan Kilpatrick & Simón Fishman
See, Hear, Speak, Draw: Logan Kilpatrick & Simón Fishman
AI Engineer
9 Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase
Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase
AI Engineer
10 Pydantic is all you need: Jason Liu
Pydantic is all you need: Jason Liu
AI Engineer
11 Building Blocks for LLM Systems & Products: Eugene Yan
Building Blocks for LLM Systems & Products: Eugene Yan
AI Engineer
12 The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer
The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer
AI Engineer
13 Climbing the Ladder of Abstraction: Amelia Wattenberger
Climbing the Ladder of Abstraction: Amelia Wattenberger
AI Engineer
14 Supabase Vector: The Postgres Vector database: Paul Copplestone
Supabase Vector: The Postgres Vector database: Paul Copplestone
AI Engineer
15 [Workshop] AI Engineering 101
[Workshop] AI Engineering 101
AI Engineer
16 The Hidden Life of Embeddings: Linus Lee
The Hidden Life of Embeddings: Linus Lee
AI Engineer
17 [Workshop] AI Engineering 201: Inference
[Workshop] AI Engineering 201: Inference
AI Engineer
18 The AI Pivot: With Chris White of Prefect & Bryan Bischof of Hex
The AI Pivot: With Chris White of Prefect & Bryan Bischof of Hex
AI Engineer
19 The AI Evolution: Mario Rodriguez, GitHub
The AI Evolution: Mario Rodriguez, GitHub
AI Engineer
20 Move Fast Break Nothing: Dedy Kredo
Move Fast Break Nothing: Dedy Kredo
AI Engineer
21 AI Engineering 201: The Rest of the Owl
AI Engineering 201: The Rest of the Owl
AI Engineer
22 Building Reactive AI Apps: Matt Welsh
Building Reactive AI Apps: Matt Welsh
AI Engineer
23 Pragmatic AI with TypeChat: Daniel Rosenwasser
Pragmatic AI with TypeChat: Daniel Rosenwasser
AI Engineer
24 Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan
Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan
AI Engineer
25 Retrieval Augmented Generation in the Wild: Anton Troynikov
Retrieval Augmented Generation in the Wild: Anton Troynikov
AI Engineer
26 Building Production-Ready RAG Applications: Jerry Liu
Building Production-Ready RAG Applications: Jerry Liu
AI Engineer
27 120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson
120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson
AI Engineer
28 The Weekend AI Engineer: Hassan El Mghari
The Weekend AI Engineer: Hassan El Mghari
AI Engineer
29 Harnessing the Power of LLMs Locally: Mithun Hunsur
Harnessing the Power of LLMs Locally: Mithun Hunsur
AI Engineer
30 Trust, but Verify: Shreya Rajpal
Trust, but Verify: Shreya Rajpal
AI Engineer
31 Open Questions for AI Engineering: Simon Willison
Open Questions for AI Engineering: Simon Willison
AI Engineer
32 Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD
Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD
AI Engineer
33 GPT Web App Generator - 10,000 apps created in a month: Matija Sosic
GPT Web App Generator - 10,000 apps created in a month: Matija Sosic
AI Engineer
34 Using AI to Build an Infinite Game: Jeff Schomay
Using AI to Build an Infinite Game: Jeff Schomay
AI Engineer
35 How to Become an AI Engineer from a Fullstack Background - Reid Mayo
How to Become an AI Engineer from a Fullstack Background - Reid Mayo
AI Engineer
36 The Code AI Maturity Model and What It Means For You: Ado Kukic
The Code AI Maturity Model and What It Means For You: Ado Kukic
AI Engineer
37 AI Engineer World’s Fair 2024 - Keynotes & Multimodality track
AI Engineer World’s Fair 2024 - Keynotes & Multimodality track
AI Engineer
38 From Text to Vision to Voice Exploring Multimodality with Open AI: Romain Huet
From Text to Vision to Voice Exploring Multimodality with Open AI: Romain Huet
AI Engineer
39 The Making of Devin by Cognition AI: Scott Wu
The Making of Devin by Cognition AI: Scott Wu
AI Engineer
40 The Future of Knowledge Assistants: Jerry Liu
The Future of Knowledge Assistants: Jerry Liu
AI Engineer
41 Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney
Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney
AI Engineer
42 Open Challenges for AI Engineering: Simon Willison
Open Challenges for AI Engineering: Simon Willison
AI Engineer
43 Lessons From A Year Building With LLMs
Lessons From A Year Building With LLMs
AI Engineer
44 From Software Developer to AI Engineer: Antje Barth
From Software Developer to AI Engineer: Antje Barth
AI Engineer
45 Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner
Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner
AI Engineer
46 Copilots Everywhere: Thomas Dohmke and Eugene Yan
Copilots Everywhere: Thomas Dohmke and Eugene Yan
AI Engineer
47 Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han
Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han
AI Engineer
48 Low Level Technicals of LLMs: Daniel Han
Low Level Technicals of LLMs: Daniel Han
AI Engineer
49 Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta
Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta
AI Engineer
50 How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou
How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou
AI Engineer
51 What's new from Anthropic and what's next: Alex Albert
What's new from Anthropic and what's next: Alex Albert
AI Engineer
52 Using agents to build an agent company: Joao Moura
Using agents to build an agent company: Joao Moura
AI Engineer
53 Decoding the Decoder LLM without de code: Ishan Anand
Decoding the Decoder LLM without de code: Ishan Anand
AI Engineer
54 Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner
Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner
AI Engineer
55 Building with Anthropic Claude: Prompt Workshop with Zack Witten
Building with Anthropic Claude: Prompt Workshop with Zack Witten
AI Engineer
56 Building Reliable Agentic Systems: Eno Reyes
Building Reliable Agentic Systems: Eno Reyes
AI Engineer
57 10x Development: LLMs For the working Programmer - Manuel Odendahl
10x Development: LLMs For the working Programmer - Manuel Odendahl
AI Engineer
58 Disrupting the $15 Trillion Construction Industry with Autonomous Agents: Dr. Sarah Buchner
Disrupting the $15 Trillion Construction Industry with Autonomous Agents: Dr. Sarah Buchner
AI Engineer
59 Hypermode Launch: Kevin Van Gundy
Hypermode Launch: Kevin Van Gundy
AI Engineer
60 Git push get an AI API: Ryan Fox-Tyler
Git push get an AI API: Ryan Fox-Tyler
AI Engineer

Amjad Masad and Michele Catasta introduce Replit's AI-enhanced coding and Model Farm, and discuss the future of software engineering in the age of AI. They showcase Rapid Code B 1.5, a 3B model trained on 1 trillion tokens of code, and highlight its capabilities in code completion and model deployment. The lesson covers the concepts of AI-enhanced coding, LLMs, and model deployment, and provides practical steps for building and deploying AI-enhanced coding tools.

Key Takeaways
  1. Build AI-enhanced coding tools using Replit's Model Farm
  2. Deploy LLMs for code completion using Rapid Code B 1.5
  3. Optimize model deployment using Tron inference server and Tensor
  4. Collaborate with other AI companies to fine-tune models
  5. Use syntax tree format to give context to the model for code completion
💡 The future of software engineering in the age of AI requires collaboration between humans and AI systems, and the development of AI-enhanced coding tools that can learn from large datasets and optimize model deployment.

Related AI Lessons

I’m building Gealo, and here’s where it actually stands
Learn how to critically evaluate startup claims and focus on actual progress, not just hype
Medium · Startup
Pilot Projects and Proof of Concepts — Why Informal Deals Become Legal Disputes
Learn how pilot projects and proof of concepts can lead to legal disputes and how to avoid them in startup enterprise deals
Medium · Startup
What Finding Our First Users Taught Me About Startups
Finding the first users for a startup is a crucial challenge that can teach valuable lessons about the product and market
Medium · Startup
Why the Best Companies Are Built with the Right People Around the Table
Building a successful company requires assembling the right team, learn how to identify and bring the best people on board
Medium · Startup

Chapters (15)

Introduction - Amjad Masad
0:42 Historical perspective
2:22 How AI can change software
4:29 ⚠️📢 Announcing AI for all!
6:33 A tale of Code LLM & GPU-Poor - Michele Catasta
7:29 How Replit's code completion works
8:39 ⚠️📢 Announcing Replit's new model!
13:45 ⚠️📢 Announcing the new model is open source!
14:06 Model training
15:26 Model evaluation
17:31 Model data & training
18:45 Model evaluation
19:51 Model inference
22:10 Why open source?
23:50 Morph Labs Collaboration
Up next
Watch this before applying for jobs as a developer.
Tech With Tim
Watch →