Create a Custom Language Model with Surge AI and Cohere

Cohere · Beginner · 🧠 Large Language Models · 3y ago

Skills: Fine-tuning LLMs 95% · LLM Foundations 90% · Prompt Craft 80% · Prompt Systems Engineering 70%

Key Takeaways

This video demonstrates how to create a custom language model using Surge AI and Cohere, covering topics such as fine-tuning, toxicity classification, and generative AI. The video showcases the use of Surge AI for data labeling and Cohere for building custom language models.
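
A minimal sketch of one preparation step described in the transcript: collapsing the Surge data set's fine-grained labels into two classes before fine-tuning a binary classifier. The column names (`text`, `label`) and the label strings are assumptions made for illustration; the real CSV export may use different ones.

```python
import csv
import io

# Hypothetical label string for the toxic class; the real export may differ.
TOXIC_LABEL = "Toxic"


def collapse_label(label: str) -> str:
    """Map any fine-grained label to one of two classes."""
    if label.strip().lower() == TOXIC_LABEL.lower():
        return "toxic"
    return "not toxic"


def to_binary_rows(csv_text: str, text_col: str = "text", label_col: str = "label"):
    """Read CSV text and return (text, binary_label) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[text_col], collapse_label(row[label_col])) for row in reader]


# Placeholder rows standing in for the downloaded data set.
sample = """text,label
example tweet one,Toxic
example tweet two,Non-toxic with profanity
example tweet three,Non-toxic with reclaimed slurs
example tweet four,Non-toxic
"""

binary_rows = to_binary_rows(sample)
for text, label in binary_rows:
    print(f"{label}\t{text}")
```

Keeping the mapping in one small function makes it easy to switch back to the finer-grained labels later if a multi-class classifier turns out to be more useful.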

Full Transcript

Ellie: Okay, let's go ahead and get started. Thank you all so much for coming today. My name is Ellie, and I'm on the product team at Cohere. I'm here today with Scott. Scott, would you like to briefly introduce yourself?

Scott: Yep. Hey folks, I'm Scott. I'm a product manager at Surge. Really excited to have you all here and to dive into more about fine-tuning language models. Pass it back to Ellie.

Ellie: Awesome. During the call today, we'd appreciate it if you could mute yourselves until the Q&A portion at the end of the workshop, and then feel free to unmute and ask questions. While Scott and I are walking through our respective products, feel free to post questions in the chat throughout, and we'll answer those at the end as well. At the end of the webinar, you'll all receive an email with a link to the data set that we'll be using for this example. So with that, let's go ahead and get started.

I'm really excited to speak with you all today about leveraging both Surge and Cohere to build a custom language model for your project or your product. For those who maybe aren't as familiar with language models, these are tools that can help you unlock new product or feature capabilities, cut costs, and ultimately be more customer-centric. Whether you're interested in better understanding the voice of your customer through a sentiment analysis tool, summarizing content, or generating copywriting material, the possibilities are really endless and very exciting. Cohere offers baseline language models you can use without any additional training, but we also offer the ability to create a custom solution. Given a data set that reflects the type of behavior or the type of task you'd like to elicit from the model, we can build a custom language model to excel at that task and optimize performance before you go to production. But in order to build a custom language model you need high-quality data, which some developers
might not have at the ready. So during the call today, Scott will walk through how to kick off a project in the Surge platform to get a high-quality data set, and then I'll take the mic and walk through how to leverage that Surge data set to build a custom language model that you can then ultimately productionize. Anything you want to add, Scott, before we get started?

Scott: No, let's jump in. All right, so let me share my screen here and get started. For those of you who just joined, hey, good to see you. I'm Scott, I'm a product manager at Surge AI. We appreciate you taking the time to join us today as we teach you a bit about fine-tuning language models for any use case. Like Ellie mentioned, we encourage you to ask questions along the way, either in the chat or save them for the end, and we'll try to get to as many of them as we can at the end of this session.

I want to first kick this off by giving you a bit of context on Surge AI and what we do. Then I'm going to briefly explain why having high-quality data is so important when it comes to training models. And then finally we're going to dive into the fine-tuning process with a hands-on demonstration of building a custom training data set on the Surge platform. Then I'll pass it back to Ellie, and she'll show you how to actually use that data set to fine-tune a model on the Cohere platform.

For a little bit of context, for those of you who aren't as familiar with Surge AI: we are a data labeling platform and human workforce that builds custom high-quality data sets for folks like you, so you can ship better models faster. We provide the training data, and then you can go train great models. We work with a ton of awesome companies that are deploying AI and ML in really novel ways: folks like Microsoft, Google, and Amazon; research labs at Stanford and NYU; and large language model companies like Anthropic and, of course, our good friends here at Cohere.

One of our core beliefs at Surge, and I know that
Ellie and the folks at Cohere know this well and would agree, is that the quality of your training data is a huge factor in determining how well your model works in the real world. In some ways you can actually think of your model as a child: you can only teach this child through example. If you give it a bunch of bad examples to learn from, you shouldn't really be surprised when it grows up and starts misbehaving. The process of putting together a training data set is really just the process of selecting all the examples you want your child, slash model, to learn from. So it's critical that each of these examples is high quality and accurate; otherwise you are simply teaching your model the wrong behavior.

So we're going to jump into the fine-tuning process here and make this tangible. For today's webinar, let's say that we want to build a toxicity classifier for Twitter. We want this model to look at tweets and determine if they are either toxic or not toxic. It's as simple as that: basically a binary decision with a confidence score associated with that decision. Now, if we can build a model that can do this accurately, we can first of all drastically speed up our content moderation decisions and workflow, but more importantly we can make our platform a more enjoyable and civil place for all, which is obviously the end goal when building products like this.

The first thing that we need to do in this fine-tuning process is build a training data set that contains examples of toxic tweets and examples of non-toxic tweets. We're going to use that data set to teach the model what we think of as toxic and what we think of as not toxic, so that it can go out in the real world and make accurate judgments on text that it's never seen before. So we need to make sure that this training data set is of adequate size, diversity, and quality to show the model all different kinds of toxicity and all different kinds of
non-toxicity.

So I'm going to peel back the curtain here a bit and show you the work that we would do behind the scenes to build a data set like this for customers like you. Let's go to the Surge platform. I can see a list here of recent data labeling projects that I've run, and we're going to start with a project to gather toxic tweets. I can preview the task here, and now I'm actually seeing what the Surgers (these are the humans on our platform) are going to see when they work on the task. It's actually quite simple: I'm just asking folks to go to Twitter and find some toxic tweets. I'm giving them a brief definition of how I'm thinking about toxicity, and they're going to go out and find tweets that fit this criteria. They're going to submit the URL and the raw text of the tweet, and we're going to start building our data set up.

I'm doing two key things here to make this data collection project successful. First, I'm providing guidance to the Surgers (again, those are the folks on our platform who are actually going to do the work of finding these tweets) on how they should think about toxicity for this project. It's actually quite simple: I'm just using a simple sentence here to say, hey, go find toxicity that you think people would generally agree is toxic. Now, that's pretty open-ended, but other customers may have 25 pages of guidelines with strict criteria for how they want to define toxicity on their specific platform. Maybe you have strict criteria for what constitutes sexism on your platform, or what constitutes racism on your platform. That's okay too. Either is fine: as long as you have a clear idea of how you want the model to behave, we can work with you to craft the guidance so the Surgers can find the correct types of content to build this data set. So that's the first thing I'm doing to set myself up for success. The second thing I'm doing is I'm
going to assign a particular team of Surgers to this project so that only they can work on it. On the Surge platform we have teams of Surgers with all different kinds of specialties. Some are coders who work on code generation AI projects; we also have a team of creative writers who work on generative AI; and in this case we have a team of Surgers who are deeply familiar with the nuances of online toxicity. This is my toxicity team here. You can see I have a couple hundred people in it, and all of them are vetted and very familiar with going out online and either finding or evaluating this type of toxicity. In this case that means they're well educated, they're fluent English speakers, and they understand the political jargon and other coded language used on Twitter and other online platforms to sometimes promote harmful content. You could think of phrases like "let's go Brandon" or "FJB", or acronyms like that, where you need to be plugged into the cultural context to understand what they mean. They mean more than just a collection of letters or a phrase about someone named Brandon.

Now that my project is ready to go, I can simply launch it, and it gets automatically sent out to that team of Surgers. They get a notification and they start working on it: going out to Twitter, finding these tweets, and submitting them to us. At the same time, I'm also going to create a project to gather non-toxic tweets, because I need both classes of data here. So I have a project for toxic tweets, and I'm going to do another project for non-toxic tweets. It looks very similar. I'm actually going to collect three different types of non-toxic tweets to make my model perform even better, and I'll show you examples a bit later. I can go ahead and launch this project too, and when I do, I have both projects running, and I just need to wait a little bit of time for the data to come back. It's going
to get run through some automated quality controls we have on our end to make sure the quality is very high, and just a few moments later, voilà, I have a finished data set of a thousand tweets that we can dive into to check out the results. So let's do that. By the way, this is the data set that we are going to send out to all of you after this webinar, so you'll have access to it. You can download it, add to it, and modify it as you wish. I'll show a few examples here, but you'll get a link after the webinar.

You can see here that I have a thousand tweets. This is all the raw data in a table format; I can scroll through and make sure it all looks good. I can also see the breakdown up at the top: 50% of my data set is toxic tweets, and the remaining 50% is covered by these three categories of non-toxicity. As I mentioned before, I didn't just gather generic non-toxic tweets. I specifically also gathered non-toxic tweets that use profanity and non-toxic tweets that use reclaimed slurs. This is a crucial step to ensure that our model doesn't start thinking that any instance of profanity or any instance of a slur means that a tweet is toxic, and this is how we can quickly make our models behave in a really nicely nuanced way through the fine-tuning process.

Let's look at a few examples to drive this point home. Here's our first tweet. Excuse my French, but the text of the tweet is "holy [...] I thought doggo was just terrified". Now, some toxicity classifiers on the market today would actually get fooled by this, because they would see this word and think, oh, it must be toxic because that's a bad word. But to all of us on this call it's really obvious that this is not toxic. This is actually a video of a dog playing hockey goalie and doing an amazing job, so I highly
recommend that you watch this after. But we can all agree that this is not a toxic text that we would want off our platform. Similarly, we have our second tweet here. It says "update: I'm still that [...]". Obviously the word can be used as a slur, but in this case it's not being used as a slur; it's being used as a statement of confidence and positivity, and we want to make sure that this, again, is not flagged as toxicity. So we're specifically calling out to the model: hey, when you see text like this, make sure you understand that it's not toxic. In that way we're building a data set that's powerful enough to cover these different nuances and use cases, so that our model can go out into the real world and make accurate judgments on the way that people actually speak.

I can continue to scroll through and check out the different examples of tweets we have in here; I'll let you do that after the webinar. For the time being, I've looked at my data, and I'm confident I have the classes that I need. It's diverse, it's large enough, and it's ready for the next phase of the fine-tuning process. Now that I'm ready to move on, I can simply download my results as a CSV or JSON file up here in the top right, and I would take that file to the Cohere platform and start fine-tuning a model on this data set. So I'll pass it back to Ellie now, who's going to walk through the rest of this process: taking the data set, using the Cohere platform to fine-tune their base model on it, and having a model ready to deploy. Ellie, back to you.

Ellie: Sounds great. I'll go ahead and share my screen. Can you see my screen all right?

Scott: Yeah, we got you.

Ellie: Awesome. So as Scott called out, custom language models are really helpful tools for moving the needle on performance for tasks that are incredibly niche. For example, with the toxicity classification task that we have today, there isn't one standard definition for what defines
toxic content. Many platforms define toxicity in unique ways depending on their user base or their product. For example, a gaming platform, an edtech community for young kids, and a social media platform will likely have very distinct definitions of what is toxic content, and this is where a custom model can be really helpful.

So let's get started. As Scott called out, what I've done behind the scenes here is download that Surge data set as a CSV from their website, and then I've mapped all of the labels into two distinct categories, toxic and not toxic, just for the purposes of the example today. So instead of having the additional labels like non-toxic with reclaimed speech or non-toxic with profanity, I just have two buckets, and I've turned this into a binary classification problem.

To create a custom language model with Cohere, it's actually really simple. All I need to do is specify the type of task (for this particular example it's a classification task), and then I'll go ahead and upload that training set. You can optionally specify a validation set as well, which can be used to contextualize the performance of your fine-tune across different performance benchmarks, or I can just upload the default file, like I'm going to do today, and what Cohere will do behind the scenes is extract a portion of those samples to be used for validation. I'll then go ahead and review my data and make sure that everything's formatted correctly. I'll name my model, call it "toxicity classifier", and then I'll go ahead and start training. Once we've kicked this off, it'll take about 10 minutes to train, and while it's training we'll start to see these performance results trickle in here. Fortunately, we don't have to wait: right before this webinar I kicked off a fine-tune so I could show you the performance improvements in real time. Let me go ahead and switch tabs here.

To get an idea of how the custom
language model performs against a baseline, non-customized offering, we can project the embeddings of the samples into a two-dimensional vector space in our playground. At a high level, you can think of embeddings as a numerical, vectorized representation of the meaning of a word or phrase, and the distance between two points in the plot represents how semantically similar they are: the closer two dots are in the plot, the more similar they are, and vice versa. To test our baseline model here, I've added a few toxic and non-toxic samples which weren't found in our training data set. A good model for this task will have a very clear separation between samples that are toxic and samples that are not toxic, as they are semantically very different. I've projected the embeddings for our baseline large model, and you can see that there isn't a clear distinction between the samples that I would define as toxic and the samples I would define as not toxic. But when we project those embeddings for the fine-tuned model, we can see two distinct clusters, which is actually really exciting. Non-toxic samples are found in this grouping, and toxic samples are found in the grouping to the right of the screen. So similar data points are now pushed even closer together and further apart from the rest, which indicates this model has adapted really well to the additional data that it received during training and is more likely to perform even better for this particular classification task.

Now we can go ahead and test the model in real time in the playground with some examples. Here I've added a few samples of text that contain profanity but don't align with my working definition of what toxic content looks like, so these are all non-toxic samples that contain some profanity. We can specify that fine-tuned model here in the playground and classify the samples, and we see that all of these samples have the correct
label of not toxic with an accompanying confidence score, so all of these classifications are correct. From here, as a developer, what I would likely do is evaluate this model with a few more examples, make sure I'm really happy with the result, and then consider productionizing this classifier: let's say, building a system that automatically classifies a random sample of user content, and then when a sample is deemed toxic, I could trigger a job to remove that sample from the platform. Great, so I will kick it back over to Scott to highlight another example of how you can leverage Surge and Cohere to build a custom language model.

Scott: Awesome, thanks Ellie. That was great. I love seeing those embeddings too; it's always so satisfying. We want to do one more example in the few minutes we have left before we see if there are any questions from you folks. The first model we fine-tuned was a toxicity classifier; now we want to fine-tune a generative AI model. In this case we're going to use one of the canonical examples, where what our model takes as input is a subject or a keyword for an essay and a tone of voice, and then it outputs a paragraph or a couple of paragraphs of text based on the combination of that tone of voice and the keywords. We're going to do this in an expedited fashion, but we built another data set here of 300 examples of snippets of text associated with a keyword. In this case I have a paragraph on why you should eat breakfast before school, and I'm asking for a persuasive tone. I'm having a Surger either write or go out and find paragraphs that meet that criteria, and I'm going to do 100 of these for each tone: persuasive, friendly, and professional are my three tones. I'm going to build this data set in a similar fashion to how I built
the Twitter data set, by having folks go out and find this text, or create this text by hand, and associate it with the keyword and the tone of voice. In that same way, we're going to use this training data set to show the Cohere baseline model: hey, this is how I want you to behave. Given this keyword and a tone of voice, here's a bunch of examples of what I want you to do, and now when I give you a new keyword and one of the tones of voice I've specified, you can go create brand new text. You can imagine this has tons of use cases, from speeding up business workflows to creating interesting fiction content; it runs the whole gamut. Once again, I kicked off a labeling job to get this done and got these results back quickly, and now I can download the results and pop back over to Cohere to fine-tune this model. So I'm going to go back to Ellie, and she's going to show not only the process of fine-tuning but what it actually looks like in production to ship this live. Ellie, back to you.

Ellie: Awesome. I'll go ahead and share my screen again. For this next job we'll go through the same workflow we did the first time around, but this time we'll specify a generative task. We'll upload the data set that we downloaded from the Surge platform, and once that model has finished training in the Cohere platform, you'll receive a specific model ID. What you can then do is click to copy that model ID. You can experiment with the model directly in the playground, as we did with that last toxicity classification example, or you can go right ahead and start to plug this into your specific project or application. You can call the generate endpoint, and where you specify the model, instead of pointing to one of the default out-of-the-box Cohere models, you'll paste the ID of the custom model that you trained. For this example today, I built a really simple, low-fidelity web
app that calls the generate endpoint with the model that we just trained. In this web app, I nudge users to select the type of tone that they'd like to produce content for, and then they can specify a writing task. Let's say I want to generate content about moving from San Francisco to Toronto. I'll click submit, and then behind the scenes I'll call that specific generative model that we trained. Awesome. So in the interest of time, we can wrap things up and shift over to the Q&A portion of the webinar. If folks have any questions, feel free to unmute yourselves; you can also ask questions directly in the chat.

Scott: I'll ask a question in the meantime to kick us off here. Ellie, could you explain a little bit about the other parameters that are available to users when fine-tuning a model, like temperature and tokens and things like that? I know that's a factor in what the model will end up outputting. I think learning a little bit more about how to think about that when fine-tuning a model could be interesting.

Ellie: Yeah, definitely happy to speak to that. I will say that once you have this custom model, what's equally important is making sure that you have the right prompt and the right sampling parameters, so that you're getting a high-quality classification or generation consistently. When it comes to those hyperparameters for a generative task like writing ad copy, paraphrasing text, or writing a summary, I always nudge users to explore a bit with the temperature setting. You can think of temperature as the degree of randomness or degree of creativity of the model. When you have an extraction task, you might want to lower that degree of creativity or randomness, but when you're looking to generate a blog post, you'll want to increase that temperature quite a bit. So once you have your fine-tune deployed, I
would recommend that folks check out the playground environment in Cohere. It makes it really easy to experiment with these different settings, adjust your prompt, and see what will work best with the custom model that you've created.

Scott: Awesome, sweet. I see we have a couple of questions here, so let me take this first one, one and a half questions, and then I'll pass it back to you, Ellie. First, someone's asking how you can become a data labeler for Surge. You can reach out to us; our email is on the data set you'll get after this webinar. Definitely appreciate the interest. One of the key things that we do on our platform is make sure that all the folks labeling data for us are super high quality, well educated, and very fluent in the languages they're working in. So we always appreciate and are flattered by the interest, and feel free to reach out to us.

The next question here is: how much training data should you have for a fine-tune? That's a good question, and I think it depends a little bit on what you're trying to accomplish. We've seen with Cohere models that you actually don't need a ton of data to get started with a fine-tune, so sometimes even 10 to 100 examples can be a really great starting place. If you want something more robust, that's why we were in the thousand range: it allows you to get more diversity. If you think about toxicity on Twitter, for example, one of the things you want to make sure you have coverage over is all the different forms of toxicity on Twitter. Again, there could be stuff that's sexist, there could be stuff that's racist; there are tons of categories of awful content online, and you want to make sure not only that you have coverage across all those categories, but that within the categories you have coverage of the subcategories. That can be hard to do sometimes with just 10 or 100 examples. So as you increase
the size of your training data set, your model will just become better and better around the edge cases and nuances, which can really have an outsized impact on your users. Any user that gets flagged for saying something that the system thinks is toxic, but that they clearly think is not, does not have a good user experience. So it is good to go to a thousand and higher for a fine-tune, just to get as much accuracy and coverage as possible, but that's not a barrier to entry: you can get started with just a bit of data and see where that gets you. I will pass it to Ellie for this next question on prompting versus fine-tuning, and how you think about the gains from each of those.

Ellie: Yeah, so one of the benefits that people often don't think of when it comes to fine-tuning, versus having a very robust prompt with a baseline model, is actually speed-ups. Because the fine-tuned model is already conditioned to generate content or produce a classification that aligns with your quality expectations, you'll generally see that you don't need as lengthy a prompt in terms of descriptions and potentially a few examples. Because of this, your fine-tuned model will likely require a briefer prompt, which will empower you to get a quicker response from the endpoint, which can be really helpful if you're, say, building a conversational AI tool and you need a response in real time.

Scott: Ellie, do you also want to take this question of which base model you used for the toxicity classifier?

Ellie: Yes, so I used our medium model. Today on the Cohere platform, our medium embedding model powers our classification (representation) fine-tunes, and our medium generation model powers the generative fine-tunes. In the future you'll see that the platform will start to support fine-tuning of our larger models.

Scott: Awesome. And I see that Nick has raised their hand. Nick, do you want to unmute and ask a question here?

Nick: Can you
hear me?

Scott: Yeah, I've got you. Hey, how's it going?

Nick: Good. Thank you for the presentation; this was really amazing and very relevant to a project I'm working on right now. You mentioned building a conversational AI tool. I'm wondering how you might do a fine-tune for that. It seems like each row of the data is only one response to one question, or one response to one prompt. How would you be able to have an AI tool recognize the context and remember earlier parts of the conversation? How can you have a train of conversation that remembers the inputs from the past?

Ellie: Yeah, that's a great question. That ultimately depends on the data that you're using for fine-tuning, and one thing that I would recommend is that you upload a data set that's reflective of multiple conversation turns. So say I have one statement from the user and then one statement from the conversational agent to accompany that. Then I'll build off of that response in a second example: I'll have that initial request and initial response, and then another command from the human. You can start to build these samples where the context of the dialogue is getting longer and longer. Scott, do you have any recommendations about how to kick off a data labeling job specific to a conversational task with Surge?

Scott: Yeah, so Surge does a lot of conversational AI labeling as well. We have tools on our platform that allow you to call an endpoint of, maybe, a chatbot assistant you have, and then label data based on that conversation. You could have Surgers on our platform have a conversation with your beta bot, or whatever state it's in, and then label things like: did the bot say something toxic, did it say something helpful to me, did it lie to me? And they answer a question per turn, so you can get data on how your
conversational AI is behaving, how it's performing, and how you can improve it. So yeah, we have a range of conversational AI tools that we'd be happy to show another time, or you can reach out to us to learn more about them.

Nick: Cool. And a quick follow-up: is there a cost barrier to use Surge? What does the pricing look like?

Scott: The pricing is usually bespoke, on a project basis. My best recommendation for learning more about that would be to reach out to the email address in the data set that you'll be sent after the webinar. It's team@surgehq.ai; I'll put it in the chat for everyone, one second. We can get you connected with someone that can advise in more detail depending on your specific use case.

Nick: Thank you very much.

Ellie: I also see a question in the chat about Cohere pricing. Creating a fine-tuned custom model is entirely free. We also have a free developer tier, so as long as you're not serving these model outputs in a production environment, you don't need to pay. Once you do enter that production state, you'll pay for the number of requests that you're calling to each endpoint.

Scott: Awesome. We have a few more minutes here, so if there are other questions, happy to take them. One of the things we also want to highlight is that we gave you two examples of use cases that you could fine-tune a model for, but human creativity is sort of the only barrier here. A wide range of things you want to accomplish is possible with fine-tuning large language models. They're really capable of a wide variety of tasks, especially when you point them in the right direction with this fine-tuning process. I know Cohere spends a lot of time making sure those base models are as strong as possible. That's a great starting place, but then always consider going that extra step of fine-tuning
for your specific use case think about like the nuances of of your company um your users and what they're expecting from your products right you can fine tune it for that very specific use case so that when users go on your site and interact with some model that you're deploying um it feels it feels natural and intuitive to them and not like um not like you're trying to stick something that's not quite relevant into the mix so fine tuning is a really great way to make the customers feel like more comfortable um with your product on your platform kind of keep them uh engaged I see a couple other questions here so let's see if we can get to a couple more before we are out of time here yeah so as for the question relating to uh conversational AI data sets that exist um I found that there are some some helpful data sets that are open source and exist publicly but I what I would have um or what I've found is um has been really helpful is leveraging surge to actually clean some of those those data sets um or make them look a bit more uh specific um to the task that I have so yeah I just wanted to call out that you don't necessarily only need to leverage surge when you're taking a data set from zero to one you can also leverage the platform to improve the quality of a data set that you come across online and let me also drop this link for you all we have a surge has a bunch of free data sets that you are welcome to use in any way you see fit I'm just going to link that to you all here you can check these out after this webinar there's a whole bunch of different data sets on toxicity and sentiment analysis and search evaluation financial transaction data it's all free it's on our it's on our platform it's in that same results viewer that we showed you earlier so feel free to browse through those you could use that could be a great starting point for fine-tuning a cohere model if that sort of matches up roughly with one of your use cases or you could take that data set 
and uh add on it uh embellish it um make it more specific to you so that's also a good resource for you all to know about is uh the free free data sets from surge so encourage you to to check those out um Ellie do you want to grab maybe one more question here if yeah yes and then we'll wrap up sounds great I do see a question about leveraging different types of models in a singular application this is definitely something that you can do I've spoken to quite a few users who have chained our endpoints together so you could for example um uh run a classification task with our classify endpoint if that classification meets X criteria is X label then you could trigger a call to our generate endpoint to accomplish some other type of of task so that's that's definitely something that the cohere API supports awesome cool well I think we're right at time so just to wrap this up uh we want to thank you a ton for joining us today giving us 45 minutes I hope we made it worth your time and you learned something about actually how easy it is to go fine-tune a model from scratch uh by using both surge and cohere um we're gonna send you an email after this uh with a link to a survey and the data set that we mentioned the toxicity data set that I showed earlier um if you don't mind filling out the survey be super helpful we're going to do more of these webinars together in the future and we just want to make them as helpful to you as possible so any feedback or suggestions you have on what you would want to see in the future we're all ears we appreciate the feedback and more than that we just appreciate your time we're glad you were here um it's a super exciting time to be in the large language model space to be thinking about deploying them in products so we'll definitely be doing more of this type of webinar to help everyone understand how easy this is and how impactful and powerful it can be for your business uh Ellie any any last words from you before we wrap this up nothing 
else from me thank you all so much for coming today um yeah if you have any additional questions feel free to reach out to me directly I'll drop my email uh in the chat uh cohere also has a Discord community so don't hesitate to reach out there with questions too awesome I will drop my LinkedIn too feel free to connect and DM me there if you have any questions and we look forward to seeing you guys next time and hope you have a great day in the meantime thanks all right bye
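
The multi-turn recommendation from the Q&A (build samples where the dialogue context gets longer and longer) can be sketched roughly as below. This is a minimal illustration, not Cohere's required fine-tuning format: the `prompt`/`completion` field names, the `User`/`Agent` speaker labels, and the helper `build_cumulative_samples` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def build_cumulative_samples(dialogue: List[Turn], agent: str = "Agent") -> List[Dict[str, str]]:
    """Return one sample per agent turn, with all earlier turns as the prompt.

    Hypothetical helper: the field names and speaker labels are illustrative only.
    """
    samples: List[Dict[str, str]] = []
    history: List[str] = []
    for speaker, utterance in dialogue:
        if speaker == agent and history:
            # The prompt is the whole conversation so far, ending with the agent cue.
            prompt = "\n".join(history) + f"\n{agent}:"
            samples.append({"prompt": prompt, "completion": " " + utterance})
        history.append(f"{speaker}: {utterance}")
    return samples


# Made-up dialogue used only to show the shape of the output.
dialogue = [
    ("User", "Hi, my order hasn't arrived."),
    ("Agent", "Sorry to hear that. What is your order number?"),
    ("User", "It's 1234."),
    ("Agent", "Thanks, I can see it ships tomorrow."),
]

samples = build_cumulative_samples(dialogue)
for sample in samples:
    print(repr(sample["prompt"]), "->", repr(sample["completion"]))
```

Each later sample repeats the earlier request and response before the new user turn, so the fine-tuned model sees examples where the answer depends on what was said before.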

Original Description

In this session, learn how to build an advanced custom language model that's customized to your needs -- whether you're looking to summarize text, or build a toxic content classification system. Join Cohere and Surge AI in our NLP Jumpstart session as we discuss: 1) How best to solve language AI problems with high-quality labeled datasets 2) How large language models (LLMs) can help teams achieve time to value as quickly as possible with a few examples of labeled data 3) How to integrate Surge AI labeling platform into your Cohere workflow

This video teaches viewers how to create custom language models using Surge AI and Cohere, covering topics such as fine-tuning, toxicity classification, and generative AI. Viewers will learn how to build and deploy custom language models for specific tasks, and how to improve model performance with fine-tuning and prompt engineering.

Key Takeaways

Build a training data set containing examples of toxic and non-toxic tweets
Provide guidance to data labelers on how to think about toxicity
Collect tweets that fit the criteria of toxicity
Submit the URL and raw text of the tweet to build the data set
Fine-tune the model using the training data set
Specify task type for classification
Upload training set and validation set
Review data and format correctly
Name model and start training
Project embeddings into 2D vector space
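
The "upload training set and validation set" step assumes the labeled examples have already been divided into two files. A minimal sketch of one way to do that is below. The two-column text,label CSV layout, the 80/20 split, and the helper names `split_examples` and `to_csv` are assumptions for illustration, and the tweets are placeholders; check the current Cohere documentation for the exact upload format.

```python
import csv
import io
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def split_examples(
    examples: Sequence[Example], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle with a fixed seed and return (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def to_csv(examples: Sequence[Example]) -> str:
    """Serialize examples as two-column CSV text (assumed layout: text,label)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for text, label in examples:
        writer.writerow([text, label])
    return buffer.getvalue()


# Placeholder data standing in for labeled toxic / non-toxic tweets.
examples = [
    (f"example tweet {i}", "toxic" if i % 2 else "non-toxic") for i in range(10)
]

train, validation = split_examples(examples)
train_csv = to_csv(train)
validation_csv = to_csv(validation)
print(len(train), "training rows,", len(validation), "validation rows")
```

Using the `csv` module rather than joining strings by hand keeps tweets that contain commas or quotes correctly escaped, which matters for the "review data and format correctly" step.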

💡 Fine-tuning a language model with a high-quality data set can significantly improve its performance and accuracy, especially for specific tasks such as toxicity classification.
