DeepSeek Coder V2 - Quick Look!

1littlecoder · Advanced · 🧠 Large Language Models · 1y ago

Key Takeaways

Introduces DeepSeek Coder V2, a mixture-of-experts code language model

Full Transcript

The most underrated LLM for coding is DeepSeek Coder V2. I don't know why more people aren't talking about this model; probably it is because it is from China. But I believe this is one of the best models available out there: it has a decent open license and it is also really good for coding. You don't have to believe what I say. I'm going to show you a couple of benchmarks, and we are also going to do a couple of tests.

To start with, this model comes from a group called DeepSeek AI, a Chinese lab backed by the quantitative hedge fund High-Flyer. The latest version is DeepSeek Coder V2, and it performs better than GPT-4 Turbo on some tasks, for example HumanEval and MBPP+, which are Python coding benchmarks. There are other benchmarks where this model performs really well, but again, these results were reported by the model creators themselves.

If you want an independent benchmark, there is Aider, an LLM-powered coding tool that helps you build and edit entire projects with an LLM. On Aider's LLM leaderboard, the second-best model for code editing, scoring 75.2%, is DeepSeek Coder V2. The first is Claude 3.5 Sonnet. DeepSeek Coder V2 is slightly above GPT-4o and well above the original GPT-4. And it is not just this benchmark: on Aider's next benchmark, which is for code refactoring, it is almost on par with GPT-4 Turbo and the GPT-4 preview models. On another popular leaderboard, the LMSYS Chatbot Arena, it is in fifth position in the coding category, a 16-position improvement. So DeepSeek Coder V2 is again scoring really well in coding.

Why is this model so good? That is the first question we should be asking at this point. I hope I have convinced you that
this is a good model. It is good primarily because it is not one model: it is a mixture-of-experts (MoE) model, following the current popular trend. We have Mixtral, probably GPT-4 (I'm not sure), and other models that are also mixture-of-experts models.

This model comes in four flavors: DeepSeek Coder V2 Lite, with Base and Instruct versions, and the full DeepSeek Coder V2, also with Base and Instruct versions. The Lite model has 16 billion parameters in total in the mixture-of-experts setup, and 2.4 billion parameters are active for every single token. The full model has 236 billion parameters, with 21 billion active. The most exciting thing is that you get a 128,000-token context window, which I think is extremely important for coding-related tasks.

You can see this in Aider's benchmark too. One reason the model does well there is that it is not using the diff edit format; it uses the whole edit format, and it scores 100% on the percentage of correctly formatted edits. That is probably one of the secret sauces. I believe the context window is one of the important reasons this model does so well; beyond that, I guess they used really high-quality training tokens, which made it possible for the model to score well on all the benchmarks, like HumanEval, MBPP, MATH, and GSM8K. On all of these benchmarks, DeepSeek Coder V2 scores insanely well.

SWE-bench out of the box is where this model is not as good as, say, GPT-4 Turbo or Gemini 1.5 Pro. This benchmark uses models to solve actual GitHub issues. In fact, this is the benchmark that made Devin quite popular; Devin is still not there, so anyway, we'll leave that out.

DeepSeek also provides an endpoint if you want to
use the API, and it is probably the cheapest endpoint I've ever seen: input is $0.14 per million tokens and output is $0.28 per million tokens. As you can see in this chart, it is ridiculously cheap, cheaper than Claude 3 Haiku. I'm not sure how they manage it, but you never know what kind of technology they've got. If you want to run the model locally, you can do that too.

I'm not going to run it locally; instead, I'm going to jump directly to their platform, coder.deepseek.com, where you can start using the model. I simply asked: "Develop a simple HTML, CSS, JavaScript website that takes user input and displays an appropriate emoji." The code is really good. The only catch is that it is basically a switch-case statement: each keyword gives you a specific output, and anything that is not one of the cases falls through to a default output. Still, this is decent code. I can run the HTML right here, type something like "angry", and it shows me the angry emoticon. But if I type "anger", it won't show the angry emoji, because it is literally keyword-based. I'm not judging the quality of the application itself, but it does a pretty good job.

I can then ask it to improve things. For example, I can say, "Can you add a logo and make the website more beautiful?" Just like Claude 3.5 Sonnet with Artifacts, you can run this HTML within a sandbox environment. Normally we would copy the code, go to a platform like CodePen, and run it there, but you don't have to: it is all available within one single interface. At this point the model is free to use on this platform, although I'm not sure about the rate limits. Again, if you are going to use any sensitive data, I would suggest that you probably should
not do it, because this is a hosted platform. It also depends on how much it bothers you that this is a Chinese LLM. At this point I'm not very bothered, and again, this model comes with an open license, so if I want to, I can still run it locally.

Running the HTML, it gave me a placeholder logo, which I can replace with some other image. I can type "anger" or "angry", and you can see it now adds more effects, like a zoom animation. That's all well and good.

Let me go ahead and ask another question. I could keep the context (you can scroll up to see it), but I'm going to clear the context and ask a very simple question: "Create a simple Gradio application that takes two inputs and then creates a new Stable Diffusion-based image." I was going to add "using diffusers", but maybe I shouldn't say that. Cool. I'm expecting it to understand that it can use diffusers, which it correctly did. You can also see that it installs all the right frameworks: diffusers, plus Transformers and torch, which you would need to run it on Google Colab even though they are already there. Then it uses StableDiffusionPipeline, which is also correct. The only catch is that it uses the Stable Diffusion 1.4 model, which a lot of people still like. It also handles what to do if you have a GPU (CUDA) and what to do if you only have a CPU. The code is really good.

It uses Gradio's older Interface class, so I said: "It seems you are using Gradio Interface. Do you know how to use Blocks?" Let's see if it can change the code to use Blocks. As you can see, it seems to understand what I just said. The rest of the code stays the same, and this is code that should work: the model is downloaded, the model is sent to the device, then you have the prompt, and it uses Blocks. This is how you should ideally write
the Blocks code, and you have everything you need. It also gives you an explanation. So far, from my testing, this seems like one of the best models, and I'm going to spend more time explaining this model and what kinds of things you can build with it. Let me know in the comment section what you think about it, and also if you have any hypothesis for why this model is not being discussed a lot. Honestly, this is one of the best, and I would probably say the best, open coding model with an open license available for us to use. So, DeepSeek Coder V2: do check it out. See you in another video. Happy prompting!
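
The "whole" versus "diff" edit formats on the Aider leaderboard differ in how the model returns its changes. In a diff-style format the model sends a search/replace pair, and the edit cannot be applied if the search text does not match the file exactly; in the whole format the model returns the complete updated file, so there is nothing to mismatch (at the cost of more output tokens). That is one plausible reason a model can reach 100% correctly formatted edits in the whole format. The sketch below is a simplified illustration assuming one exact-match search/replace pair per edit; the function names are hypothetical and this is not Aider's actual implementation, which is more elaborate.

```python
class EditError(Exception):
    """Raised when a diff-style edit cannot be applied."""


def apply_whole_edit(original: str, new_file: str) -> str:
    """'Whole' format: the reply is the complete new file, so it always applies.

    `original` is unused; it is kept only to mirror apply_search_replace.
    """
    return new_file


def apply_search_replace(original: str, search: str, replace: str) -> str:
    """Diff-style format (simplified): swap exactly one verbatim occurrence of `search`.

    If the model's search text does not match the file character for character,
    the edit cannot be applied.
    """
    if not search:
        raise EditError("empty search text")
    occurrences = original.count(search)
    if occurrences == 0:
        raise EditError("search text not found in file")
    if occurrences > 1:
        raise EditError("search text matches more than once")
    return original.replace(search, replace, 1)


if __name__ == "__main__":
    source = "def add(a, b):\n    return a - b\n"
    print(apply_search_replace(source, "return a - b", "return a + b"))
    try:
        # The search text has different spacing than the file, so it does not match.
        apply_search_replace(source, "return a-b", "return a+b")
    except EditError as err:
        print("edit failed:", err)
```

The first call fixes the bug; the second fails only because of whitespace, which is exactly the kind of mistake the whole format cannot make.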
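
The API prices quoted in the video ($0.14 per million input tokens and $0.28 per million output tokens) make cost estimates simple arithmetic. The sketch below hard-codes those figures as assumptions taken from the transcript; DeepSeek's current pricing may differ, and the token counts in the example are hypothetical.

```python
# Prices quoted in the video, in US dollars per one million tokens.
# These are assumptions from the transcript; check current pricing before relying on them.
INPUT_PRICE_PER_MILLION = 0.14
OUTPUT_PRICE_PER_MILLION = 0.28


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2 million prompt tokens and 500,000 completion tokens.
    print(f"${estimate_cost(2_000_000, 500_000):.2f}")
```

For this workload, 2 million input tokens cost $0.28 and 500,000 output tokens cost $0.14, so the script prints $0.42.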

Original Description

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.

🔗 Links 🔗
Deepseek Coder V2 - https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
Aider leaderboard - https://aider.chat/docs/leaderboards/

❤️ If you want to support the channel ❤️
Support here:
Patreon - https://www.patreon.com/1littlecoder/
Ko-Fi - https://ko-fi.com/1littlecoder

🧭 Follow me on 🧭
Twitter - https://twitter.com/1littlecoder
Linkedin - https://www.linkedin.com/in/amrrs/
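
The mixture-of-experts idea can be illustrated with a toy router: a gating function scores every expert for a token, only the top-k experts actually run, and their outputs are mixed using the renormalized gate weights. This is why only 2.4B of the Lite model's 16B parameters (15%) and 21B of the full model's 236B (about 8.9%) are active per token. The sketch below uses scalar "experts", made-up gate scores, and k=2 purely for illustration; DeepSeek-V2's real MoE layers (which add shared experts and fine-grained expert segmentation) are considerably more elaborate.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax: turns raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float) -> Callable[[float], float]:
    """Toy stand-in for an expert network: it just scales its input."""
    return lambda v: scale * v


def moe_forward(
    x: float,
    gate_scores: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Route one token through its top-k experts and mix their outputs.

    Returns the combined output and the sorted indices of the experts that ran.
    Experts outside the top k are never called.
    """
    if len(gate_scores) != len(experts):
        raise ValueError("need exactly one gate score per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(gate_scores)
    # Select the k experts with the highest gate probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected weights so they sum to 1, then mix expert outputs.
    weight_sum = sum(probs[i] for i in top)
    output = sum(probs[i] / weight_sum * experts[i](x) for i in top)
    return output, sorted(top)


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    # Hypothetical gate scores for one token: experts 1 and 3 tie for the top spot.
    output, used = moe_forward(1.0, [0.1, 2.0, 0.3, 2.0], experts, k=2)
    print(used, output)  # [1, 3] 3.0
```

With these gate scores, experts 1 and 3 each get weight 0.5, so the output is 0.5 × 2 + 0.5 × 4 = 3.0, and experts 0 and 2 are never evaluated.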
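
The jump from a 16K to a 128K context window matters for coding because a whole set of source files can fit into one prompt. The sketch below checks that with the common rough heuristic of about four characters per token; that ratio is an assumption, not DeepSeek's tokenizer, so use the model's real tokenizer for anything precise. The output reserve and the example project are also hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# Rough heuristic: about 4 characters per token. This is an assumption, not
# DeepSeek's tokenizer; real counts vary by language and code style.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate a token count by rounding characters / CHARS_PER_TOKEN up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_in_context(
    files: Iterable[str],
    context_tokens: int,
    reserve_for_output: int = 4_000,
) -> bool:
    """Return True if the estimated prompt plus an output reserve fits the window."""
    prompt_tokens = sum(estimate_tokens(text) for text in files)
    return prompt_tokens + reserve_for_output <= context_tokens


if __name__ == "__main__":
    # Hypothetical project: 40 source files of 6,000 characters each.
    project = ["x" * 6_000] * 40
    for window in (16_000, 128_000):
        print(window, fits_in_context(project, window))
```

The hypothetical project comes to about 60,000 estimated tokens plus a 4,000-token reserve, so it does not fit a 16K window but fits comfortably in a 128K one.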