Jeffrey Mew - Supercharge your Data Science workflow | JupyterCon 2020

JupyterCon · Intermediate ·📐 ML Fundamentals ·5y ago

Key Takeaways

The video demonstrates how to supercharge a data science workflow using Jupyter notebooks, VS Code, and GitHub Codespaces, highlighting features such as the variable explorer, the data viewer, and remote compute.

Full Transcript

Hi everyone, my name is Jeffrey. I'm a PM at Microsoft on the Python data science team within Visual Studio Code, and specifically I'm working on the Jupyter notebooks experience within VS Code. In this talk we're going to be showcasing some of our brand new Jupyter notebooks features in the Python extension for VS Code, as well as GitHub Codespaces for data science work within your browser and on the go. Finally, I'm going to showcase how you can leverage all these tools to supercharge your own data science workflow and become more productive.

Before we get started: to access these Jupyter notebooks features, you'll first need to install VS Code, of course. It's cross-platform, completely free, and open source, so it's really easy to get. Secondly, you'll need to install the Python extension, because that's where the Jupyter notebooks features live. To do so, go to the Extensions tab on the left and search for the keyword "Python".

Once you have the Python extension installed, to get to the notebook editor, the Jupyter notebook interface, you can open any existing notebook you have, or you can go to the Command Palette using the shortcut Ctrl+Shift+P (or Cmd+Shift+P if you're on a Mac) and search for "Create New Blank Jupyter Notebook". This launches what we call our notebook editor. It's a Jupyter notebook UI, as you can tell: you have your cells, you have your input and output, and it combines the flexibility of notebooks with the power of VS Code as an editor and IDE.

To see what that all means, let's bring in a real-world scenario. Let's say you're a data scientist and you're trying to graph and predict COVID cases and model all that data within the US. I have a notebook already created with that data, so let's close this one and open the existing notebook. Like I mentioned, if you already have a notebook you can open it and it will open in this notebook editor as well.

Within your cells, one of the main advantages of this notebook editor is that you have full IntelliSense and autocomplete support. If I want to do a pandas read_csv, I can type "pd." and you can see it gives the top suggestions based on which functions are most used from the pandas library. Here I want read_csv, and it correctly predicts that, so I can type in read_csv. As you can see, I'm not pushing Tab; it automatically gives me these suggestions on the fly as I type, which is great. I want to read this CSV, so I'm just going to copy this file and paste it in.

I can run the cell by clicking this Run Cell icon. We also support Jupyter hotkeys, so I can run the cell with Ctrl+Enter or Shift+Enter. We support many of the Jupyter hotkeys: you can push Escape to go into command mode, you can move around, you can add a cell below by pushing B, and you can delete the cell by pushing D twice.

Although notebooks are flexible, in that you can run cells out of order and multiple times, one of the biggest annoyances for data scientists with Jupyter notebooks is keeping track of the state of all your variables. We made that a lot easier by building a simple variable explorer within the notebook editor. All you need to do is click on the variable explorer icon at the very top, and it will show you all the active variables within your notebook. You can quickly see a lot of useful information, such as the name of the variable, the type of the variable, the size, and a preview of the value.

If you want to take a deeper dive into what your data actually is, we have something called the data viewer. If you click on this icon on the left, it opens what we call our data viewer. This is an Excel-like interface for array-like data. This is a data frame, and it puts it in an Excel-like format. What's really great is that it's a lot more human-readable for you as a user. You can sort by different columns, and the other really great feature is that you can filter rows. One example: once you have a values column, you may want to make sure there are no negative values, so you just search for any values less than zero to make sure nothing's negative. You can see that this filter feature can be a really powerful tool in your arsenal.

Let's go back to our notebook. Another feature I wanted to point out is something called run by line. Run by line is a simplified notebook debugging experience that lets you step through your code one line at a time and examine the state of your code at each step. To invoke run by line, you click on the run by line icon in the cell, and you can see it stops at the first line. You can keep clicking it to step through each line of your code cell, and as you stop, you can open the variable explorer and it will show you the current state of that variable. This is a really trivial example, but as you can imagine, if you have something like a training loop, you can step through each line of that training loop to ensure that your forward and backpropagation calculations are correct. That would be one really great use case for this.

I have another cell below here that prints out a matplotlib figure of the top 10 worst-case states in terms of cases. Again, I can press Shift+Enter to run that cell, and it shows this matplotlib figure. If I want to look at this plot in a little more detail, I can click on the top-left icon and it opens in what we call the plot viewer. It maintains a state and history of all my graphs, so if I change the graph, I can view the changes in the output as well. Within the plot viewer I can do things such as zooming in. This is a trivial example, like I mentioned, but if you have more data and you want to look at your x-intercepts or see where the plot crosses each axis, this is where it can be really useful. Another great feature is that you can save, export, and share this plot with others: if I click on Save, I can save this as a PDF or image file and quickly share it.

Heading back to our notebook again, we also now have full support for custom ipywidgets. What this means is that you can work with interactive plots and interactive data. Down here you can see I have a cell with an interactive map of the United States showing the number of COVID cases per state. I can play with this graph and zoom in as well, and this makes it a lot easier for a data scientist to work with an interactive plot rather than a static map.

Finally, once I've examined all my data and done the data analysis and visualization, suppose I want to actually model the data and do some model training with an ML model. Instead of trying to train on my local machine, which is just a laptop and might take many hours, I can leverage the compute power of a remote server. Let's say I want to use a server with a GPU that I have in the cloud. I can quickly do so by opening the Command Palette again and searching for "Specify local or remote Jupyter server for connections". If I click on that, I can click on Existing, and here is where you type in the IP address or the URI of your remote server. So if you have an on-prem server from your work or company, or a remote server in the cloud, you can leverage that GPU compute. Once you're on that remote compute, you can also click on the kernel switcher; it detects all the kernels on your system and you can switch between them there as well.

The last really cool feature I want to point out is something we call Gather. Gather analyzes the code dependencies within your notebook and helps you extract only the relevant code segments that are required to recreate a particular cell output. For example, let's say I want to run Gather on this cell: I only want the code that was required to generate this interactive map, and I want to share that with somebody. I can click the Gather icon in the top right of the cell, and it opens a new notebook. As you can see, it extracts only the relevant code cells, and the relevant lines of code within those cells, that are required to generate this output. It left out a lot of the imports I didn't need, such as matplotlib, it left out the code cells that were plotting the matplotlib figure, and it only got me the cells and code required to generate this output. To access Gather, all you need to do is go to the Extensions tab and search for the keyword "Gather", as it's a separate free extension. Once you install it, you'll see the Gather icon appear on cells that have already been run, and you can click on that to gather all that code.

Finally, once you've gathered your code and you're ready to turn your experimentation code into production Python code, we made that super simple: click this export icon and you can export the notebook to a Python script, which you can check into GitHub or do whatever you want with.

Let's switch gears a bit. One last thing I wanted to point out is that everything you just saw me do within VS Code, you can have the exact same experience in your browser with GitHub Codespaces. GitHub Codespaces is an all-in-one integrated dev environment within your browser. What this means is you don't need to worry about setting up your machine, package management, compute, et cetera. All you need is a GitHub repo with your notebook and you're good to go. The main benefit of Codespaces is that you can work on your notebooks anywhere, because it's in the browser. You also don't need to worry about compute, since you can leverage the compute power of the VM to train your model or do the compute-heavy tasks you need, without having to migrate your project afterwards to a remote machine with more powerful compute. If you want to check out Codespaces, go to github.com/features/codespaces, and you can see this is the exact same experience and the exact same UI as everything we just did.

So you can try out the Jupyter notebook editor today. All you need is VS Code or GitHub Codespaces and the Python extension, and then you just create a new Jupyter notebook or bring in your existing .ipynb Jupyter notebook. You can find all the information you need to get started at the link on the screen, "aka dot ms notebook", super easy to remember, and you can find my contact information if you ever want to reach out: j-e-m-e-w at microsoft.com. Thank you everyone for watching.
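
The data-loading and checking steps described in the transcript (reading a CSV with pandas, confirming there are no negative values, and picking the states with the most cases) can be sketched in a notebook cell roughly as follows. This is a minimal sketch, not the notebook from the talk: the inline CSV, the column names "state" and "cases", and every number in it are made-up placeholders, since the talk does not show the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file loaded in the demo. The real file,
# its column names, and its values are not shown in the talk.
SAMPLE_CSV = """state,cases
New York,400
California,350
Texas,300
Florida,250
Ohio,-5
"""

# read_csv accepts a path or any file-like object; StringIO keeps this
# example self-contained.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Code equivalent of filtering the data viewer for values below zero.
negative_rows = df[df["cases"] < 0]

# Keep the valid rows, then take the states with the most cases.
# The demo plots the top 10; this placeholder data only has a few rows.
valid = df[df["cases"] >= 0]
top_states = valid.nlargest(3, "cases")

print(negative_rows)
print(top_states)
```

After running a cell like this, names such as df, negative_rows, and top_states are the kind of entries that would show up in the variable explorer, and the data frames could be opened in the data viewer.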

Original Description

Brief Summary

Visual Studio Code and GitHub Codespaces now offer a first-class Notebooks experience along with many great features for data scientists and Python developers alike. Come learn how you can explore and experiment with your Python code using the flexibility of Notebooks combined with the power and productivity of VS Code.

Outline

Background Knowledge:
- Python
- Jupyter Notebooks

Outline:
Jupyter Notebooks and interactive programming have become some of the most popular tools for developing Python due to their flexibility and ease of use. Visual Studio Code now offers a first-class Notebooks experience along with many great features for data scientists and Python developers alike. Come see how you can explore, experiment, and productionize machine learning models using the flexibility of Notebooks combined with the power and productivity of VS Code for data science workloads.

Some of the features we will be showcasing are:
- IntelliSense/IntelliCode (smart code completion)
- Variable explorer / dataframe viewer
- Remote development (remote Jupyter servers)
- Run-by-line (simplified notebook debugging)
- Gather (automatic notebook cleanup/dependency analysis)
- Interactive programming in the Interactive Window
- GitHub Codespaces

Relevant Links:
- Jupyter Notebooks in VS Code: https://code.visualstudio.com/docs/python/jupyter-support
- Codespaces: https://github.com/features/codespaces

___

JupyterCon brings together data scientists, business analysts, researchers, educators, developers, core Project contributors, and tool creators for in-depth training, insightful keynotes, networking, and practical talks exploring the Project Jupyter ecosystem. https://jupytercon.com/

JupyterCon is possible thanks to the generous support of our sponsors, and the labor of many volunteer organizers. https://jupytercon.com/sponsors/ https://jupytercon.com/about/#Organizing%20Committee


The video teaches how to enhance a data science workflow using Jupyter notebooks, VS Code, and GitHub Codespaces, covering features like the variable explorer, the data viewer, and remote compute. It gives an overview of how to use these tools for data analysis, visualization, and model training. By following the steps outlined in the video, viewers can optimize their workflow and improve productivity.

Key Takeaways
  1. Install VS Code and Python extension
  2. Open Jupyter notebook
  3. Use variable explorer to track variables
  4. Use data viewer to preview and explore data
  5. Leverage remote compute power by specifying a local or remote Jupyter server
  6. Use the 'Gather' extension to extract relevant code segments
  7. Export experimentation code to a Python script
💡 The video highlights the importance of leveraging remote compute power and using tools like Jupyter notebooks and VS Code to optimize data science workflow, making it easier to analyze, visualize, and train models.
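
As a rough illustration of the last takeaway, a notebook exported to a Python script from VS Code is a plain .py file in which cells are separated by "# %%" comment markers, so the file runs as an ordinary script and can still be executed cell by cell in the editor. The sketch below is an assumed example of what such a file might look like; the function name, the records, and the numbers are hypothetical and do not come from the talk.

```python
# %% [markdown]
# # Case counts by state (hypothetical example)

# %%
from collections import Counter


def total_cases_by_state(records):
    """Sum case counts per state from (state, cases) pairs."""
    totals = Counter()
    for state, cases in records:
        totals[state] += cases
    return totals


# %%
# Hypothetical records; a real notebook would load these from a file.
records = [("WA", 10), ("NY", 25), ("WA", 5)]
totals = total_cases_by_state(records)

# Show the state with the largest total.
print(totals.most_common(1))
```

Because the markers are only comments, the same file can be checked into a repository and run with a regular Python interpreter.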
