Yiwen Li - Intro to Elyra - an AI centric extension for JupyterLab | JupyterCon 2020

JupyterCon · Intermediate ·📄 Research Papers Explained ·5y ago

Key Takeaways

The video introduces Elyra, an AI-centric extension for JupyterLab, which provides a visual notebook pipeline editor and supports running notebooks as batch jobs, and demonstrates its application in a data science process pipeline with tools like JupyterLab, Elyra, and Python.

Full Transcript

Hello everyone. Today I want to bring you an intro to Elyra, an AI-centric extension for JupyterLab. My name is Yiwen, and today I have my colleagues Edward and Saishruthi here with me. I'm a data scientist on the IBM CODAIT team.

Hey everybody, my name is Ed, and I'm also a data scientist on the CODAIT team.

Hello everyone, I'm Saishruthi. I'm a developer advocate and data scientist on the CODAIT team. First of all, thank you so much for joining us today; I hope you are having a great time at the conference. Let's quickly go over today's agenda. Starting with our team's introduction, we will introduce you to two of our open source projects, Elyra and Data Asset eXchange, and end with a demo showing how to use them together, with reference links that you can take away from the talk.

CODAIT is the Center for Open-Source Data and AI Technologies. We are 30-plus developers and data scientists contributing to open source projects that cover the entire data science life cycle. On the right side of the slide you can see the projects we contribute to. Our goal is to make AI systems easy to create, deploy, and maintain. Today we are going to focus on two of our in-house open source projects: the first is Elyra, and the next is Data Asset eXchange. Both are marked on the slide here. Now over to Ed, who will introduce you to Elyra.

So what exactly is Elyra? At its core, Elyra is simply a set of JupyterLab extensions catered to people who like to work with artificial intelligence and machine learning using notebooks. Our goal is to help data scientists, machine learning engineers, and software developers through the most common complexities of the model development life cycle. Elyra was officially announced as an open source project by IBM this past April. If you'd like to access the GitHub repo or the docs, you can follow the links we've shared on the screen. Elyra boasts a host of features to achieve its goal, which we will discuss shortly. For now, notice from the preview of Elyra's UI here on the right that Elyra doesn't radically reinvent JupyterLab to provide you with all of its functionality. It simply extends the distinct JupyterLab environment that we've all come to know and love over the past few years with added tabs, toolbars, and launchers.

So what can Elyra actually do for you? You can break down Elyra's core features into five items. First, Elyra provides an easy-to-use visual editor for notebook pipelines. Why build your next AI model using a pipeline editor? Because Elyra provides a visual way to convert multiple notebooks, covering pre-processing steps, experimentation, optimization, and deployment, into batch jobs or workflows. Second, Elyra supports running notebooks as batch jobs directly in the UI, making model training easier. Third, Elyra supports easy creation and insertion of reusable code snippets. Fourth, Elyra pipelines support Git version control, allowing rollbacks to working versions of the code, backups, and, most importantly, sharing among team members. Fifth and finally, Elyra exposes Python scripts as first-class citizens. This allows users to edit their scripts locally and execute them against local or cloud-based resources seamlessly. Yiwen will demonstrate all these features for you shortly in our demo, but for now I'll pass it back to Saishruthi to talk about IBM's Data Asset eXchange.

Thanks, Ed. Let's have a quick overview of Data Asset eXchange, which we call DAX. DAX offers high-quality, vetted data. By vetted, I mean we start by tracing the origin and merit of the data set and learn about its usability, rights, and ownership. From there we take all the information, create standard metadata, and perform an internal legal review before releasing the data sets to Data Asset eXchange. Data sets here have clearly defined open data licenses. We provide exclusive access to IBM Research data sets that have played a crucial role in building popular AI systems like AI debater, entity recognition, and so on.

So what do we offer? Do we offer only data sets? No, not really. We offer much more than just raw data sets. These data sets come with tutorials that demonstrate the usage of the data and can be directly exported as a Watson Studio project or a Watson Studio-compatible notebook. We also have a data glossary that you can use to learn more about the attributes of the data set, and an option to preview the data, which can help you understand your data before using it in your project. This is our Watson Studio project, so you can directly use all the notebooks in your Watson Studio project. Industrial use cases have been created using DAX data sets, and they are available as industry accelerators, which you can use directly with Cloud Pak for Data. We have resources listed in the slides that you can refer to if you want to know more. But now, over to Yiwen for the demo.

Thanks, Saishruthi. Before I show you the demo, I want to introduce what the data science process looks like. A data science pipeline generally includes five steps, from data extraction to result interpretation. As a data scientist, I have to run multiple notebooks in one project. In particular, if I change anything in one notebook, I then have to run the other notebooks one by one. It saves effort to construct them as a pipeline and run them all with one click.

Now let me walk you through the demo. This demo shows you how to run four notebooks using Elyra. I will introduce two ways to run your pipeline: on your local machine and on Kubeflow Pipelines. To build a pipeline, you simply drag and drop the notebooks onto the canvas and connect them, as easy as drawing a graph. You can arrange the notebooks in sequential or parallel order. You need to configure the notebook properties with several pieces of information. We set the environment variable DATASET_URL to the data set download link from DAX; we copy and paste the link into this section. This notebook produces output files: I specify the file name jfk_weather.csv, which is downloaded by the first notebook into the directory. The output files are uploaded to cloud storage or saved in a local directory after notebook processing completes.

We also need to select a Docker image that will be used to run the notebook. You can bring your own image or choose from the predefined public images, such as a Pandas image, a TensorFlow image, or a PyTorch image. We choose Pandas as the image here because we mainly use the pandas and NumPy packages in the notebook. You can only select one Docker image at a time; however, if you're running on your local machine, the image you choose does not matter. You can also declare file dependencies, such as jfk_weather_cleaned.csv, but this is not necessary here since the files are already in the same folder. I rename the pipeline by right-clicking on my trackpad and changing the name to yiwen.pipeline. You can also add comments to provide short descriptions; this helps your colleagues understand the function of each node before going into it.

Now everything is all set. Let's save the pipeline and submit it to run locally. The run logs are shown in the terminal. If everything completes successfully, a message pops up. The run outputs are shown in the notebook cells. On the left side, you can see the notebooks were updated just a few seconds ago.

Now let me quickly walk you through the content of each notebook. First, the load data notebook downloads and extracts the zip file from the DATASET_URL link and saves it as jfk_weather.csv. The Part 1 notebook loads the data set downloaded by the previous notebook, replaces wrong values such as the non-numeric "T" and "0.02.01s", filters out everything that is out of range, converts the columns to numeric types, clears out the missing values, renames the columns, and then saves the clean data set as jfk_weather_cleaned.csv. The Part 2 notebook selects five columns from the clean data and visualizes the trend. It also explores the dependencies between those columns, then visualizes the trend of the rolling average in 2017. The Part 3 notebook explores approaches to predicting future temperature using a time series data set: it creates the training, validation, and test splits, trains on the data set, compares the performance of different baseline models using mean squared error, and builds ARIMA model predictions for the first 48 hours of the validation set.

You can get these notebooks on the Data Asset eXchange page. By clicking on the preview notebooks, you can see we have three notebooks very similar to what we have in the demo pipeline. The notebooks were designed to run on Watson Studio and IBM Cloud. I made minor changes to those notebooks to fit them into the Jupyter environment.

So now let's submit this pipeline again, this time on Kubeflow. Before running, you need to configure your runtime by putting credentials into each section. I already pre-configured my runtime, so we can just submit the pipeline to Kubeflow. During the run, Elyra will generate, gather, and package the required artifacts, upload them to cloud storage, and trigger the pipeline execution in the selected Kubeflow Pipelines environment. Pipeline runs are listed in the Experiments panel. The Graph panel displays the execution status of each node. You can see the Part 1 notebook is completed and has a green check mark at the top right. After the whole run is complete, you can access the pipeline's output artifacts using a supported S3 client. Clicking on the data folder shows the JFK data file and also the cleaned version. Also on this page we have all of the input and output files listed; all the tarballs are input files. By clicking one of the output files and downloading it, you can see the result of the notebook is saved in the output cells.

This is the demo I would like to show you today. If you're interested and want to try this yourself, please feel free to install Elyra and clone my Elyra examples repo to get the notebooks. You can also get the notebooks on Data Asset eXchange and run them from there. Elyra is an open source project, so please feel free to open issues or enhancement requests on the Elyra repo. There are also a few other talks related to Elyra from the IBM CODAIT team; please feel free to check them out. This is everything I want to show you today, and thanks for coming.
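The cleaning steps described for the Part 1 notebook (replace bad values, convert columns to numeric types, drop missing values, rename columns) can be sketched with pandas roughly as follows. This is a minimal illustration and not the notebook's actual code: the column names, the sample rows, and the choice to treat the trace marker "T" as zero are all assumptions made for the example.

```python
import pandas as pd


def clean_weather(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric, gap-free copy of a raw weather table."""
    cleaned = raw.copy()
    for column in cleaned.columns:
        values = cleaned[column].astype(str).str.strip()
        # Treat the trace marker "T" as zero (an assumption for this sketch).
        values = values.replace("T", "0")
        # Drop a single trailing letter flag, such as the "s" in "0.02s".
        values = values.str.replace(r"[A-Za-z]$", "", regex=True)
        # Anything that is still not a number becomes NaN.
        cleaned[column] = pd.to_numeric(values, errors="coerce")
    # Remove rows with missing values and give the columns lowercase names.
    cleaned = cleaned.dropna().rename(columns=str.lower)
    return cleaned.reset_index(drop=True)


# Hypothetical sample rows, for illustration only.
raw = pd.DataFrame(
    {
        "HOURLYPrecip": ["0.00", "T", "0.02s", None],
        "HOURLYDRYBULBTEMPF": ["34", "35", "36*", "37"],
    }
)
print(clean_weather(raw))
```

Rows that still contain a non-numeric value after the replacements are dropped rather than repaired, which keeps the sketch short; a real notebook might impute them instead.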

Original Description

Brief Summary
Have you ever wanted to run multiple notebooks in sequential and parallel order with one click? If so, come join our "Intro to Elyra - an AI centric extension for JupyterLab" session to learn how you can get set up and running with Elyra!

Outline
This workshop shows users how to run data science notebooks with Elyra, an AI-centric extension for JupyterLab. The notebook pipeline downloads a free dataset from Data Asset eXchange, then extracts, cleanses, and analyzes the data file. The cleaned data file is subsequently used to predict certain weather features. In this session you will have a hands-on opportunity to learn how to build an Elyra pipeline. Get a head start today by checking out the Elyra AI GitHub repo → https://github.com/elyra-ai/elyra

What you'll learn, and how you can apply it
In this hands-on workshop developers will learn:
  1. What is Elyra used for?
  2. What functions does Elyra have (for example, Kubeflow runtime, pipelines, ENV setup)?
  3. What are the benefits of Elyra?
  4. How can a developer or data scientist use Jupyter notebooks, such as The Weather Project from the Data Asset eXchange?
  5. How to contribute to the open source project Elyra and how to potentially become a committer
  6. How to use Kubeflow to view results

-----
JupyterCon brings together data scientists, business analysts, researchers, educators, developers, core Project contributors, and tool creators for in-depth training, insightful keynotes, networking, and practical talks exploring the Project Jupyter ecosystem. https://jupytercon.com/
JupyterCon is possible thanks to the generous support of our sponsors, and the labor of many volunteer organizers. https://jupytercon.com/sponsors/ https://jupytercon.com/about/#Organizing%20Committee
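To make "sequential and parallel order" concrete, here is a small sketch of how a pipeline's dependency graph determines which notebooks can run together. It uses Python's standard `graphlib` module and is not how Elyra itself schedules nodes; the node names and the assumption that the analysis and forecasting notebooks both depend only on the cleaning notebook are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it depends on.
pipeline = {
    "load_data": set(),
    "part1_data_cleaning": {"load_data"},
    "part2_data_analysis": {"part1_data_cleaning"},
    "part3_time_series_forecasting": {"part1_data_cleaning"},
}


def execution_batches(graph):
    """Group nodes into batches; nodes within a batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every node returned here has all of its dependencies finished.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(execution_batches(pipeline), start=1):
    print(step, batch)
```

Batches run one after another (sequential order), while the notebooks inside a single batch could run at the same time (parallel order).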

The video introduces Elyra, an AI-centric extension for JupyterLab, and demonstrates its application in a data science process pipeline. It covers topics such as visual notebook pipeline editing, batch jobs, and pipeline execution. The video provides a comprehensive overview of Elyra and its potential in data science.
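In the demo, the first notebook receives its download link through an environment variable set in the node properties and writes a declared output file. A minimal sketch of that pattern is shown below; the variable name `DATASET_URL`, the output file name, and the helper `load_node_settings` are assumptions chosen for illustration, not part of Elyra's API.

```python
import os


def load_node_settings(env=None):
    """Return (download_url, output_file) for a hypothetical load-data node."""
    env = os.environ if env is None else env
    url = env.get("DATASET_URL", "").strip()
    if not url:
        raise ValueError("DATASET_URL must be set in the node's environment variables")
    # Output file declared in the node properties (name assumed for this sketch).
    output_file = "jfk_weather.csv"
    return url, output_file


# Example with a placeholder link; a real run would read os.environ instead.
settings = load_node_settings({"DATASET_URL": "https://example.com/dataset.tar.gz"})
print(settings)
```

Failing early when the variable is missing makes a misconfigured node easier to spot than a download error later in the notebook.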

Key Takeaways
  1. Run multiple notebooks as a pipeline with one click
  2. Configure notebook properties
  3. Build a pipeline by dragging and dropping notebooks onto a canvas
  4. Connect notebooks in sequential or parallel order
  5. Select a Docker image for running a notebook
  6. Download and extract zip file
  7. Load and clean dataset
  8. Visualize trend and dependencies
  9. Create training, validation, and test splits
  10. Train data set and compare baseline models
💡 Elyra provides a visual notebook pipeline editor and supports running notebooks as batch jobs, making it a powerful tool for data science pipelines.
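
Takeaways 9 and 10 can be sketched in a few lines of plain Python. The example below splits a time-ordered series without shuffling and compares two simple baselines by mean squared error. It deliberately uses a mean forecast and a persistence forecast instead of the ARIMA model mentioned in the talk, and the temperature values and split ratios are made up for illustration.

```python
from statistics import fmean


def chronological_split(series, train=0.6, validation=0.2):
    """Split a time-ordered list into train/validation/test without shuffling."""
    n = len(series)
    train_end = round(n * train)
    validation_end = train_end + round(n * validation)
    return series[:train_end], series[train_end:validation_end], series[validation_end:]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


# Made-up hourly temperatures, for illustration only.
temps = [30.0, 31.0, 33.0, 36.0, 38.0, 37.0, 35.0, 33.0, 32.0, 31.0]
train, validation, test = chronological_split(temps)

# Baseline 1: always predict the mean of the training data.
mean_forecast = [fmean(train)] * len(validation)
# Baseline 2 (persistence): predict the previous hour's observed value.
persistence_forecast = train[-1:] + validation[:-1]

mean_mse = mean_squared_error(validation, mean_forecast)
persistence_mse = mean_squared_error(validation, persistence_forecast)
print("mean baseline MSE:", mean_mse)
print("persistence baseline MSE:", persistence_mse)
```

Keeping the split chronological matters for time series: shuffling would let the model see future values during training.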
