Yiwen Li - Intro to Elyra - an AI centric extension for JupyterLab | JupyterCon 2020
Key Takeaways
The video introduces Elyra, a set of AI-centric extensions for JupyterLab that provides a visual notebook pipeline editor and supports running notebooks as batch jobs. It also introduces IBM's Data Asset eXchange (DAX) and demonstrates a four-notebook weather-data pipeline, built on a DAX dataset with JupyterLab, Elyra, and Python, running both locally and on Kubeflow Pipelines.
Full Transcript
Hello everyone, today I want to bring you Intro to Elyra, a set of AI-centric extensions to JupyterLab. My name is Yiwen, and today I have my colleagues Ed and Saishruthi here. I'm a data scientist on the IBM CODAIT team.

Hey everybody, my name is Ed, and I'm also a data scientist on the CODAIT team.

Hello everyone, I'm Saishruthi. I'm a developer advocate and data scientist on the CODAIT team. First of all, thank you so much for joining us today, and I hope you are having a great time at the conference. Let's quickly go over today's agenda. Starting with our team's introduction, we will introduce you to two of our open source projects, Elyra and Data Asset eXchange, and end with a demo showing how to use them together, with reference links that you can take back from the talk. CODAIT is the Center for Open-Source Data and AI Technologies. We are 30-plus developers and data scientists contributing to open source projects that cover the entire data science life cycle; the projects we contribute to are shown on the right side of the slide. Our goal is to make AI systems easy to create, deploy, and maintain. Today we are going to focus on two of our in-house open source projects: first is Elyra, and next is Data Asset eXchange. Both are marked on the slide here. Now over to Ed, who will introduce you all to Elyra.

So what exactly is Elyra? At its core, Elyra is simply a set of JupyterLab extensions catered to people who like to work with artificial intelligence and machine learning using notebooks. Our goal is to help data scientists, machine learning engineers, and software developers through the most common complexities of the model development life cycle. Elyra was officially announced as an open source project by IBM this past April. If you'd like to access the GitHub repo or the docs, you can follow the links we've shared on the screen. Elyra boasts a host of features to achieve its goal, which we will discuss shortly. For now, notice from the preview of Elyra's UI here on the right that Elyra doesn't radically reinvent JupyterLab to provide its functionality. It simply extends the JupyterLab environment that we've all come to know and love over the past few years with added tabs, toolbars, and launchers. So what can Elyra actually do for you? You can break down Elyra's core features into five items. First, Elyra provides an easy-to-use visual notebook pipeline editor. Why build your next AI model using a pipeline editor? Because Elyra provides a visual way to convert multiple notebooks, covering preprocessing, experimentation, optimization, and deployment steps, into batch jobs or workflows. Second, Elyra supports running notebooks as batch jobs directly in the UI, making model training easier. Third, Elyra supports easy creation and insertion of reusable code snippets. Fourth, Elyra pipelines support Git version control, allowing rollbacks to working versions of the code, backups, and, most importantly, sharing among team members. Fifth and finally, Elyra exposes Python scripts as first-class citizens. This allows users to edit their scripts locally and execute them against local or cloud-based resources seamlessly. Yiwen will demonstrate all these features for you shortly in our demo, but for now I'll pass it back to Saishruthi to talk about IBM's Data Asset eXchange.

Thanks, Ed. Let's have a quick overview of Data Asset eXchange, which we call DAX. DAX offers high-quality vetted data. By vetted, I mean we start by tracing the origin and merit of each dataset and learning about its usage rights and ownership. From there, we take all of that information, create standard metadata, and perform an internal legal review before releasing the datasets to Data Asset eXchange. Datasets here have a clearly defined open data license. We provide exclusive access to IBM Research datasets that have played a crucial role in building popular AI systems such as Project Debater, entity recognition, and so on. So what do we offer? Do we offer only
datasets? No, not really. We offer much more than just raw datasets. These datasets come with tutorials that demonstrate how to use the data, and they can be exported directly as a Watson Studio project or a Watson Studio-compatible notebook. We also have a data glossary that you can use to learn more about the attributes of each dataset, and an option to preview the data, which can help you understand it before using it in your project. This is our Watson Studio project, so you can directly use all the notebooks in your own Watson Studio project. Industry use cases have been created using DAX datasets, and they are available as industry accelerators that you can use directly with Cloud Pak for Data. The resources listed in the slides will tell you more. Now over to Yiwen for the demo.

Thanks, Saishruthi. Before I show you the demo, I want to show you what the data science process looks like. A data science pipeline generally includes five steps, from data extraction to result interpretation. As a data scientist, I have to run multiple notebooks in one project, and if I change anything in one notebook, I have to rerun the other notebooks one by one. I can save that effort by constructing them as a pipeline and running them all with one click. Now let me walk you through the demo. This demo shows you how to run four notebooks using Elyra, and I will show you two ways to run your pipeline: on a local machine and on Kubeflow Pipelines. To build a pipeline, you simply drag and drop the notebooks onto the canvas and connect them; it's as easy as drawing a graph. You can arrange the notebooks in sequential or parallel order. Next, you need to configure several notebook properties. We set the environment variable DATASET_URL to the dataset download link from DAX by copying and pasting the link into that field. This notebook produces output files: I specify the file name as jfk_weather.csv, which is
downloaded by the first notebook into the directory. After notebook processing completes, the output files are uploaded to cloud storage, or saved in the local directory when running locally. We also need to select a Docker image that will be used to run the notebook. You can bring your own image or choose from the predefined public images, such as a Pandas image, a TensorFlow image, or a PyTorch image. We choose the Pandas image here because the notebook mainly uses the pandas and NumPy packages. You can select only one Docker image at a time; however, if you're running on your local machine, the image you choose does not matter. You can also declare jfk_weather_cleaned.csv as a file dependency, but this is not necessary since the files are already in the same folder. I rename the pipeline by right-clicking on my trackpad and changing the name to yiwen.pipeline. You can also add comments to provide short descriptions; this helps your colleagues know what each node does before opening it. Now everything is set, so let's save the pipeline and submit it to run locally. The run logs are shown in the terminal, and if everything completes successfully, a message pops up. The run outputs are shown in the notebook cells; on the left side you can see the notebooks were updated just a few seconds ago. Now let me quickly walk you through the content of each notebook. First, the load data notebook downloads the archive from the DATASET_URL link, extracts it, and saves the data as jfk_weather.csv. The Part 1 notebook loads the dataset downloaded by the previous notebook, replaces wrong values, filters out everything that is out of range, converts the columns to numeric types, clears out missing values, renames the columns, and then saves the cleaned dataset as jfk_weather_cleaned.csv. The Part 2 notebook selects five columns from the cleaned data and visualizes their trends. It also explores the dependencies between those columns, then visualizes the trend of the rolling average in 2017.
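The Part 1 cleaning steps described above (replace wrong values, filter out-of-range readings, convert to numbers, drop missing values, rename columns) can be sketched with Python's standard `csv` module. The demo notebook uses pandas; this dependency-free version is only an illustration. The column names, the valid ranges, and the treatment of `T` (trace) and a trailing `s` (suspect reading) are assumptions loosely modeled on NOAA weather data, not details taken from the talk.

```python
import csv
import io

# Hypothetical raw-to-clean column mapping; the real notebook's columns may differ.
RENAME = {"raw_temp_f": "dry_bulb_temp_f", "raw_precip": "precip"}
# Hypothetical plausible ranges (keyed by the renamed columns) for dropping bad readings.
VALID_RANGE = {"dry_bulb_temp_f": (-40.0, 130.0), "precip": (0.0, 10.0)}


def to_number(raw):
    """Convert one raw cell to a float, or return None if it is unusable."""
    value = (raw or "").strip()
    if value == "T":  # assumed code for a trace amount of precipitation
        return 0.0
    value = value.rstrip("s")  # assumed suffix marking a suspect reading
    try:
        return float(value)
    except ValueError:
        return None


def clean_rows(rows):
    """Yield renamed, numeric, in-range rows; skip rows with missing or bad values."""
    for row in rows:
        cleaned = {}
        for raw_name, new_name in RENAME.items():
            number = to_number(row.get(raw_name))
            low, high = VALID_RANGE[new_name]
            # NaN also fails this range check, so it is dropped like out-of-range values.
            if number is None or not low <= number <= high:
                break
            cleaned[new_name] = number
        else:  # runs only when no column triggered the break above
            yield cleaned


def clean_text(raw_csv_text):
    """Clean CSV text in memory and return the cleaned CSV as a string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(RENAME.values()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(clean_rows(csv.DictReader(io.StringIO(raw_csv_text))))
    return out.getvalue()
```

For example, `clean_text("raw_temp_f,raw_precip\n55,T\n999,0.1\n")` keeps the first row (with precipitation `0.0`) and drops the second because 999 °F falls outside the assumed range. Treating every failed check as "drop the row" keeps the sketch short; a real cleaning notebook might impute values instead.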
The Part 3 notebook explores approaches to predicting future temperature from the time-series dataset. It creates the training, validation, and test splits, trains on the data, and then compares the performance of different baseline models using mean squared error. It also builds ARIMA model predictions for the first 48 hours of the validation set. You can get this notebook from the Data Asset eXchange page by clicking Preview notebooks; you can see there are three notebooks very similar to the ones in the demo pipeline. Those notebooks were designed to run on Watson Studio and IBM Cloud, and I made minor changes to fit them into the Jupyter environment. Now let's submit this pipeline again, this time on Kubeflow. Before running, you need to configure your runtime by filling in each section. I have already preconfigured my runtime, so we can just submit the pipeline to Kubeflow. During the run, Elyra generates, gathers, and packages the required artifacts, uploads them to cloud storage, and triggers the pipeline execution in the selected Kubeflow Pipelines environment. Pipeline runs are listed in the Experiments panel, and the Graph panel displays the execution status of each node. You can see the Part 1 notebook has completed and shows a green check mark at the top right. After the whole run is complete, you can access the pipeline's output artifacts using a supported S3 client. The data folder includes the JFK data file and its cleaned version, and this page lists all of the input and output files; all the tarballs are input files. By clicking one of the output files and downloading it, you can see the notebook's results saved in the output cells. That's the demo I would like to show you today. If you're interested and want to try this yourself, please feel free to install Elyra and clone my Elyra examples repo to get the notebooks. You can also get the notebooks from Data Asset eXchange and run them from there. Elyra is an open source project, so please feel free to open issues or enhancement requests on the Elyra repo. There are also a few other talks related to Elyra from the IBM CODAIT team; please feel free to check them out. That's everything I want to show you today, and thanks for coming.
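The Part 3 baseline comparison described in the demo can be sketched in plain Python: split the series chronologically (no shuffling, so the model never trains on the future), then score simple baselines on the validation split with mean squared error. The two baselines here, the training mean and one-step persistence, are common choices picked for illustration; the talk does not say which baselines the notebook uses, and all function names are hypothetical. The ARIMA step is not reproduced; it would typically rely on a library such as statsmodels.

```python
def chronological_split(series, val_size, test_size):
    """Split a time series into train/validation/test without shuffling."""
    train_size = len(series) - val_size - test_size
    if train_size <= 0 or val_size <= 0 or test_size < 0:
        raise ValueError("split sizes do not fit the series")
    train = series[:train_size]
    val = series[train_size:train_size + val_size]
    test = series[train_size + val_size:]
    return train, val, test


def mse(actual, predicted):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_forecast(train, horizon):
    """Baseline: predict the training mean for every future step."""
    average = sum(train) / len(train)
    return [average] * horizon


def persistence_forecast(train, target):
    """Baseline: predict each step with the previous observed value (one step ahead)."""
    return [train[-1]] + list(target[:-1])


def compare_baselines(series, val_size, test_size):
    """Score each baseline on the validation split; the test split stays held out."""
    train, val, _test = chronological_split(series, val_size, test_size)
    return {
        "mean": mse(val, mean_forecast(train, len(val))),
        "persistence": mse(val, persistence_forecast(train, val)),
    }


# Toy series, purely for illustration.
toy_series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(compare_baselines(toy_series, val_size=2, test_size=1))
# {'mean': 20.5, 'persistence': 1.0}
```

On a trending series like this one, persistence easily beats the mean baseline, which is why simple baselines are a useful floor before fitting a model such as ARIMA.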
Original Description
Brief Summary
Have you ever wanted to run multiple notebooks in sequential and parallel order with one click? If so, come join our Intro to Elyra - an AI centric extension for JupyterLab session to learn how you can get set up and running with Elyra!
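Running notebooks "in sequential and parallel order" comes down to ordering pipeline nodes by their dependencies. The sketch below groups nodes into stages with Kahn-style topological layering: nodes in the same stage do not depend on each other and could run in parallel, while stages run in order. This is a generic illustration, not Elyra's implementation. The node names are hypothetical stand-ins for the four demo notebooks, and the assumption that Part 2 and Part 3 both depend only on Part 1 is made for the example.

```python
def execution_stages(dependencies):
    """Group pipeline nodes into stages using Kahn-style topological layering.

    `dependencies` maps each node to the nodes it depends on. Nodes in the
    same stage have no dependencies on one another; stages must run in order.
    Raises ValueError if the graph contains a cycle.
    """
    # Collect every node, including ones that only appear as dependencies.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # Track unmet dependencies and the reverse edges (which nodes wait on which).
    unmet = {node: set(dependencies.get(node, ())) for node in nodes}
    dependents = {node: [] for node in nodes}
    for node, deps in unmet.items():
        for dep in deps:
            dependents[dep].append(node)

    stages = []
    scheduled = 0
    ready = sorted(node for node, deps in unmet.items() if not deps)
    while ready:
        stages.append(ready)
        scheduled += len(ready)
        next_ready = []
        for node in ready:
            for child in dependents[node]:
                unmet[child].discard(node)
                if not unmet[child]:  # last dependency just finished
                    next_ready.append(child)
        ready = sorted(next_ready)

    if scheduled != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return stages


# Hypothetical node names standing in for the four demo notebooks.
demo_pipeline = {
    "load_data": [],
    "part1_clean": ["load_data"],
    "part2_analysis": ["part1_clean"],
    "part3_forecast": ["part1_clean"],
}
print(execution_stages(demo_pipeline))
# [['load_data'], ['part1_clean'], ['part2_analysis', 'part3_forecast']]
```

Sorting each stage only makes the output deterministic; any order within a stage is valid.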
Outline
This workshop shows users how to run data science notebooks with Elyra, an AI-centric extension for JupyterLab. The notebook pipeline downloads a free dataset from the Data Asset eXchange, then extracts, cleanses, and analyzes the data file. The cleaned data file is then used to predict weather features such as future temperature.
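The download-and-extract step at the start of this pipeline can be sketched with the standard library. This is a minimal sketch, not the notebook from the talk: it assumes the DAX download link is stored in an environment variable named `DATASET_URL` (as configured in the demo), that the link ends in an archive extension `shutil.unpack_archive` understands (such as `.zip` or `.tar.gz`), and that the archive contains a file called `jfk_weather.csv`. The helper names `download`, `extract_csv`, and `load_data` are hypothetical.

```python
import os
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def download(url, dest_dir):
    """Fetch `url` into `dest_dir`, keeping the file name from the URL path.

    The name must keep its archive extension because shutil.unpack_archive
    infers the archive format from it.
    """
    name = Path(urllib.parse.urlparse(url).path).name
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    target = Path(dest_dir) / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def extract_csv(archive_path, extract_dir, csv_name, output_path):
    """Unpack the archive and copy the first file named `csv_name` to `output_path`."""
    Path(extract_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(extract_dir))
    matches = sorted(Path(extract_dir).rglob(csv_name))
    if not matches:
        raise FileNotFoundError(f"{csv_name} not found in {archive_path}")
    shutil.copyfile(matches[0], output_path)
    return Path(output_path)


def load_data(csv_name="jfk_weather.csv", output_path="jfk_weather.csv"):
    """Download the archive named by DATASET_URL and save the CSV locally."""
    url = os.environ["DATASET_URL"]  # assumed to be set as a pipeline node property
    with tempfile.TemporaryDirectory() as tmp:
        archive = download(url, tmp)
        return extract_csv(archive, Path(tmp) / "extracted", csv_name, output_path)
```

Reading the link from an environment variable rather than hard-coding it lets the same notebook run unchanged locally or on Kubeflow, with the pipeline supplying the value. The archive is unpacked into a temporary directory so that only the CSV remains next to the notebook.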
In this session you will have a hands-on opportunity to learn how to build an Elyra pipeline. Get a head start today by checking out the Elyra GitHub repo: https://github.com/elyra-ai/elyra
What you'll learn, and how you can apply it
In this hands-on workshop, developers will learn:
- What is Elyra used for?
- What functions does Elyra have (e.g., Kubeflow runtime, pipelines, ENV setup)?
- What are the benefits of Elyra?
- How can a developer or data scientist use Jupyter notebooks, such as The Weather Project from the Data Asset eXchange?
- How can you contribute to the open source project Elyra and potentially become a committer?
- How do you use Kubeflow to view results?
-----
JupyterCon brings together data scientists, business analysts, researchers, educators, developers, core Project contributors, and tool creators for in-depth training, insightful keynotes, networking, and practical talks exploring the Project Jupyter ecosystem.
https://jupytercon.com/
JupyterCon is possible thanks to the generous support of our sponsors, and the labor of many volunteer organizers.
https://jupytercon.com/sponsors/
https://jupytercon.com/about/#Organizing%20Committee