Kunal Bhalla - A Notebook Style Guide | JupyterCon 2020

JupyterCon · Intermediate ·📄 Research Papers Explained ·5y ago

Key Takeaways

This video presents a style guide for Jupyter notebooks, emphasizing careful state management, pure functions, and documented dependencies, with examples in Jupyter notebooks and Python.

Full Transcript

Hi, I'm Kunal. I'll be walking you through a style guide for Jupyter notebooks. I hope you find it useful for writing readable and maintainable notebooks, particularly if you're new to Jupyter. If you're an old hand, I'd love to hear what you thought about it and if you had any feedback.

Before we dive in, I should introduce myself. I've been playing with notebooks for a fairly long time. I've used them for everything from debugging Android applications to actually diving into data. That said, I don't think I'm particularly qualified to write a style guide, but I felt that one would be very valuable and I decided to take a stab at it anyway.

Let's start by talking about why we need a style guide in the first place. Jupyter is extraordinarily flexible, so having a style guide lets you structure your notebook in a way that you hopefully don't regret when you come back to it. It also lets you maintain consistency across your notebooks, which can be really valuable in a team: it's very easy to pick up someone else's notebook and ramp up on it because you have certain expectations around how it works. I'd recommend applying the style guide to any notebook that you or someone else plans to use again in the future; otherwise it's not worth the time polishing it up. And finally, I've tried to keep the style guide as client-agnostic as possible. We are spoiled for choice these days, and you can use notebooks from VS Code, Emacs, JupyterLab, Jupyter Notebook, Hydrogen, and all these other lovely clients, so it should work just about anywhere.

Let's get started. The most important point, which is why I've put it first, is that you should be very, very careful managing state in your notebook. I'm not the only one who worries a lot about this. Joel brought this up really well in his talk from 2018, where he talked about the tons of hidden and hard-to-reason-about state in notebooks. The way I try to mitigate this is to only have global state that represents the sole purpose of
that notebook. For example, if I'm doing a data analysis, then my global state is going to be the data frame that contains everything needed for the analysis and nothing else. If I'm building a notebook that is a tool, then my global state is the input provided to the notebook and nothing else. Everything else happens in pure functions that explicitly define their inputs and outputs. That way I don't accidentally change the behavior of any other part of the notebook without knowing about it.

I'm going to try to make this a little more concrete with a dramatic reenactment, which is also very, very contrived, so it might not work that well. Let's say I have a really slow function that fetches some data for me, and I want to plot the squared values of that data for some analysis. I fetch the data, I copy-paste in some code from Stack Overflow to plot a graph, and then I get a nice red line. All right, now I also want the cubes, so I duplicate the cells I had, I change the value of exp to 3, I get rid of the data fetch because I don't want to spend another hour waiting for it to download, and then I run the cell. Then I get a nice cubic graph.

I publish these graphs and share them with my colleagues, and they give me some feedback: square and cubic look really similar, so I should change the color of at least one of the graphs to distinguish them. So I come back, I go back to my original cell, I get rid of the line that fetches the data because I already have it in memory, I add color equal to blue so that I plot a blue graph, and then I run the cell again, and I publish my notebook and go on my way. Unfortunately, I was depending on exp, which was still set to 3, and if I didn't remember the notebook, and if I was being careless, which I normally am, I would have just not noticed this problem.

If I was following the style guide, instead of having the duplicated code and the random exp value floating around in the ether, I would have extracted an explore data function that explicitly defines
that it depends on the data being passed in, which is xs; exp, the value that it is being raised to; and a color that is optional. That way I can come back, modify this, and play around with it. I know exactly what's going into the function and I know what I'm going to get, and it's much harder to break.

There are more advantages. Once you have small functions throughout your notebook, you can have tests right next to those functions and be able to debug the notebook very, very quickly: if anything breaks, you can isolate it much faster than otherwise. Another advantage of having tests right next to your functions is that you can iterate on them with a very tiny test-cell-driven development kind of workflow. I like to have the test right below the function definition, and then I keep using Ctrl+Enter to re-execute the function and the cell as I iterate on the function's core.

Once you have small functions that define your notebook, you want to club them together into a meaningful structure. The way I add structure to a notebook is to have lots and lots of headings. If you have a good client, then you'll also have a table of contents extension, and that makes it much easier to jump around the notebook, which is very good for long notebooks or tools.

You have a well-structured notebook that has lots of tiny tested functions. Now you should make sure it's not sloppy and that you can execute it from top to bottom. That reduces a lot of the cognitive load of running the notebook. You don't have to worry about any special magical ways to run the notebook; instead, you can just press Run All and it will execute cleanly. This is also the way I try to measure whether a notebook is sloppy or not. This does mean that you can't do pure literate programming anymore: you have to define functions before you use them, and you can't rely on some form of tangling to bring them back up. But on the other hand, it forces a slightly different style of programming, which is programming bottom-up, something Paul Graham
describes really well. In it, you're basically building increasing layers of abstraction towards what you need to accomplish, and you can generally share those abstractions very well as utility functions.

Now that you have a well-structured, clean notebook that you can execute simply, you want to make sure that you can execute the notebook in the future as well, and that means you should document all the dependencies of your notebook. Notebooks have a lot of dependencies. You want to make sure that you've documented the dependencies on code; maybe you capture all the modules you're depending on with a requirements.txt. You need to make sure that someone else running the notebook can fetch the data for it, so document how you got the data, or possibly include a dump of the data right next to the notebook. Does your notebook need a quantum computer to run on? Maybe it needs 20 gigs of RAM? You should at least call it out so that someone doesn't spend hours debugging why the notebook is failing. Does it need internet access? Is it relying on websites? Are those websites guaranteed to be up in the future? Do you need to give alternatives to fetch the data from those websites? In some notebooks you might even want to go as far as controlling randomness if you want the results to be exactly the same, and this might be particularly valuable for something like generative art.

Now that you have a notebook that you know will run in the future, it's time to polish it a little bit. If you look at any English style guide, the first thing they'll tell you is to remove any superfluous words, and that also applies to the rest of the notebook. If you have code, you want to make sure that the code inside the notebook serves the point of the notebook. If there are a lot of utility functions, they're just going to act as noise, so you might as well extract them to an external Python library. For language, of course, just go and strip superfluous words and follow English style guides. And also for the
outputs: a lot of libraries will throw a lot of logging information into their output cells. You should decide if you actually need to show it to anyone else reading the notebook or not, and set the log level accordingly. You can even get rid of outputs completely by using the %%capture cell magic.

Notebooks are both prose and code, so you should make sure that you take care of both of them. For prose, you should follow really good English style guides. I'm not going to waste your time trying to talk about English, and instead I'll just point you to really good books. On Writing Well is my favorite, and of course there's always the classic, The Elements of Style, and they'll have a much better return for your time invested.

Similarly, you should be very careful with the code. You're in a notebook, but that doesn't mean you forget all your software engineering principles. So you should make sure you follow PEP 8 (which is why this is point 8), run all the lints, add some inline documentation, follow naming conventions, and all the other things you know from good programming practices. Be good with your abstractions. I like to keep them minimal because in a notebook you're unlikely to reuse them. Apply KISS, don't over-abstract, or use "you ain't gonna need it".

Notebooks also introduce additional structure, because you're running them top to bottom and there's a spatial relationship between cells. So I try to keep cells that are close together highly cohesive, and cells that are far apart loosely coupled. That way I can modify parts of the notebook without worrying about the rest of the notebook, and if I'm looking at one section of the notebook, all of it makes sense together. Again, I'll try not to waste your time here and instead point you to a book that will give you a much better return for the effort you would invest, and here I'd recommend you go read The Pragmatic Programmer, which is an excellent book.

Taking it from the top: be very careful about how you manage global state; only
modify it with pure functions and transform it instead. Have tests throughout your notebook. Add headings and structure your notebook carefully. Make sure that the notebook can be executed from top to bottom. Make sure that the notebook can be executed in the future, and document all your dependencies. Eliminate any clutter in the notebook and polish it. Be careful with the language in the notebook and follow English style guides. Be careful with the code in the notebook and follow engineering best practices. Only take all of this effort when you plan to use the notebook again in the future.

There are several excellent notebooks online. Some of my favorites are by Peter Norvig; you should check out his pytudes. They're excellent both as really good notebooks and as really interesting things to read about. fast.ai's books are obviously amazing as well, and they've taught so many people deep learning. There were a bunch of references I checked out online before writing the style guide. I particularly enjoyed reading the Space Telescope Science Institute style guide, and of course Clean Code in Jupyter Notebooks was excellent as well.

I also wanted to thank my teammates for reviewing this presentation and giving me feedback. Thank you for your time. If you have any questions, comments, or any advice, please reach out to me on Twitter or drop me an email. This presentation started as a blog post a really, really long time ago, and I'm hoping that it evolves further based on what you tell me. Thank you.
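The refactoring described in the reenactment could look roughly like the sketch below. The talk names only an explore data function that takes xs, exp, and an optional color; the helper `powers`, the default color, the stand-in data, and the use of matplotlib are assumptions made for illustration.

```python
def powers(xs, exp):
    """Return each value in xs raised to exp; pure, with no hidden globals."""
    return [x ** exp for x in xs]


def explore_data(xs, exp, color="red"):
    """Plot xs against xs ** exp and return the figure.

    Every input is an explicit argument, so re-running a cell cannot
    silently pick up a stale global such as a leftover `exp = 3`.
    """
    # Assumption: matplotlib is the plotting library. It is imported here so
    # the pure helper above still works where no plotting backend is installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, powers(xs, exp), color=color)
    ax.set_title(f"x ** {exp}")
    return fig


# The fetched data stays as the notebook's only global state.
xs = [0, 1, 2, 3, 4]  # hypothetical stand-in for the slow data fetch

squares = powers(xs, 2)
cubes = powers(xs, 3)
# In a notebook cell: explore_data(xs, 2, color="blue") and explore_data(xs, 3)
```

With this shape, changing the color of the square plot means editing one call, and the exponent is visible at the call site instead of living in whichever cell ran last.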

Original Description

Brief Summary

After writing several different notebooks for very different uses over several years, certain patterns have stood out that make notebooks easier to write, maintain and reason about. We'll go over the patterns I've observed so far -- carefully stolen both from good programming and prose style guides by far better authors, and make your next notebook much more elegant.

Outline

It's pretty common to have a style guide for code: for example, Google's are publicly available. Prose has a significantly richer history with several style guides from Strunk & White to The Chicago Manual of Style. At the same time, there doesn't seem to be any for literate programming; nor for notebooks in general. My experience with writing notebooks, and reading those by far more gifted authors has been that there are several useful patterns that make notebooks much more maintainable, easier to reason about and iterate on. Some of those patterns even make them easier to read. This talk is a first attempt at collecting some of these ideas: and collecting feedback to further refine them. Like all good style guides, this one is opinionated, and is meant to be ignored when appropriate.

Outline

  1. Keep minimal global state core to the notebook, manipulate it with pure functions.
  2. Assertions and tests throughout the notebook, preferably at the end of each cell.
  3. Structure it appropriately with headings.
  4. Make sure it runs cleanly with a run-all.
  5. Make sure it's meaningfully reproducible.
  6. Minimize noise from unnecessary output, like logging.
  7. Follow best practices for prose.
  8. Follow best practices for programming: PEP8, YAGNI, etc.

Hopefully you find these valuable in making your next notebook more elegant!

-----

JupyterCon brings together data scientists, business analysts, researchers, educators, developers, core Project contributors, and tool creators for in-depth training, insightful keynotes, networking, and practical talks exploring the Project Jupyter ecosystem. https://jupytercon.
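The outline item on assertions and tests at the end of each cell might be pictured as a single cell like the sketch below. The function `normalize` is a hypothetical example that does not come from the talk; the point is only that the checks sit directly under the definition and re-run every time the cell does.

```python
def normalize(values):
    """Scale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant series has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Tests live at the end of the same cell, so re-executing the cell
# re-checks the function on every iteration.
assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
assert normalize([3, 3]) == [0.0, 0.0]
```

Because a failing assert stops the cell, a broken function is caught where it is defined, before a later cell consumes its result.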


This video presents a style guide for Jupyter notebooks, covering state management, pure functions, tests, structure, and documenting dependencies. The guide is illustrated with Jupyter notebooks and Python, and the talk cites the Space Telescope Science Institute style guide and Clean Code in Jupyter Notebooks among its references.

Key Takeaways
  1. Keep global state minimal and move everything else into pure functions with explicit inputs, such as an explore data function that takes the data, the exponent, and an optional color
  2. Club small functions together into meaningful structure with headings and a table of contents
  3. Document dependencies, including code and data, to ensure the notebook can be executed in the future
  4. Remove superfluous words, code, and outputs that do not serve the point of the notebook
  5. Follow English style guides for the prose
  6. Extract utility functions that act as noise into an external Python library
  7. Set the log level deliberately, and use the %%capture cell magic to hide output that readers do not need
  8. Follow PEP 8 and good programming practices
💡 A well-structured notebook with small functions and tests can be executed from top to bottom for reduced cognitive load, and documenting dependencies is crucial for ensuring the notebook can be executed in the future
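
Two of the points above, quieting log output and controlling randomness, can be sketched with the standard library alone. The logger name `noisy_library`, the function `sample_points`, and the seed value are hypothetical and used only for illustration; the talk does not prescribe a specific mechanism.

```python
import logging
import random

# "noisy_library" is a hypothetical logger name, not a real package.
# Raising its level hides INFO and DEBUG messages from output cells.
logging.getLogger("noisy_library").setLevel(logging.WARNING)


def sample_points(n, seed):
    """Return n pseudo-random floats in [0, 1) that depend only on the seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.random() for _ in range(n)]


# The same seed gives the same values on every run of the notebook.
first_run = sample_points(3, seed=2020)
second_run = sample_points(3, seed=2020)
assert first_run == second_run
```

In IPython-based clients, placing `%%capture` on the first line of a cell suppresses that cell's output entirely, which suits cases where no log level is low enough.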
