My R Workflow for Reproduce-able & Portable Analysis

Bryan Jenks · Intermediate ·🛠️ AI Tools & Apps ·6y ago

Key Takeaways

Bryan Jenks walks through his R workflow for reproducible and portable analysis in RStudio, covering package loading, to-do management with the `todor` package, and R Markdown documents. He also shows the `here` package for project-relative file paths, unit testing with `testthat`, and roxygen2 for documentation.

Full Transcript

What's going on? One of the other projects I've been working on recently is purely for my own reference: some of the most useful code snippets. Not snippets in the sense of tab stops, like a dplyr filter with whatever. I'm talking about a comprehensive, multi-line chunk of code that serves an encapsulated purpose and is reproducible, portable, and useful pretty much everywhere in all of my projects. It also covers the steps I personally follow in my own workflow right now when I'm doing something in R that needs to be comprehensive. This could be something for work, or something personal where I actually want to develop a package. It's probably not complete, and I intend to add to this over time, but my workflow is described right here. I actually used this today: I went here, copied just a chunk of code I needed for a document, put it in there, followed a couple of those other steps, and got about my day.

So what does this look like? This whole repo is really just some images and my README document. I don't even need the .gitignore. It's all just a README document, and the table of contents hyperlinks to all the sections. Really, this is just my way of doing an analysis. Say I want to perform an analysis on something: create a new package, fill out your DESCRIPTION... I'll just go through it.

So, create a new package. Here's what a sample of a bare-bones DESCRIPTION might look like for me. I can just copy this, mostly for my personal details and the typical license I choose.

For loading packages, I found this on Twitter. Somebody posted it: you can use this to find all of your installed packages, and if there are any that are not installed it will install them, and then it will load them
all. It does this instead of having endless library() calls for each individual package: it will just lapply() library over each of your package names. So I can make a character vector of all my packages, list them out there, and efficiently install anything that's not installed and then load everything. Incredibly useful. I am now using this as a standard; this is how I do things now.

For to-do management: if I have a repo full of R Markdown documents (this happened recently, and it was really helpful when I was doing bookdown), I found a hacky way of doing it: search for all the R Markdown documents in the directory and then run todor on them. That shows me all of my to-do items in all my R Markdown documents. todor is basically like one of those VS Code plugins. Markdown is basically HTML; it's just simplified syntax for HTML tags, which is why you can easily compile to HTML and do things like use CSS to customize the output of HTML documents written in R Markdown, which I do. Because your comments in R Markdown are HTML comments, unless they're in a code chunk, you can use the same sort of tags: in an HTML comment, write TODO and then whatever your to-do item is. The todor package will pick up multiple types of tags (TODO, BUG, FIX, HACK, FIXME, whatever). All of those get picked up and itemized by which file they're located in. That way you can set yourself reminders that don't appear in your output, and on an ongoing basis use this as your to-do management. I do use this and I really enjoy it. If you have a whole package, you can just do a single one-liner and it works, so this would be a less hacky way of doing it. I'm
not sure if a bookdown project counts as a package for this or not; I haven't tested it. Either way, whichever option you use, just finding all your to-do items is really great.

Then there's just some advice: create a data directory and an R directory to put all your R scripts in, and update your .Rbuildignore and/or .gitignore. That was all just setup, so now we move into actually doing some analysis. We can begin writing our R Markdown document: give an overview, some initial analysis, where you got your data, what you've done with it, or whatever.

Some common advice: never use require() or library() in a packaged analysis. Well, that's if you're distributing a package to somebody, like on CRAN; you don't do that. But my package-loading thing is for when I'm sending somebody in the office a packaged analysis. To run the analysis you need to have those libraries, which is why you install or load those required libraries. So in that case I kind of don't follow my own advice, but if you're going to write a package for distribution to the free internet, you might want to reconsider using those at all.

Use an R project file, even in a package. Obviously you do that so that in RStudio you can go up here to all of your recent packages, open one up, and it takes you to that directory with all those files, everything the way you left it. It's also really useful because you can use the here package, which uses wherever the .Rproj file is as the root directory for your project. So if I have a data directory and an R directory, for my data and for the R scripts that process the data, and those R scripts are being sourced in my R Markdown document in my root directory, then in the R Markdown document I can use here(), which
is basically outputting a file path string to the root directory. Then, in quotes, I can just put "data" or "R", then a comma (it's vectorized), and next, in double quotes, the name of the script I am sourcing. There you go: two strings and the here() function, and you basically have relative file paths with minimal overhead. I like it better than the dot-dot-slash, dot-slash, whatever. It makes more sense to me than writing out a file path with all the slashes like we typically do with HTML and the web languages.

Then create your R functions as you need to. I do R functions sometimes, but I also write whole R scripts for each data source. If I have a SQL query that returns a bunch of records that have been twisted, formatted, cross applied, joined, whatever, and I have a tabular data set to return, I might have a separate R script that uses the DBI and odbc packages. It runs that SQL code as a string of text through my DBI connection, and now I have a data frame. If I didn't do all of my work in SQL, I can pipe that data frame through multiple dplyr steps and finally assign the result to a variable. Because it's assigned to a variable in an R script, when you source that script in your R Markdown document all you get is the end product: a data frame variable with all the data. That way you can outsource all of your data manipulation and data munging to R scripts, one R script per data source, so it's all encapsulated in one area. If something goes wrong, you can easily pinpoint the source, and you
don't have all that code filling up your R Markdown document, which is really just for your analysis and your final report product.

If you have multiple types of scripts, like data import scripts or custom functions, you can prepend the name of the R file with some sort of acronym that makes sense to you. You could prepend one with "data" and the name of the connection, or "fn" for function, and then the actual name of the R function. That way you can easily separate things out in a single directory, because you can't have subdirectories in the R directory of an R package or project. At least that's what the documentation says.

Writing unit tests with testthat: I love testthat. It actually makes writing unit tests enjoyable. It reads like English. The function is test_that(), so it's like, okay, we're testing that something equals something. The names of the functions are expect_something; it could be expect_equal(), and then you put in two things and they should be equal. If they are equal, the test is silent, because there's nothing to talk about; it passed. If they're not equal, it raises a flag or an error. That's what makes unit testing in R so enjoyable: it reads pretty much like English, and the testthat test suite is just great. You start with test_that() and then, in quotes, what you're testing: test that the DBI SQL connection functions. That could just be expect_equal(), where you call your DBI connection, and in the other argument of expect_equal() you have a variable that holds a completed DBI connection. That's a bad example, but it is actually a real test I wrote at work to test our
SQL connection. So there is that.

The next step is testing things, fixing, and iterating. Basically, fix all your errors, warnings, and notes when you run R CMD check, or just devtools::check().

Document all of your R functions. If you have data imports, you can do a specific type of roxygen2 documentation on data files. I have never done that yet on a custom data source, because typically we're just importing from SQL at work, so that's all I really do at this point. But for each of your custom functions you can provide roxygen2 documentation, so you actually produce documents, like LaTeX documents, and there's a little hyperlink for info. Compiling your documentation creates a man directory, and each of your functions will have its own documentation. So for my runes package, I call up runes, and then I can look up runes, and this documentation here, the documentation you're used to seeing in R, is actually produced by roxygen2. What does the documentation syntax look like? For my R package, if I was actually doing some development on it... I don't have it open. So, my runes function, the actual function itself: here's what the documentation looks like. It's just all this stuff as the header and then the actual function itself. That's it. You document above your function and provide all this stuff, and it produces basically LaTeX output, and that's what you see in the help area in R whenever you call question mark, what the heck is this thing.

I actually stole this, I think, from bookdown: it's a way of documenting the packages you've used in R with a .bib file. Because we have a packages variable from the loading-packages section at the beginning of my document, you could technically just use, instead of
writing all this, you could just call write_bib() and, instead of a vector, use the variable of package names, and then write the packages .bib file. Then I have an additional .bib file for academic references or whatever, and in the YAML portion I have those two .bib files referenced in a YAML array. There you go: all your references, packages, academic sources, et cetera, all encapsulated in there.

Then there are some clean-code-related things. If you've read Clean Code by Robert C. Martin (I have; I don't really do any object-oriented programming, so it's not completely relevant), it's the general idea of having clean, legible, human-readable, human-intended code. Run devtools::check() to make sure that all of your stuff is fixed and there's nothing to be worried about, or that it's documented why it's there. Use lintr for linting a whole package, so you can see which common conventions you should employ to make more legible code. Have a README, or an R Markdown document that compiles to Markdown for a README. With this whole README I have right here, I preferred to write a plain README instead of using R Markdown and compiling into it; it's just easier to do it this way.

Keep a changelog. In my NEWS.md (NEWS is the typical announcement document for R packages), I basically just treat NEWS as my changelog in my runes package. This is a snippet of a starter changelog for semantic versioning, where you keep every little thing itemized: unreleased, released, when you released it, and then each facet of what type of change has occurred and what it was about. Keep a changelog so that you know what you did and when you did it. Yeah, Git is a changelog, but sometimes you don't want to have to scroll through a git log or read all your commits and then open up the files; you
can usually just see a human-readable format here, and if you're curious about something, you have the date, so you can go look at the commit history and the code diff, whatever your workflow might be. At least for me, I keep a changelog, and I use this changelog format for all of our work stuff too.

Then I'm going to have an ever-growing tips section. For example, you can use this exact piece of text here: a comment, a space, whatever text, a space, and then four or five of these dashes. Inside an R code chunk, that adds an item to the, what do you call it, the document outline. I don't think I have an R Markdown document here... oh, my CV, that'll work. If you add that piece, it adds a new section here. Employment is where we're currently at; we have the employment header and the employment code chunk, and you can see both of those here, nested. But inside the employment code chunk I can write "hello YouTube", a space, and then one, two, three, four, five dashes, and it adds that to your outline. So you can document pieces of your code chunk without having only Markdown headers in your document outline. It will also add it to this table of contents here in RStudio. All useful tips.

That is my workflow document. It's a work in progress, and I intend to continually update it. I even used it today at work to grab my package-loading script right here that I jacked from Twitter. I use this, and I'm going to be using it in the future to keep my own head straight when it comes to what the next step is that I typically do in development for some comprehensive R thing. This is not "how you should do things"; this is how I do things, and if you have suggestions, please feel free to contribute. But this is my workflow as it stands right now, to be edited more in
the future. So read through that if you like. Enjoy, use it, whatever. Have fun.
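
The package-loading snippet described in the transcript is not reproduced in this text, so the following is only a minimal sketch of the idea, not the original code from Twitter. The `packages` vector is a hypothetical example; base packages are used so the sketch runs without installing anything.

```r
# Hypothetical package list; replace with the packages your analysis needs.
packages <- c("stats", "utils", "tools")

# Find the packages that are not installed yet and install only those.
not_installed <- setdiff(packages, rownames(installed.packages()))
if (length(not_installed) > 0) {
  install.packages(not_installed)
}

# Attach every package with one lapply() call instead of repeated library() lines.
invisible(lapply(packages, library, character.only = TRUE))
```

Keeping the names in a single `packages` variable also makes it possible to reuse that variable later, for example when writing a bibliography of the packages used.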


Bryan Jenks shares his R workflow for reproducible and portable analysis, covering package loading, to-do management, and R Markdown documents. He demonstrates the `here` package for project-relative file paths, unit testing with testthat, and roxygen2 for documentation. He presents it as his personal workflow and a work in progress, not as a prescription.
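
As a rough illustration of the `here` usage described in the transcript, the sketch below builds paths from the project root and sources an import script. It assumes the `here` package is installed; the file names `data_sales.R` and `sales.csv` are hypothetical and do not come from the video.

```r
library(here)

# here() builds a path starting at the project root (typically the folder
# holding the .Rproj file), so the call does not depend on the working directory
# of the document being knitted.
script_path <- here("R", "data_sales.R")  # hypothetical import script
data_path   <- here("data", "sales.csv")  # hypothetical data file

# Source the import script only if it exists, so the sketch also runs in an
# empty project.
if (file.exists(script_path)) {
  source(script_path)
}
```

In the workflow described, the sourced script would end by assigning a finished data frame to a variable, which is then available in the R Markdown document.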

Key Takeaways
  1. Create a new package
  2. Fill out the DESCRIPTION file, including personal details and license
  3. Install any missing packages and load them all with a short `lapply()` snippet (found on Twitter)
  4. Track to-do items in R Markdown documents with the `todor` package
  5. Create a `data` directory for data and an `R` directory for scripts
  6. Use the `here` package to build file paths relative to the project root
  7. Write one R script per data source for data import and munging, and source it from the R Markdown document
  8. Write unit tests with testthat
  9. Use roxygen2 to document R functions
  10. Run `devtools::check()` and fix all errors, warnings, and notes
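
The unit-testing step can be sketched as follows. This is a minimal example assuming testthat is installed; `clean_names()` is a hypothetical helper used for illustration and is not the database connection test mentioned in the transcript.

```r
library(testthat)

# Hypothetical helper standing in for one of your own functions.
clean_names <- function(x) {
  tolower(gsub("[^A-Za-z0-9]+", "_", trimws(x)))
}

# Each expect_equal() compares an actual value with an expected value;
# a mismatch makes the test fail and reports the difference.
test_that("clean_names() produces lower-case, underscore-separated names", {
  expect_equal(clean_names(" Total Sales "), "total_sales")
  expect_equal(clean_names("Region-ID"), "region_id")
})
```

In a package, a test like this would normally live in a file under `tests/testthat/` and run as part of `devtools::check()`.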
💡 Using a comprehensive workflow like Bryan Jenks' can improve the reproducibility and portability of analysis, making it easier to share and collaborate with others.
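
For the documentation step, a roxygen2 header sits directly above the function as specially marked comments. The function below is a hypothetical example written for this sketch, not code from the video.

```r
#' Convert a temperature from Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of the same length, in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
#' @export
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}
```

Inside a package, running `devtools::document()` turns this header into a help file in the `man` directory, which is what `?fahrenheit_to_celsius` then displays.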
