Reproducibility with renv

Adrian Zetner

2023-07-04

Sharing R Analyses with Collaborators

  • “I tried to run your code but it says I’m missing X” 😟

  • “Your code used to run fine but now it’s not working” 😖

  • “I upgraded my software and now your code is junk” 💔

Reproducibility in Data-Driven Analysis

  • Crucial for collaboration and transparency in science

  • Software and package versions can differ between computers

  • Potential code failure or inconsistent results

  • Solution: Replicate the environment

Replicate the environment ♊

  • Manual replication of the environment is impractical

  • Potential solutions

    • Virtual environments (Python, Conda, etc.)

    • Docker

    • Binder/Jupyter Notebooks

Virtual Environments 🌌

Containers 🎁

Notebooks 📑

R Projects 📂

Packages 📦

  • Extend base R with more functionality

  • Called libraries or packages in other languages

  • Distributed as source (may require compilation on your machine) or as pre-compiled binary packages

Repositories 🏛️

  • Source of packages 📦

  • Many are available

    • CRAN

    • Bioconductor

    • Posit Public Package Manager

    • R Universe
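
As a small illustration, the repositories a session installs from are held in the `repos` option. The sketch below uses only base R. The CRAN URL is the one that appears in the lockfile later in this deck; the Posit Public Package Manager URL and the name `P3M` are assumptions and should be checked against the service's own setup page.

```r
# Repositories currently configured for this session
print(getOption("repos"))

# Point the session at specific repositories.
# The P3M name and URL are assumptions; adjust them for your setup.
repos <- c(
  CRAN = "https://cloud.r-project.org",
  P3M  = "https://packagemanager.posit.co/cran/latest"
)
old <- options(repos = repos)   # keep the previous setting in `old`

print(getOption("repos"))
```

Calling `options(old)` afterwards puts the previous repositories back.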

Libraries 📚

  • Store packages installed for current R version

  • System libraries

    • User

    • Site

    • Default
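
To see these libraries on a given machine, base R can report the search path directly. This is a minimal sketch using only base R; the paths printed will differ between computers and R versions.

```r
# Library search path for this session, in the order R searches it
lib_paths <- .libPaths()
print(lib_paths)

# The default library that ships with the R installation
print(.Library)

# Site libraries, if any have been configured (often empty)
print(.Library.site)

# The user library location R is configured to use (the folder may not exist yet)
print(Sys.getenv("R_LIBS_USER"))
```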

Project Local Libraries with renv 📚

  • Project local dependency management
  • Store and restore your project dependencies
  • A replacement for Packrat
  • Little to no change to workflows

Workflow 🔀

  1. Initialize a new project local environment with a private R library with renv::init()
  2. Work in the project as normal adding packages with install.packages()
  3. Save the state of the working project to the lockfile (renv.lock) with renv::snapshot()
  4. Keep working, with the option to save state after successful changes (renv::snapshot()) or revert to the previous state if updates introduce new problems (renv::restore())
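
The four steps above might look like this at the R console. This is a sketch rather than a script to run unattended: it assumes renv is already installed, that the working directory is the project root, and it uses dplyr only as an example package. The calls change files in the project and may restart the session or ask for confirmation, so run them one at a time.

```r
# 1. Create the private project library, renv.lock, and the renv/ folder
renv::init()

# 2. Work as normal; new packages go into the project library
install.packages("dplyr")   # example package, substitute your own

# 3. Record the current library state in renv.lock
renv::snapshot()

# 4. If a later update causes problems, reinstall what renv.lock records
renv::restore()
```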

Initialize ✨

  1. Search the project’s R scripts for packages used in the code with renv::dependencies()
    • library("dplyr")
    • dplyr::mutate()
  2. Copy discovered packages into the renv global package cache for re-use
  3. Install missing R package dependencies into the project’s private library
  4. Create an initial lockfile capturing the state of the project’s library
  5. Activate the project with renv::activate()
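
The dependency scan in step 1 can be tried on its own without initializing anything. The sketch below assumes renv is installed; it writes a throwaway script to a temporary folder (the file and folder names are made up for the example) and asks renv which packages it finds there. dplyr does not need to be installed, since the scan reads the code without running it.

```r
# Write a throwaway script that uses dplyr in both styles listed above
demo_dir <- tempfile("renv-deps-demo-")
dir.create(demo_dir)
writeLines(
  c('library("dplyr")',
    "out <- dplyr::mutate(mtcars, double_mpg = mpg * 2)"),
  file.path(demo_dir, "analysis.R")
)

# Scan the folder; the result is a data frame describing what was found
deps <- renv::dependencies(demo_dir, quiet = TRUE)
print(unique(deps$Package))
```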

Initialize - Infrastructure & Files ✨

File                 Usage
.Rprofile            Updated to activate renv for new project R sessions
renv.lock            The lockfile
renv/                Folder containing all environment details
renv/activate.R      Activation script run by the project .Rprofile
renv/library         The private project library
renv/settings.json   Project settings
renv/.gitignore      renv-specific gitignore

renv.lock 🔒

  • Version of renv
  • Version of R
  • R repositories active for lockfile
  • Package records
{
  "R": {
    "Version": "4.2.3",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Packages": {
    "markdown": {
      "Package": "markdown",
      "Version": "1.0",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "4584a57f565dd7987d59dda3a02cfb41"
    },
    "here": {
      "Package": "here",
      "Version": "0.7",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "908d95ccbfd1dd274073ef07a7c93934"
    }
  }
}
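
Because the lockfile is plain JSON, it can be inspected with any JSON reader. The sketch below assumes the jsonlite package is installed and uses a trimmed copy of the lockfile above, inlined so the example is self-contained. The variable names are illustrative only.

```r
library(jsonlite)

# A trimmed lockfile, inlined as a string for the example
lock_json <- '{
  "R": {
    "Version": "4.2.3",
    "Repositories": [{"Name": "CRAN", "URL": "https://cloud.r-project.org"}]
  },
  "Packages": {
    "markdown": {
      "Package": "markdown",
      "Version": "1.0",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}'

# Keep the nested list structure rather than simplifying to data frames
lock <- fromJSON(lock_json, simplifyVector = FALSE)

# R version the project was snapshotted with
r_version <- lock$R$Version

# Named character vector mapping each package to its pinned version
pkg_versions <- vapply(lock$Packages, function(p) p$Version, character(1))

print(r_version)
print(pkg_versions)
```

For a real project, passing the path `"renv.lock"` to `fromJSON()` in place of the inline string reads the file directly.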

Cache 🏦

  • Global package installation location shared across all projects
  • Project specific libraries built from symlinks to cache
  • Primary benefits:
    • Speed up renv::restore() and renv::install()
    • Save disk space

Cache 🏦

  • Install process
    • Installation requested
    • Already in the cache? Link it into the project library
    • Otherwise install, copy to the cache, and link back to the project
  • Location
    • Default is platform dependent, e.g. ~/.cache/R/renv on Linux with recent renv versions (~/.local/share/renv with older ones)
    • Multiple locations allowed
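
To check where the cache lives on a particular machine, renv can report the path itself. The sketch assumes renv is installed. The shared path in the comment is hypothetical.

```r
# Where renv keeps its global package cache on this machine
cache_dir <- renv::paths$cache()
print(cache_dir)

# To move the cache, set RENV_PATHS_CACHE before renv loads,
# for example with a line like this in ~/.Renviron:
#   RENV_PATHS_CACHE=/srv/shared/renv-cache   (hypothetical path)
```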

Shims 🔼

Function             Shim
install.packages()   renv::install()
remove.packages()    renv::remove()
update.packages()    renv::update()

Isolation 🫧

  • Require that all packages be distributed with a project
  • renv::isolate(): copies all dependencies into the local library
  • Vastly increases project folder size
  • No reliance on external libraries

Collaboration - Setup ⚒️

  • Create a new project repository and folder
  • One user explicitly initializes renv in the version controlled project folder
  • Commit all files including renv generated ones
  • renv will now bootstrap the project environment on collaborators’ computers

Collaboration - Workflow 🔀

  • Ensure all collaborators are using the same package versions
    • Install and test locally
    • Snapshot project
    • Share lockfile
    • Restore locally
  • Lockfile changes can be viewed with renv::history()
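
In console terms, the hand-off between collaborators might look like the following. This is a sketch that assumes renv is installed, the project is already initialized, and renv.lock is tracked in Git. Each call is meant to be run interactively from the project root.

```r
# Person making the change: after installing and testing locally
renv::status()     # report differences between the library and the lockfile
renv::snapshot()   # write the tested versions to renv.lock, then commit it

# Collaborator: after pulling the updated renv.lock
renv::restore()    # install the versions recorded in the lockfile

# Either side: list earlier versions of renv.lock from the Git history
renv::history()
```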

Caveats ☠️

  • Not a panacea for reproducibility
  • Solves one part of the problem
    • Records R and package versions
    • Tools to reinstall above
  • Problems
    • Results may depend on other system components
    • Packages may be removed from repositories

Use Case - Shiny 🌟

  • Shiny apps use the site library
  • Changes to one app’s dependencies can break others
    • E.g. the dplyr update from version 1.0.10 to 1.1.2 deprecated summarise() results with more than one row per group in favour of reframe()
    • Installing the new dplyr to the system library breaks older apps
  • Solution:
    • Initialize renv project in shiny app dir (shared shiny user cache)
    • Restore lockfile to symlink and install all dependencies to app local library
    • Include sourcing of renv/activate.R in app.R
  • Now Shiny apps can rely on some shared packages and some unique packages
  • Easier addition of more shiny apps without clobbering existing

When to Use renv

  • Collaborating on project and sharing results

  • Ensure forward compatibility

  • Project small enough to not warrant using more intense encapsulation

  • More robust encapsulation is problematic (e.g. Docker on Shiny Server)

Questions?

Use Case - Operational Reports

  • Updating dplyr caused a change in the functionality of if_else() where Surveiller relied on the