Technical Paper
Supporting distributed, interactive Jupyter and RStudio in a scheduled HPC environment with Spark using Open OnDemand
Time: Tuesday, July 24, 10:45am - 11am
Description: Open OnDemand supports Interactive HPC web applications that
provide interactive, distributed Jupyter and RStudio environments running
on an HPC cluster. These web applications provide a simple user interface
for building and submitting the batch job that launches the interactive
environment, and they proxy the connection between the user's browser and
the web server running on the cluster. Distributed computing from a
Jupyter notebook or RStudio session is supported by an Apache Spark
cluster launched concurrently in standalone mode on the nodes allocated
to the batch job. Alternatively, users can directly use the corresponding
MPI bindings for R or Python.
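As a minimal sketch of the Spark launch step, assume the batch job exposes its allocated hosts as a list (possibly one entry per core, as in a PBS node file) and that the first host runs the standalone master. The hypothetical helper `spark_standalone_plan` below plans one master and one worker per unique node using Spark's `spark-class` launcher with the standalone `Master` and `Worker` classes. How each command is actually started on its node (ssh, the scheduler's task launcher, etc.) is an assumption left out of the sketch.

```python
from typing import List, Tuple

SPARK_MASTER_PORT = 7077  # Spark standalone's default master port


def spark_standalone_plan(
    nodes: List[str], spark_home: str = "$SPARK_HOME"
) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical helper: return the master URL and (host, command) pairs.

    Assumes the first allocated host runs the master; every unique host
    (including the master's) runs one worker.
    """
    if not nodes:
        raise ValueError("the batch job must allocate at least one node")

    master_host = nodes[0]
    master_url = f"spark://{master_host}:{SPARK_MASTER_PORT}"
    spark_class = f"{spark_home}/bin/spark-class"

    # The master is launched once, bound to the first allocated host.
    plan = [(
        master_host,
        f"{spark_class} org.apache.spark.deploy.master.Master "
        f"--host {master_host} --port {SPARK_MASTER_PORT}",
    )]

    # Node files may repeat a host once per core; dict.fromkeys removes
    # duplicates while preserving allocation order.
    for host in dict.fromkeys(nodes):
        plan.append((
            host,
            f"{spark_class} org.apache.spark.deploy.worker.Worker {master_url}",
        ))
    return master_url, plan


if __name__ == "__main__":
    url, commands = spark_standalone_plan(["n0001", "n0001", "n0002"])
    print(url)
    for host, cmd in commands:
        print(host, cmd)
```

Inside the running notebook, PySpark could then attach to this cluster with `SparkSession.builder.master(url).getOrCreate()`. Co-locating a worker with the master keeps every allocated node doing useful work, at the cost of sharing the first node between both daemons.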

This paper describes the design of Interactive HPC web applications on
an Open OnDemand deployment for launching and connecting to Jupyter
notebooks and RStudio sessions, as well as the architecture and
software required to support Jupyter, RStudio, and Apache Spark on
the corresponding HPC cluster. Singularity containers can be used to
package this architecture and make it portable across HPC clusters.
This paper also discusses challenges encountered in providing
interactive access to HPC resources that still need general
solutions.