
Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study
Event Type: Student Technical Paper, Technical Paper
Application Tags: Applications, HPC Applications
Time: Wednesday, July 25, 11:30am - 11:45am
Description: Advances in scientific instruments, simulation software, and computing infrastructures have enabled scientists to simulate and model highly complex systems. At the same time, increases in simulation duration and scale have led to significant growth in the size of output data, which can reach hundreds of gigabytes or more. While solutions exist to assist with most standard post-simulation analytics, researchers must develop their own code to support customized analytical tasks. Given the nature of these output data, most naïve in-house sequential codes end up being inefficient and, in many cases, prohibitively time-consuming. In this paper, we propose a solution to this issue by transparently combining the strengths of a high-performance computing cluster and a big data infrastructure to support an end-to-end scientific workflow. More specifically, we present a case study on the design of a research computing environment at Clemson University in which these two computing systems are integrated and accessible from one another. This environment allows simulation data to be transferred automatically across systems, and complex analytical tasks on these data to be developed using the Hadoop/Spark frameworks. Results show that a hybrid workflow for molecular dynamics simulation can provide significant performance improvements over a traditional workflow. Furthermore, the code complexity of Hadoop/Spark solutions is shown to be lower than that of a traditional solution, which helps researchers, who usually lack formal software engineering training, to better extend and maintain these codes.
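The post-processing stage described above lends itself to map/reduce-style aggregation over trajectory frames. As a rough illustration only (the abstract does not give the paper's actual analytic or code), the sketch below computes a per-frame center of mass for a tiny hypothetical molecular dynamics trajectory and averages it across frames, using just the Python standard library. In the Hadoop/Spark environment the abstract describes, the per-frame step would map over frames read from distributed storage and the cross-frame step would be a `reduce`, which is what keeps such solutions shorter than a hand-rolled sequential loop.

```python
from functools import reduce

# Hypothetical data: each "frame" is a list of (x, y, z) atom coordinates.
# Real MD output would be parsed from simulation trajectory files instead.
frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 1.0, 0.0), (3.0, 1.0, 0.0)],
]

def center_of_mass(frame):
    """Per-frame statistic (unit masses): the 'map' side of the analytic."""
    n = len(frame)
    return tuple(sum(p[i] for p in frame) / n for i in range(3))

# Map: compute the statistic independently for every frame.
coms = [center_of_mass(f) for f in frames]

# Reduce: combine per-frame results into one aggregate (component-wise sum),
# then normalize to get the trajectory-averaged center of mass.
total = reduce(lambda a, b: tuple(x + y for x, y in zip(a, b)), coms)
avg_com = tuple(c / len(coms) for c in total)
print(avg_com)  # -> (1.5, 0.5, 0.0)
```

Because each frame is processed independently, the same map/reduce structure parallelizes directly across a cluster, which is the property the hybrid workflow exploits.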