Presentation
Automatic Characterization of HPC Job Parallel Filesystem I/O Patterns
Session
Facilitation: HPC
Event Type
Technical Paper
Facilitation
HPC Facilitation
HPC Workforce
Time
Tuesday, July 24, 1:45pm - 2pm
Location
Grand Ballroom 3
Description
As part of the NSF-funded XMS project, we are actively researching the automatic detection of poorly performing HPC jobs. To aid this analysis, we have generated a taxonomy of the temporal I/O patterns of HPC jobs. In this paper, we describe the design of temporal pattern characterization algorithms for HPC job I/O, which we have implemented in the Open XDMoD job analysis framework. The resulting I/O classifications include periodic patterns and a variety of characteristic non-periodic patterns. We present an analysis of the I/O patterns observed on the /scratch filesystem of an academic HPC cluster. This type of analysis can be extended to other HPC usage data, such as memory, CPU, and interconnect usage. Ultimately, this analysis will be used to improve HPC throughput and efficiency by, for example, automatically identifying anomalous HPC jobs.
Authors