Document Type

Dissertation

Date of Award

12-31-2021

Degree Name

Doctor of Philosophy in Computer Engineering - (Ph.D.)

Department

Electrical and Computer Engineering

First Advisor

Qing Gary Liu

Second Advisor

Nirwan Ansari

Third Advisor

MengChu Zhou

Fourth Advisor

Roberto Rojas-Cessa

Fifth Advisor

Xiaoning Ding

Abstract

As high-performance computing (HPC) scales up to exascale to accommodate new modeling and simulation needs, I/O continues to be a major bottleneck in end-to-end scientific processes. To bridge the widening gap between compute and I/O, and to enable data to be stored and analyzed more efficiently, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. A major question the community is also striving to answer is how to co-design data storage and complex, physics-rich analytics so that the time to knowledge in post-processing is minimized. As HPC storage systems have become deeper and more complex, data storage and analysis require fundamentally new paradigms and methods. To that end, this dissertation develops SIRIUS, a progressive JPEG-like data management scheme for storing and analyzing big scientific data, together with a coordinated cross-layer approach that reacts to storage interference from both the storage and application layers.

For data storage, with reasonably low overhead, the proposed approach refactors simulation data into a much smaller, reduced-accuracy base dataset and a series of deltas that can be used to augment the accuracy when needed. The base dataset and deltas are compressed and written to multiple storage tiers. For data analysis, this work aims to address the issue of I/O interference for data analytics on cgroups-based storage. In particular, this work explores the emerging scenario of containerization on HPC systems, where the local storage is shared by multiple containers. By decomposing and distributing analysis data across the storage hierarchy, data analytics can adapt to interference by reducing or completely avoiding access to lower tiers whenever interference is high, while maintaining a prescribed error bound to limit the information loss. Meanwhile, proper actions are also taken at the storage layer to ensure that sufficient bandwidth is allocated for retrieving an augmentation.
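The base-plus-deltas idea described above can be illustrated with a minimal sketch. This is not the SIRIUS implementation: the decimation by a factor of two, the linear-interpolation predictor, the 1-D list representation, and all function names (`refactor`, `upsample`, `approximate`, `min_deltas_for_bound`) are assumptions chosen for illustration, and compression and tier placement are omitted. It only shows how a reduced-accuracy base can be augmented progressively until a prescribed error bound is met.

```python
from typing import List, Tuple


def upsample(coarse: List[float], n: int) -> List[float]:
    """Linearly interpolate `coarse` back to n points (coarse[j] sits at index 2*j)."""
    out = []
    for i in range(n):
        j = i // 2
        if i % 2 == 0 or j + 1 >= len(coarse):
            out.append(coarse[j])                      # sample point or right edge
        else:
            out.append(0.5 * (coarse[j] + coarse[j + 1]))
    return out


def refactor(data: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Split data into a small base and a list of deltas, coarsest delta first.

    Hypothetical scheme: each level keeps every second point and stores the
    residual between the finer data and its interpolated prediction.
    """
    deltas = []
    current = list(data)
    for _ in range(levels):
        coarse = current[::2]
        predicted = upsample(coarse, len(current))
        deltas.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    deltas.reverse()
    return current, deltas


def approximate(base: List[float], deltas: List[List[float]], n_deltas: int) -> List[float]:
    """Rebuild full-resolution data using only the first n_deltas deltas."""
    current = list(base)
    for k, delta in enumerate(deltas):
        current = upsample(current, len(delta))
        if k < n_deltas:                               # apply the delta only if it was read
            current = [p + d for p, d in zip(current, delta)]
    return current


def min_deltas_for_bound(data, base, deltas, error_bound: float) -> int:
    """Smallest number of deltas whose reconstruction meets the max-error bound."""
    for n in range(len(deltas) + 1):
        approx = approximate(base, deltas, n)
        if max(abs(a - b) for a, b in zip(approx, data)) <= error_bound:
            return n
    return len(deltas)


data = [float(i * i) for i in range(9)]
base, deltas = refactor(data, levels=2)
needed = min_deltas_for_bound(data, base, deltas, error_bound=1.0)
```

In this toy setting, an analysis that tolerates a larger error bound needs fewer deltas, so under high interference it could skip reads that would otherwise go to the slower tiers holding the later deltas. How SIRIUS actually measures interference and schedules those reads is not specified in this abstract.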

This dissertation evaluates three data analytics, XGC, GenASiS, and CFD, to understand the impact of SIRIUS, and quantitatively demonstrates that I/O performance can be substantially improved, e.g., by 57% versus no adaptivity and 41% versus single-layer adaptivity, while maintaining acceptable outcomes of data analysis.
