Tango: A Cross-layer Approach to Managing I/O Interference over Local Ephemeral Storage

Document Type

Conference Proceeding

Publication Date

1-1-2024

Abstract

As simulation-based scientific discovery advances to exascale, a major question that the community is striving to answer is how to co-design data storage and complex physics-rich analytics so that the time to knowledge can be minimized for post-processing. A particular challenge is how to accommodate a broad spectrum of data analytics needs, particularly those that become clear only very late during post-processing, a scenario where existing methods, such as in situ processing, are unable or less effective in supporting data analytics. As HPC storage systems have become deeper and more complex with the recent addition of NVMe, die-stacked memory, and burst buffers, paradigms and methods for data storage and analysis need to be fundamentally rethought. This paper aims to address the issue of I/O interference for data analytics over local ephemeral storage, which is shared by multiple applications in a non-exclusive node usage scenario, often configured for small- to medium-sized clusters. At the core of this work is a coordinated cross-layer approach that reacts to storage interference from both the storage and application layers. By decomposing and distributing analysis data across the storage hierarchy, data analytics can adapt to the interference by reducing or completely avoiding access to lower tiers whenever interference is high, while maintaining a prescribed error bound to limit the information loss. Meanwhile, proper actions are also taken at the storage layer to ensure that sufficient bandwidth is allocated for retrieving an augmentation, based upon the cardinality and accuracy of the augmentation as well as the nature of the application. We evaluate three real-world data analytics, XGC, GenASiS, and CFD, on Chameleon, and quantitatively demonstrate that the I/O performance can be vastly improved, e.g., by 52% versus no adaptivity and 36% versus single-layer adaptivity, while maintaining acceptable outcomes of data analysis.

Identifier

85214978210 (Scopus)

ISBN

9798350352917

Publication Title

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

External Full Text Location

https://doi.org/10.1109/SC41406.2024.00020

e-ISSN

21674337

ISSN

21674329

Grant

OAC-2311757

Fund Ref

Advanced Scientific Computing Research

