Canopus: Enabling extreme-scale data analytics on big HPC storage via progressive refactoring

Document Type

Conference Proceeding

Publication Date

1-1-2017

Abstract

High-accuracy scientific simulations on high performance computing (HPC) platforms generate large amounts of data. To allow the data to be efficiently analyzed, simulation outputs need to be refactored, compressed, and properly mapped onto storage tiers. This paper presents Canopus, a progressive data management framework for storing and analyzing big scientific data. Canopus refactors simulation results into a much smaller base dataset plus a series of deltas, at fairly low overhead. The refactored data are then compressed, mapped, and written onto storage tiers. For data analytics, the refactored data are selectively retrieved to restore the data at a level of accuracy that satisfies the analysis requirements, enabling end users to trade analysis speed against accuracy on the fly. Canopus is demonstrated and thoroughly evaluated using blob detection on fusion simulation data.
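The abstract does not specify the refactoring algorithm Canopus uses; as a rough illustration of the base-plus-deltas idea it describes, the following hypothetical Python sketch decimates an array into a small base and a series of deltas, so that applying more deltas restores the data at progressively higher accuracy. The function names refactor and restore, the decimate-by-two scheme, and the repeat-based upsampling are all assumptions for illustration, not the paper's actual method.

import numpy as np

def refactor(data, levels=3):
    # Hypothetical sketch: split full-accuracy data into a coarse base
    # plus deltas. Each level keeps every other sample; the delta stores
    # what is needed to recover the finer level from the coarser one.
    deltas = []
    current = data
    for _ in range(levels):
        coarse = current[::2]                          # decimate by 2
        approx = np.repeat(coarse, 2)[:len(current)]   # cheap upsample
        deltas.append(current - approx)                # correction term
        current = coarse
    return current, deltas[::-1]   # base first, then deltas coarse-to-fine

def restore(base, deltas, accuracy_level):
    # Apply only the first `accuracy_level` deltas: fewer deltas means
    # less data retrieved and faster, coarser analysis.
    current = base
    for delta in deltas[:accuracy_level]:
        current = np.repeat(current, 2)[:len(delta)] + delta
    return current

data = np.sin(np.linspace(0, 10, 1000)).astype(np.float32)
base, deltas = refactor(data, levels=3)
coarse_view = restore(base, deltas, accuracy_level=1)  # fast, low accuracy
full_view = restore(base, deltas, accuracy_level=3)    # exact reconstruction
assert np.allclose(full_view, data)

In this sketch the base is roughly one eighth the size of the original array, and each retrieved delta exactly cancels the upsampling error of the level below it, which is what lets accuracy be traded for retrieval cost on the fly.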

Identifier

85084161356 (Scopus)

Publication Title

9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 2017), co-located with USENIX ATC 2017

Grant

17-SC-20-SC

Fund Ref

U.S. Department of Energy
