Document Type

Dissertation

Date of Award

5-31-2024

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Chase Qishi Wu

Second Advisor

Guiling Wang

Third Advisor

Senjuti Basu Roy

Fourth Advisor

Yi Chen

Fifth Advisor

Hui Wang

Abstract

In next-generation scientific applications, the exponential growth of big data necessitates advanced techniques for efficient data storage, processing, and analysis. This has led to the construction of intricate computing workflows, managed and orchestrated by powerful engines in big data systems such as Hadoop. As scientific applications increasingly shift towards simulation-centric approaches, traditional methodologies face new challenges in accommodating the complexity of extreme-scale numerical modeling with numerous tunable parameters. To address these challenges, this dissertation proposes a machine learning-assisted framework that enables autonomous computational steering of scientific simulations and optimized execution of big-data workflows on heterogeneous platforms. This framework integrates three main technical components. 1) A computational steering strategy employs reinforcement learning to realize dynamic parameter tuning for accurate modeling in complex and distributed environments. 2) A workflow mapping scheme determines job or task assignment, on-node scheduling, and resource allocation to minimize end-to-end delay. 3) A class of novel algorithms based on dueling double deep Q-networks with Gaussian Process Regression optimizes data block distribution and recovery in the Hadoop Distributed File System (HDFS) on heterogeneous clusters with diverse capacities of data nodes and disparate patterns of data access. Moreover, this dissertation formulates some of these problems within our framework as optimization problems, proves their NP-completeness, and designs approximation algorithms with robust performance guarantees. Experimental results from real-life scientific simulations demonstrate the efficacy of our proposed methods, showcasing their superiority over existing algorithms and affirming the validity of our theoretical analyses.
This dissertation research contributes to advancing big data computing in scientific disciplines and highlights potential applications of the proposed framework to big data-driven industrial and business processes.
