Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Chase Qishi Wu
Jason T. L. Wang
Senjuti Basu Roy
Due to the rapid transition from traditional experiment-based approaches to large-scale, computational intensive simulations, next-generation scientific applications typically involve complex numerical modeling and extreme-scale simulations. Such model-based simulations oftentimes generate colossal amounts of data, which must be transferred over high-performance network (HPN) infrastructures to remote sites and analyzed against experimental or observation data on high-performance computing (HPC) facility. Optimizing the performance of both data transfer in HPN and simulation-based model development on HPC is critical to enabling and accelerating knowledge discovery and scientific innovation. However, such processes generally involve an enormous set of attributes including domain-specific model parameters, network transport properties, and computing system configurations. The vast space of model parameters, the sheer volume of generated data, the limited amount of allocatable bandwidths, and the complex settings of computing systems make it practically infeasible for domain experts to manually deploy and optimize big data transfer and computing solutions in next-generation scientific applications.
The research in this dissertation identifies such attributes in networks, systems, and models, conducts in-depth exploratory analysis of their impacts on data transfer throughput, computing efficiency, and modeling accuracy, and designs and customizes various machine learning techniques to optimize the performance of big data transfer in HPN, big data computing on HPC, and model development through large-scale simulations. Particularly, unobservable latent factors such as competing loads on end hosts are investigated and an algorithm named Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is employed to eliminate their negative impacts on performance prediction using machine learning models such as Support Vector Regression (SVR). Based on such analysis results, a customized, domain-specific loss function is employed within machine learning models such as Stochastic Gradient Descent Regression for throughput prediction to advise bandwidth allocation in HPN. A Bayesian Optimization (BO)-based online computational steering framework is also designed to facilitate the process of scientific simulations and improve the accuracy of model development. The solution proposed in this dissertation provides an additional layer of intelligence in big data transfer and computing, and the resulted machine learning techniques help guide strategic provisioning of high-performance networking and computing resources to maximize the performance of next-generation scientific applications.
Liu, Wuji, "On resource-efficiency and performance optimization in big data computing and networking using machine learning" (2021). Dissertations. 1572.