Document Type
Dissertation
Date of Award
12-31-2021
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Chase Qishi Wu
Second Advisor
Jason T. L. Wang
Third Advisor
Senjuti Basu Roy
Fourth Advisor
Qing Gary Liu
Fifth Advisor
Hui Zhao
Abstract
Due to the rapid transition from traditional experiment-based approaches to large-scale, computationally intensive simulations, next-generation scientific applications typically involve complex numerical modeling and extreme-scale simulations. Such model-based simulations often generate colossal amounts of data, which must be transferred over high-performance network (HPN) infrastructures to remote sites and analyzed against experimental or observational data on high-performance computing (HPC) facilities. Optimizing the performance of both data transfer in HPN and simulation-based model development on HPC is critical to enabling and accelerating knowledge discovery and scientific innovation. However, such processes generally involve an enormous set of attributes, including domain-specific model parameters, network transport properties, and computing system configurations. The vast space of model parameters, the sheer volume of generated data, the limited amount of allocatable bandwidth, and the complex settings of computing systems make it practically infeasible for domain experts to manually deploy and optimize big data transfer and computing solutions in next-generation scientific applications.
The research in this dissertation identifies such attributes in networks, systems, and models, conducts in-depth exploratory analysis of their impacts on data transfer throughput, computing efficiency, and modeling accuracy, and designs and customizes various machine learning techniques to optimize the performance of big data transfer in HPN, big data computing on HPC, and model development through large-scale simulations. In particular, unobservable latent factors such as competing loads on end hosts are investigated, and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is employed to eliminate their negative impacts on performance prediction using machine learning models such as Support Vector Regression (SVR). Based on these analysis results, a customized, domain-specific loss function is employed within machine learning models such as Stochastic Gradient Descent regression for throughput prediction to inform bandwidth allocation in HPN. A Bayesian Optimization (BO)-based online computational steering framework is also designed to facilitate the process of scientific simulations and improve the accuracy of model development. The solutions proposed in this dissertation provide an additional layer of intelligence in big data transfer and computing, and the resulting machine learning techniques help guide the strategic provisioning of high-performance networking and computing resources to maximize the performance of next-generation scientific applications.
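The general pattern described in the abstract, screening out measurements distorted by unobservable latent factors before fitting a throughput predictor, can be illustrated with a minimal sketch. The following is not the dissertation's code: it uses scikit-learn with synthetic data, and the feature set (requested bandwidth and stream count), DBSCAN parameters, and SVR configuration are all illustrative assumptions.

```python
# Minimal sketch (illustrative only): remove samples affected by latent
# competing loads via DBSCAN, then fit an SVR throughput predictor.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical features: requested bandwidth (Gbps) and parallel stream count;
# target: measured transfer throughput (Gbps).
X = rng.uniform([1, 1], [40, 16], size=(200, 2))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 200)
y[::20] -= rng.uniform(5, 15, 10)  # throughput drops from unobserved host load

# Cluster in joint feature/target space; DBSCAN labels sparse outliers as -1.
Z = StandardScaler().fit_transform(np.column_stack([X, y]))
labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(Z)
mask = labels != -1  # keep only samples inside dense clusters

# Train the throughput predictor on the cleaned samples only.
model = SVR(kernel="rbf", C=10.0).fit(X[mask], y[mask])
print("kept", mask.sum(), "of", len(y), "samples;",
      "predicted throughput:", model.predict([[25, 8]]))
```

A similar preprocessing step could precede the SGD-based regressor with a domain-specific loss mentioned above; the sketch only shows the outlier-screening idea, not the customized loss or the BO-based steering framework.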
Recommended Citation
Liu, Wuji, "On resource-efficiency and performance optimization in big data computing and networking using machine learning" (2021). Dissertations. 1572.
https://digitalcommons.njit.edu/dissertations/1572
Included in
Computational Engineering Commons, Computer Engineering Commons, Electrical and Computer Engineering Commons