Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Chase Qishi Wu
Extreme-scale e-Science applications in domains such as earth science and high energy physics, at multiple national institutions within the U.S., are generating colossal amounts of data, now frequently termed “big data”. This data must be stored, managed, and moved to different geographical locations for distributed processing and analysis. Such big data transfers require stable, high-speed network connections, which are not readily available in traditional shared IP networks such as the Internet. High-performance networking technologies and services featuring high bandwidth and advance reservation are being rapidly developed and deployed across the nation and around the globe to support such scientific applications. However, these technologies and services have not been fully utilized, mainly because: i) their use often requires considerable domain knowledge, and many application users are not even aware of their existence; and ii) the end-to-end data transfer performance largely depends on the transport protocol being used on the end hosts. High-speed network paths with reserved bandwidth in high-performance networks have shifted the data transfer bottleneck from the network segments, where it lies in traditional IP networks, to the end hosts, a situation that most existing transport protocols are not well suited to handle.
In this dissertation, an integrated transport solution is proposed in support of data- and network-intensive applications in various science domains. This solution integrates three major components into a unified framework: i) transport-support workflow optimization, ii) transport profile generation, and iii) transport protocol design. First, a class of transport-support workflow optimization problems is formulated, in which an appropriate set of resources and services is selected to compose the best transport-support workflow to meet a user’s data transfer request in terms of various performance requirements. Second, a transport profiler named Transport Profile Generator (TPG) and its extended and accelerated version, FastProf, are designed and implemented to characterize and enhance the end-to-end data transfer performance of a selected transport method over an established network path. Finally, several approaches based on rate and error threshold control are proposed to design a suite of data transfer protocols specifically tailored for big data transfer over dedicated connections. The proposed integrated transport solution is implemented and evaluated in: i) a local testbed with a single 10 Gb/s back-to-back connection and dual 10 Gb/s NIC-to-NIC connections; and ii) several wide-area networks with 10 Gb/s long-haul connections at collaborative sites including Oak Ridge National Laboratory, Argonne National Laboratory, and University of Chicago.
Yun, Daqing, "An integrated transport solution to big data movement in high-performance networks" (2016). Dissertations. 84.