Date of Award
Doctor of Philosophy in Electrical Engineering - (Ph.D.)
Electrical and Computer Engineering
Michael Abueg Palis
John D. Carpinelli
The problem of scheduling parallel programs for execution on distributed memory parallel architectures has become the subject of intense research in recent, years. Because of the high inter-processor communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing heavily communicating fine grain tasks into coarser ones in order to reduce the communication overhead so that the overall execution time is minimized.
The thesis of this research is that the task of exposing the parallelism in a given application should be left to the algorithm designer. On the other hand, the task of limiting the parallelism in a chosen parallel algorithm is best handled by the compiler or operating system for the target parallel machine. Toward this end, we have developed CASS (for Clustering And Scheduling System), a. task management system that provides facilities for automatic granularity optimization and task scheduling of parallel programs on distributed memory parallel architectures.
In CASS, a task graph generated by a profiler is used by the clustering module to find the best granularity al which to execute the program so that the overall execution time is minimized. The scheduling module maps the clusters onto a. fixed number of processors and determines the order of execution of tasks in each processor. The output of scheduling module is then used by a code generator to generate machine instructions.
CASS employs two efficient heuristic algorithms for clustering static task graphs: CASS-I for clustering with task duplication, and CASS-II for clustering without task duplication. It is shown that the clustering algorithms used by CASS outperform the best known algorithms reported in the literature. For the scheduling module in CASS, a heuristic algorithm based on load balancing is used to merge clusters such that the number of clusters matches the number of available physical processors.
We also investigate task clustering algorithms for dynamic task graphs and show that it is inherently more difficult than the static case.
Liou, Jing-Chiou, "Grain-size optimization and scheduling for distributed memory architectures" (1995). Dissertations. 1121.