Architecture independent parallel algorithm design: Theory vs practice
Document Type
Article
Publication Date
4-1-2002
Abstract
We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end, we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been extensively examined before, a systematic study of an architecture independent design and analysis of parallel algorithms and their performance (including matrix computations) has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) for the stated problems exist, the complexity and performance of the introduced algorithms are sufficient to raise the issues that are important in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the particular theoretical and practical merits of each one are discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems that include the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms. © 2002 Elsevier Science B.V. All rights reserved.
Identifier
0036532526 (Scopus)
Publication Title
Future Generation Computer Systems
External Full Text Location
https://doi.org/10.1016/S0167-739X(01)00068-1
ISSN
0167-739X
First Page
573
Last Page
593
Issue
5
Volume
18
Grant
421350
Fund Ref
Engineering and Physical Sciences Research Council
Recommended Citation
Gerbessiotis, Alexandros V., "Architecture independent parallel algorithm design: Theory vs practice" (2002). Faculty Publications. 14707.
https://digitalcommons.njit.edu/fac_pubs/14707
