Architecture independent parallel algorithm design: Theory vs practice

Document Type

Article

Publication Date

4-1-2002

Abstract

We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. To this end, we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been examined extensively before, a systematic study of the architecture independent design and analysis of parallel algorithms (including matrix computations) and of their performance has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) exist for the stated problems, the complexity and performance of the introduced algorithms are sufficient to raise the issues that are important in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the theoretical and practical merits of each are discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems, including the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms. © 2002 Elsevier Science B.V. All rights reserved.

Identifier

0036532526 (Scopus)

Publication Title

Future Generation Computer Systems

External Full Text Location

https://doi.org/10.1016/S0167-739X(01)00068-1

ISSN

0167-739X

First Page

573

Last Page

593

Issue

5

Volume

18

Grant

421350

Fund Ref

Engineering and Physical Sciences Research Council

