Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture

Abstract

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm. We also present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance sizes tested. To our knowledge, this linear speedup with the number of processors is the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is a graph decomposition algorithm that also requires the computation of a spanning tree. This problem is of interest in its own right. It is also representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, yet have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.

Supported in part by NSF CAREER 00-93039, NSF ITR 00-81404, NSF DEB 99-10123, and DOE CSRI-14968.

Publication
Algorithm Engineering, 5th International Workshop, WAE 2001, Aarhus, Denmark, August 28-31, 2001, Proceedings