Parallel Algorithms for Personalized Communication and Sorting with an Experimental Study (Extended Abstract)

Abstract

A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms which execute efficiently on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have non-uniformly sized personalized messages to exchange with each other. We first present an algorithm for the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms.

We then consider how to effectively use these communication primitives to address the problem of sorting. Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Another variation using regular sampling for choosing the splitters has similar performance with deterministic guaranteed bounds on the memory and communication requirements. Both of these variations efficiently handle the presence of duplicates without the overhead of tagging each element.

The personalized communication and sorting algorithms presented in this paper have been coded in SPLIT-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon.
Our experimental results are consistent with the theoretical analyses and illustrate

* The support by NASA Graduate Student Researcher Fellowship
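
The idea of choosing splitters by regular sampling can be illustrated with a small sequential simulation. The sketch below is a generic textbook-style sample sort, not the algorithm of this paper: the function name `sample_sort_regular` and the list-of-lists representation of processors are assumptions made for illustration, the exchange is a single irregular routing step rather than the two rounds of regular all-to-all communication described above, and no special balancing of duplicates is attempted.

```python
from bisect import bisect_right


def sample_sort_regular(blocks):
    """Sort data spread over p simulated processors (one list per processor).

    Hypothetical helper for illustration only; returns p sorted buckets
    whose concatenation is the fully sorted input.
    """
    p = len(blocks)

    # Step 1: each simulated processor sorts its local block.
    local = [sorted(b) for b in blocks]

    # Step 2: each non-empty block contributes p evenly spaced samples.
    samples = []
    for b in local:
        if b:
            samples.extend(b[(i * len(b)) // p] for i in range(p))
    samples.sort()

    # Step 3: pick p - 1 splitters at regular positions in the sorted samples.
    splitters = []
    if samples:
        splitters = [samples[(i * len(samples)) // p] for i in range(1, p)]

    # Step 4: simulated exchange; element x goes to the bucket whose
    # splitter range contains it.
    buckets = [[] for _ in range(p)]
    for b in local:
        for x in b:
            buckets[bisect_right(splitters, x)].append(x)

    # Step 5: each processor sorts what it received.
    return [sorted(b) for b in buckets]
```

For example, calling `sample_sort_regular([[5, 3, 9, 1], [8, 2, 7, 7], [4, 6, 0, 10]])` returns three sorted buckets, one per simulated processor, that together hold every input element in globally sorted order.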

Publication
Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '96, Padua, Italy, June 24-26, 1996