A common statistical problem is that of finding the median element in a set of data. This paper presents an efficient randomized high-level parallel algorithms for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k. Our general framework is an SPMD distributed memory programming model that is enhanced by a set of communication primitives. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The algorithms have been coded in the message passing standard MPI, and our experimental results from the IBM SP-2 illustrate the scalability and efficiency of our algorithm and improve upon all the related experimental results known to the authors.
David A. Bader is an Assistant Professor in Electrical & Computer Engineering. David received his Ph.D. in Electrical Engineering in 1996 from University of Maryland, College Park. David’s recent research experiences highlight his ability to bridge the gap between application and computer science. For example, while an NSF CISE Postdoctoral Research Associate in in Experimental Computer Science, David worked closely with Earth scientists at NASA/GSFC and University of Marylands’s Geography Department to develop a high performance system for on-demand queries of terascale remotely-sensed Earth science data, such as 4km AVHRR global coverage, used for monitoring long-term, global studies of the Earth. In addition to developing some of the fastest known portable, high-performance algorithms for image segmentation and classification applications, combinatorial problems such as sorting and selection, and data communication primitives, David’s recent research has produced a new, preliminary methodology for programming clusters of SMP nodes.