Evaluating Multicore Processors and Accelerators for Dense Numerical Computations

Seunghwa Kang, Nitin Arora, Aashay Shringarpure, Richard W. Vuduc, David A. Bader

January 2013

Abstract

In this chapter, we empirically evaluate fundamental design trade-offs among current multicore processors and accelerator technologies and their impact on dense numerical computations. The main objectives of this work are to understand the differences in implementation techniques required to achieve good performance on a variety of current multicore and accelerator platforms, and to help application designers map their software to the most suitable architecture. We also aim to influence future computing system design. We present interarchitectural comparisons of dense numerical kernels from computational statistics and direct n-body problems on a spectrum of multicore and accelerator platforms. These include platforms based on the Intel Harpertown and Nehalem architectures, the AMD Barcelona architecture, the Sony-Toshiba-IBM Cell Broadband Engine and its second-generation successor (the PowerXCell 8i), and the NVIDIA Tesla C870 and C1060. We illustrate the software implementation process on each platform; measure and analyze the performance, coding complexity, and energy efficiency of each implementation; and discuss the impact of different architectural design choices on each implementation.

Type: Book section

Publication: Multicore Computing: Algorithms, Architectures, and Applications