On the random access performance of Cell Broadband Engine with graph analysis application

Abstract

The Cell Broadband Engine (Cell/BE) processor has a unique memory access architecture in addition to its powerful computing engines. Many compute-intensive applications have been ported to Cell/BE successfully, but memory-intensive applications have rarely been investigated beyond a few micro-benchmarks. Since Cell/BE has a powerful software-visible DMA engine, this paper studies whether Cell/BE is suitable for applications with a large number of random memory accesses. Two benchmarks, GUPS and SSCA#2, are used. The latter is a rather complex benchmark that is representative of real-world graph analysis applications. We find that both benchmarks perform well on the Cell/BE-based IBM QS20/22. Compared with two conventional multiprocessor systems with the same number of cores/threads, GUPS is about 40-80% faster and SSCA#2 about 17-30% faster. The dynamic load balancing and software pipelining techniques used to optimize SSCA#2 are introduced. Based on the experiments, the potential of Cell/BE for random access is analyzed in detail, along with the limitations imposed by its memory controller, atomic engine, and TLB management. Our research shows that, although more programming effort is needed, Cell/BE has potential for applications with irregular memory access.
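
For readers unfamiliar with GUPS (giga-updates per second), the following is a minimal sketch of the kind of random-access update loop the benchmark measures. It is modeled on the widely used HPCC RandomAccess formulation (XOR updates driven by a 64-bit shift-register sequence) and is not the Cell/BE implementation evaluated in the paper. The function names, the seed, and the table size are assumptions chosen for illustration.

```python
MASK64 = (1 << 64) - 1   # keep values within 64 bits
POLY = 0x7               # feedback polynomial, as in HPCC RandomAccess


def next_random(ran):
    """Advance the 64-bit shift-register sequence by one step."""
    high_bit_set = ran >> 63
    ran = (ran << 1) & MASK64
    if high_bit_set:
        ran ^= POLY
    return ran


def gups_updates(table, num_updates, seed=1):
    """Apply num_updates XOR updates at pseudo-random table indices."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table length must be a power of two")
    mask = size - 1
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        # Each update touches an unpredictable location: this is the
        # irregular memory access pattern that stresses the memory system.
        table[ran & mask] ^= ran
    return table


def run_demo(log2_size=10):
    """Run the update stream twice; XOR updates then cancel out."""
    size = 1 << log2_size
    table = list(range(size))
    gups_updates(table, 4 * size)
    gups_updates(table, 4 * size)
    return table == list(range(size))
```

Because XOR is its own inverse, applying the same update stream twice restores the initial table, which gives a simple correctness check (`run_demo()` in this sketch). The performance metric is the number of such updates completed per second.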

Publication
CoRR