Large scale complex network analysis using the hybrid combination of a MapReduce cluster and a highly multithreaded system


Complex networks capture interactions among entities in various application areas in a graph representation. Analyzing large-scale complex networks often answers important questions (e.g., estimating the spread of epidemic diseases) but also imposes computing challenges, mainly due to the large volume of data and the irregular structure of the graphs. In this paper, we address one such challenge: finding relationships in a subgraph extracted from the data. We solve this problem on three different platforms: a MapReduce cluster, a highly multithreaded system, and a hybrid system of the two. Used alone, the MapReduce cluster and the highly multithreaded system each reveal limitations in solving this problem efficiently, whereas the hybrid system exploits the strengths of both in a synergistic way and solves the problem at hand. In particular, once the subgraph is extracted and loaded into memory, the hybrid system analyzes the subgraph five orders of magnitude faster than the MapReduce cluster.

24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19-23 April 2010 - Workshop Proceedings