A Deployment Tool for Large Scale Graph Analytics Framework Arachne

Abstract

Data sets have grown exponentially in size, rapidly surpassing the scale at which traditional exploratory data analysis (EDA) tools can effectively analyze real-world graphs. This led to the development of Arachne, a user-friendly tool that enables interactive graph analysis at terabyte scales through familiar Python code, backed by a high-performance Chapel back-end that can run on nearly any *nix-like system. Various disciplines, including the biological, information, and social sciences, use large-scale graphs to represent the flow of information through a cell, connections between neurons, interactions between computers, relationships between individuals, etc. To take advantage of Arachne, however, a new user has to go through a long and convoluted installation process, which often takes a week or more to complete, even with assistance from the developers. To support Arachne's mission of being an easy-to-use exploratory graph analytics tool that increases accessibility to high performance computing (HPC) resources, a better deployment experience was needed for users and developers. In this paper, we propose a tool specially designed to greatly simplify the deployment of Arachne for users and to enable rapid, automated testing of the software for compatibility with new releases of its dependencies. The highly portable nature of Arachne requires that this deployment tool install and configure the software across diverse combinations of hardware, operating systems, initial system environments, and the evolving packages and libraries that Arachne depends on. The tool was tested in both virtual and real-world environments, where its success was evaluated by improvements in efficiency and productivity for both users and developers. Current results show that the installation and configuration process was greatly improved, with a significant reduction in the time and effort spent by both users and developers.

Publication
28th Annual IEEE High Performance Extreme Computing Conference