SC18 Invited Panelist

Abstract

Government agencies and institutions from disparate areas (finance, healthcare, insurance, telecommunications, computing, etc.) have recognized the need for a new generation of systems and tools that efficiently solve large-scale graph and combinatorial problems. Infrastructures like GraphLab, GraphX (on top of Spark), Pregel, Giraph, and, in general, vertex- or edge-centric approaches based on MapReduce-like data-parallel models have started to provide valuable results and, thanks to their higher-level programming abstractions, have found rapid adoption on distributed memory systems as well. They have not yet demonstrated, however, the ability to significantly scale the size of the problems addressed while maintaining nearly constant throughput as more computational elements are added. Yet systems targeted at identifying threats to national security, as well as at data mining for commercial applications, are required to provide insights in actionable timeframes. Computing infrastructures targeted at solving scientific problems need to organize, filter, and efficiently process the increasing amount of scientific data (provided, for example, by more precise acquisition systems) at performance levels that could significantly extend the scope of the analyses (e.g., by increasing the number of computational simulations). A new set of requirements, such as support for streaming graphs and the ability to deal with attributed graphs, is emerging. Additionally, the exponential growth of interest in machine learning approaches calls for a more effective integration with graph methods to understand relationships and organizational structures in the available data, thus enabling more effective workflows. The DOE and the DOD have started funding various initiatives to bridge these gaps, including projects such as the ECP codesign center ExaGraph and the DARPA Hierarchical Identify Verify Exploit (HIVE) program, and more initiatives in the area are expected.
This BOF aims to gather the community of people interested in frameworks and workflows for large-scale graph analytics, with the objective of discussing the current situation, identifying new challenges and opportunities, and laying the path towards future, interoperable infrastructures. This is the first time this BOF is being proposed. The intended audience includes domain scientists, academic and industrial researchers, potential end users, representatives from government and public institutions, and funding agencies. We expect it to be extremely relevant to the HPC audience, as it touches a topic of intense (and growing) interest to the community. Our lineup of speakers will touch on key themes such as applications, use cases, programming models, application programming interfaces and libraries (e.g., GraphBLAS), data structures and algorithms, and integration of tools, including common data structures, data stores, and data frames. While remaining vendor agnostic, we also expect to touch on architectural requirements and architectural support. Since this is the first time such a BOF is proposed, its main outcomes will be: effectively gathering the community, providing an initial identification of abstractions and potential use cases of graph toolkits, and identifying common layers and interfaces (e.g., algorithms, graph primitives, runtime functionalities) across components and tools. We will provide a report of the identified features at the end of the gathering.

Date
Nov 13, 2018 5:15 PM – Nov 13, 2018 6:45 PM
Location
Dallas, TX
David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.