CLSAC 2016 Invited Talk: Massive-Scale Streaming Analytics

Abstract

Emerging real-world graph problems include: detecting community structure in large social networks; improving the resilience of the electric power grid; and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for additional research on scalable algorithms and development of frameworks for solving these problems on high-performance computers, and the need for improved models that also capture the noise and bias inherent in the torrential data streams. In this talk, I will discuss opportunities and challenges in massive data-intensive computing for applications in computational science and engineering.

Date
Oct 25, 2016 4:00 PM — 5:00 PM
Location
Annapolis, MD

Building on last year’s theme of Productive Analytics, the fifth CLSAC workshop will address the need for large-scale streaming analytics in the operation and management of complex systems. Driverless cars, robots, early-warning systems, and exascale computers, to name but a few, will require the analysis of multiple streams of data in real time. The volume, velocity, and variety of real-time data streams pose unique hardware and software challenges, different from those faced when analyzing volumes of available data. Hardware will need to provide sufficient computing, memory, communication, and error detection capabilities to meet hard deadlines. Runtime systems will have to be self-aware, rescheduling for high-priority events and recovering from faults, all within a strict power limit. Scalable, resilient algorithms capable of returning actionable decisions, steering computations, and classifying events must be developed. Finally, software platforms and tools usable by subject matter experts are essential to lower the cost of entry.

The workshop’s goals are to bring together thought leaders across government, industry, and academia to discuss key challenges for streaming analytics. Important questions include:

  • What are the scientific, government and commercial application drivers?
  • What are the requirements imposed by the volume, velocity and variety of data at ingest?
  • What are hardware requirements for processing at scale to meet hard deadlines?
  • How can data errors be discovered and fixed?
  • How can faulty hardware be detected and circumvented?
  • What are the basic analytics methods and what classes of problems do they address?
  • Is there a set of security best practices to protect complex systems?
  • How can platforms be made simple to use?
  • How do we enable computation across the entire network enterprise (from the sensor to the data center)?
  • How do we deal with non-technical and policy constraints (legal, privacy and ownership)?
David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.