Emerging real-world graph problems include: detecting community structure in large social networks; improving the resilience of the electric power grid; and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for additional research on scalable algorithms and development of frameworks for solving these problems on high performance computers, and the need for improved models that also capture the noise and bias inherent in the torrential data streams. In this talk, I will discuss opportunities and challenges in massive data-intensive computing for applications in computational science and engineering.
Building on last year’s theme of Productive Analytics, the fifth CLSAC workshop will address the need for large-scale streaming analytics in the operation and management of complex systems. Driverless cars, robots, early-warning systems, and exascale computers, to name but a few, will require the analysis of multiple streams of data in real time. The volume, velocity, and variety of real-time data streams pose unique hardware and software challenges that differ from those faced when analyzing volumes of available data. Hardware will need to provide sufficient computing, memory, communication, and error detection capabilities to meet hard deadlines. Runtime systems will have to be self-aware, rescheduling for high-priority events and recovering from faults, all within a strict power limit. Scalable, resilient algorithms capable of returning actionable decisions, steering computations, and classifying events must be developed. Finally, software platforms and tools usable by subject matter experts are essential to lower the cost of entry. The workshop’s goal is to bring together thought leaders across government, industry, and academia to discuss key challenges for streaming analytics. Important questions include: