UC Berkeley, Invited Talk: Opportunities and Challenges in Massive Data-Intensive Computing


Emerging real-world graph problems include detecting community structure in large social networks, improving the resilience of the electric power grid, and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new challenges because of sparsity and the lack of locality in the data, the need for additional research on scalable algorithms and development of frameworks for solving these problems on high performance computers, and the need for improved models that also capture the noise and bias inherent in the torrential data streams. In this talk, the speaker will discuss the opportunities and challenges in massive data-intensive computing for applications in computational biology, genomics, and security. The explosion of real-world graph data poses a substantial challenge: How can we analyze constantly changing streaming graphs with billions of vertices? Our approach leverages fine-grained parallelism, lightweight synchronization, and shared memory to scale to massive graphs.

May 8, 2012 1:00 PM — 2:00 PM
Berkeley, CA

Video Lecture

Upcoming Conference: “Machine-Learning with Real-time & Streaming Applications”

First Conference Announcement:

From Data to Knowledge: Machine-Learning with Real-time & Streaming Applications, May 7-11, 2012
On the Campus of the University of California, Berkeley



Olfa Nasraoui (Louisville), Petros Drineas (RPI), Muthu Muthukrishnan (Rutgers), Alex Szalay (Johns Hopkins), David Bader (Georgia Tech), Eamonn Keogh (UC Riverside), Joao Gama (Univ. of Porto, Portugal), Michael Franklin (UC Berkeley), Ziv Bar-Joseph (Carnegie Mellon University)


We are experiencing a revolution in the capacity to quickly collect and transport large amounts of data. Not only has this revolution changed the means by which we store and access this data, but it has also caused a fundamental transformation in the methods and algorithms that we use to extract knowledge from data. In scientific fields as diverse as climatology, medical science, astrophysics, particle physics, computer vision, and computational finance, massive streaming data sets have sparked innovation in methodologies for knowledge discovery in data streams. Cutting-edge methodology for streaming data has come from a number of diverse directions, from on-line learning, randomized linear algebra and approximate methods, to distributed optimization methodology for cloud computing, to multi-class classification problems in the presence of noisy and spurious data.

This conference will bring together researchers from applied mathematics and several diverse scientific fields to discuss the current state of the art and open research questions in streaming data and real-time machine learning. The conference will be domain driven, with talks focusing on well-defined areas of application and describing the techniques and algorithms necessary to address the current and future challenges in the field.

Sessions will be accessible to a broad audience and will have a single track format with additional rooms for breakout sessions and posters. There will be no formal conference proceedings, but conference applicants are encouraged to submit an abstract and present a talk and/or poster.


  • Feb 29 : Initial registration ends, participants announced.
  • May 7 – 11 : Conference.


  • Stochastic Data Streams
    • Muthu Muthukrishnan: (Dept. of Computer Science, Rutgers University)
  • Real-Time Machine Learning in Astrophysics
    • Alex Szalay: (Dept. of Physics and Astronomy, Johns Hopkins University)
  • Real-Time Analytics with Streaming Databases
    • Michael Franklin: (Computer Science Dept., UC Berkeley)
  • Classification of Sensor Network Data Streams
    • Joao Gama: (Lab. of A.I. & Decision Support, Economics at Univ. of Porto)
  • Randomized and Approximation Algorithms
    • Petros Drineas: (Computer Science Dept., Rensselaer Polytechnic Institute)
  • Time-Series Clustering and Classification
    • Eamonn Keogh: (Computer Science and Engineering Dept., UC Riverside)
  • Time Series in the Biological and Medical Sciences
    • Ziv Bar-Joseph: (Computer Science Dept., Carnegie Mellon University)
  • Streaming Graph/Network Data & Architectures
    • David Bader: (College of Computing, Georgia Tech)
  • Data Mining of Data Streams
    • Olfa Nasraoui: (Dept. of CS & Computer Engineering, Univ. of Louisville)

Local Organizing Committee

  • Joshua Bloom: (Dept. of Astronomy, UC Berkeley)
  • Damian Eads: (Dept. of CS, UC Santa Cruz; Dept. of Eng, Univ. of Cambridge)
  • Berian James: (Dept. of Astr, UC Berkeley; Dark Cosmology Centre, U Copenhagen)
  • Peter Nugent: (Comp. Cosmology, Lawrence Berkeley National Lab.)
  • John Rice: (Dept. of Statistics, UC Berkeley)
  • Joseph Richards: (Dept. of Astronomy & Dept. of Statistics, UC Berkeley)
  • Dan Starr: (Dept. of Astronomy, UC Berkeley)

Scientific Organizing Committee

  • Leon Bottou: (NEC Labs)
  • Emmanuel Candes: (Stanford)
  • Brad Efron: (Stanford)
  • Alex Gray: (Georgia Tech)
  • Michael Jordan: (Berkeley)
  • John Langford: (Yahoo)
  • Fernando Perez: (Berkeley)
  • Ricardo Vilalta: (Houston)
  • Larry Wasserman: (CMU)
David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.