How to find the needle in a big data haystack

David Bader, professor and chair of the School of Computational Science and Engineering at Georgia Tech, delivered the keynote address during the recent “International Opportunities in Cloud Computing & Big Data” conference. Bader, executive director of High Performance Computing, explains why graph analysis is crucial for big data.

Social networks use big data to spot influencers and target advertising to the right users. Biologists need it to understand common drug interactions and design better medication. Utility providers use it to monitor disruptions and improve resiliency. Graphs are the unifying motif for data analysis, and because real-world circumstances constantly change, dynamic graphs are essential.

Businesses with global challenges need graph analysis of big data.

At Georgia Tech, many globally significant challenges – across a variety of industries – are being modeled by spatio-temporal interaction networks or graphs, which we call “STING” at the School of Computational Science and Engineering. STING graphs are accessible through “STINGER” (http://www.stingergraph.com) – a freely available, open source software project. STINGER – developed by Georgia Tech with colleagues in academia, government and industry – reveals dynamic temporal and semantic relationships between datasets in a way that makes unseen activity, or hidden threats, observable.

These streaming graphs solve problems that typically are hard to address because of the massive amounts of data involved and the need for supercomputers. The graphs are optimized to update large amounts of constantly fluctuating data at rates of more than 3 million edges per second on graphs of 1 billion edges.
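To make the idea of a streaming graph concrete, here is a minimal Python sketch of a graph that absorbs a batch of timestamped edge insertions and removals and can be queried between batches. It is only an illustration under stated assumptions: the class `StreamingGraph` and its methods are hypothetical names invented for this example, they are not STINGER's API, and a toy dictionary-based structure like this comes nowhere near the update rates quoted above.

```python
from collections import defaultdict


class StreamingGraph:
    """Toy undirected dynamic graph with timestamped edges.

    Hypothetical illustration only; this is not STINGER's data structure or API.
    """

    def __init__(self):
        # vertex -> {neighbor: timestamp of the most recent update to that edge}
        self.adj = defaultdict(dict)

    def insert_edge(self, u, v, timestamp):
        # Inserting an existing edge simply refreshes its timestamp.
        self.adj[u][v] = timestamp
        self.adj[v][u] = timestamp

    def remove_edge(self, u, v):
        # Removing a missing edge is a no-op.
        self.adj[u].pop(v, None)
        self.adj[v].pop(u, None)

    def apply_batch(self, updates):
        # Each update is a tuple: (operation, u, v, timestamp).
        for op, u, v, timestamp in updates:
            if op == "insert":
                self.insert_edge(u, v, timestamp)
            elif op == "remove":
                self.remove_edge(u, v)
            else:
                raise ValueError(f"unknown operation: {op}")

    def degree(self, v):
        # Use .get so that a query does not create an empty entry.
        return len(self.adj.get(v, {}))

    def top_by_degree(self, k):
        # Highest-degree vertices first; ties broken by vertex name.
        return sorted(self.adj, key=lambda v: (-len(self.adj[v]), v))[:k]


# A small stream of updates, as might arrive from a social network feed.
g = StreamingGraph()
g.apply_batch([
    ("insert", "alice", "bob", 1),
    ("insert", "alice", "carol", 2),
    ("insert", "bob", "carol", 3),
    ("remove", "bob", "carol", 4),
])
most_connected = g.top_by_degree(1)
```

After this batch, `alice` is connected to both `bob` and `carol`, so she is the single most connected vertex. The point of the sketch is that the graph is never rebuilt from scratch: each update touches only the two endpoints involved, which is what lets a streaming system answer questions while the data keeps changing.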

STINGER can help businesses observe the number of users on a social networking platform at a given time or the number of kilowatts in demand by a neighborhood. As an example of social good, the U.S. Centers for Disease Control and Prevention was able to observe public tweets on Twitter to track flu outbreaks during the 2009-10 H1N1 health emergency. Consulting firms, aeronautical engineers and energy researchers have all found STINGER useful in their work.

The volume of data available to us is growing at an overwhelming rate. Graph analytics can discover the value within massive datasets. It finds the proverbial needle in a haystack.

David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.