David A. Bader

Distinguished Professor and Director of the Institute for Data Science

New Jersey Institute of Technology

Biography

David A. Bader is a Distinguished Professor and founder of the Department of Data Science in the Ying Wu College of Computing and Director of the Institute for Data Science at New Jersey Institute of Technology. Prior to this, he served as founding Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. Bader is an elected Board Member of the Computing Research Association (CRA). He is a Fellow of the IEEE, ACM, AAAS, and SIAM; a recipient of the IEEE Sidney Fernbach Award; and the 2022 Innovation Hall of Fame inductee of the University of Maryland’s A. James Clark School of Engineering. The Computer History Museum recognizes Bader for developing the first Linux-based supercomputer, an approach that became the predominant architecture for all major supercomputers in the world.

Interests

Data Science
High Performance Computing
Real-World Analytics

Education

PhD in Electrical Engineering, 1996
University of Maryland
MS in Electrical Engineering, 1991
Lehigh University
BS in Computer Engineering, 1990
Lehigh University

Biography

David A. Bader is a Distinguished Professor and founder of the Department of Data Science and inaugural Director of the Institute for Data Science at New Jersey Institute of Technology. Prior to this, he served as founding Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. Bader is an elected Board Member of the Computing Research Association (CRA), and previously served on the IEEE Computer Society Board of Governors.

Dr. Bader is a Fellow of the IEEE, ACM, AAAS, and SIAM; a recipient of the IEEE Sidney Fernbach Award; and the 2022 Innovation Hall of Fame inductee of the University of Maryland’s A. James Clark School of Engineering. He advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and the Future Advanced Computing Ecosystem (FACE). Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. He has co-authored over 300 scholarly papers and has received best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs, including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH). Recently, Bader received an NVIDIA AI Lab (NVAIL) award and a Facebook Research AI Hardware/Software Co-Design award.

Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing and previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He serves on the leadership team of the Northeast Big Data Innovation Hub as the inaugural chair of the Seed Fund Steering Committee. ROI-NJ recognized Bader as a technology influencer on its inaugural 2021 list and again on its 2022 list. In 2012, Bader was the inaugural recipient of the University of Maryland’s Electrical and Computer Engineering Distinguished Alumni Award. In 2014, Bader received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader is a member of Tau Beta Pi (National Engineering Honor Society), Eta Kappa Nu (Electrical Engineering Honor Society), and Omicron Delta Kappa (National Leadership Honor Society). Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence.

In 1998, Bader built the first Linux supercomputer, which led to a high-performance computing (HPC) revolution; Hyperion Research estimates that the total economic value of the Linux supercomputing pioneered by Bader has exceeded $100 trillion over the past 25 years. The Computer History Museum recognizes Bader for developing the first Linux-based supercomputer, an approach that became the predominant architecture for all major supercomputers in the world. Bader is a cofounder of the Graph500 List for benchmarking “Big Data” computing platforms. He is recognized as a “RockStar” of High Performance Computing by InsideHPC and was named one of HPCwire’s People to Watch in 2012 and 2014.

Experience

Distinguished Professor

New Jersey Institute of Technology

July 2019 – Present Newark, NJ

Department of Data Science, Ying Wu College of Computing

Professor

Georgia Institute of Technology

August 2005 – June 2019 Atlanta, GA

Chair, School of Computational Science and Engineering.

Associate Professor and Regents’ Lecturer

University of New Mexico

January 1998 – July 2005 Albuquerque, NM

Department of Electrical and Computer Engineering.

Recent Boards

Scientific Advisory Board Member

Flatiron Institute, Simons Foundation

July 2023 – Present New York, NY

Committee Member

Information Systems Engineering, Johns Hopkins University

January 2023 – Present Baltimore, MD

Advisory Council Member

EdgeDiscovery, NJEdge Inc.

August 2020 – Present Newark, NJ

Advisory Board Member

ARLIS, University of Maryland

July 2020 – Present College Park, MD

Steering Committee Chair, Seed Fund

Northeast Big Data Innovation Hub

May 2020 – Present New York, NY

Advisory Board Member

OpenCilk

March 2020 – Present Cambridge, MA

Strategic Advisory Board Member

Open Source Election Technology Institute

September 2019 – Present Palo Alto, CA

Advisory Board Member

Trovares

January 2019 – April 2020 Seattle, WA

Advisory Council Member

Electrical and Computer Engineering Department, Lehigh University

January 2018 – Present Bethlehem, PA

Advisory Board Member

Accelogic, LLC

June 2015 – June 2019 Weston, FL

Advisory Committee on High Performance Computing

Council on Competitiveness

January 2014 – June 2019 Washington, DC

Advisory Committee on Cyberinfrastructure

National Science Foundation

January 2014 – December 2017

Board of Governors

IEEE Computer Society

January 2014 – December 2016

Board Member

Computing Research Association

January 2013 – December 2014 Washington, DC

Advisory Council Member

Internet2

January 2007 – December 2011

Advisory Board Member

DSPlogic, Inc.

August 2006 – June 2019 Frederick, MD

People

Faculty

David A. Bader

Distinguished Professor and Director of the Institute for Data Science

Staff

Selenny Fabre

Business Manager

Zhihui Du

Principal Research Scientist

Postdoctoral Alumni

Henning Meyerhenke

Professor

Tanya Berger-Wolf

Director, Translational Data Analytics Institute

Tiffani L. Williams

Teaching Professor and Director of Onramp Programs

Yuzhong Sun

Professor

PhD Students

Asha Saxena

PhD Student

Fuhuan Li

PhD Candidate

Mohammad Dindoost

PhD Student

Oliver Alvarado Rodriguez

PhD Candidate

PhD Alumni

Adam McLaughlin

Research Scientist / Engineer

Anita Zakrzewska

Senior Member of Technical Staff

David Ediger

Senior Research Engineer

Eisha Nathan

Computational Scientist

Emily Rogers

Researcher

Guojing Cong

Senior Staff

James Fairbanks

Assistant Professor

Jinyang Liu

Senior Software Engineer

Kamesh Madduri

Associate Professor

Lluís-Miquel Munguía

Senior Software Engineer

Matthew Sottile

Affiliate Graduate Faculty

Mi Yan

Senior Applied Research Engineer

Oded Green

Senior Solutions Architect

Seunghwa Kang

Senior Software Engineer

Vipin Sachdeva

Researcher

Virat Agarwal

Executive Director, Head of Commodities Structuring

Xuefei Wang

Zhaoming Yin

Software Engineer

Projects

Cyber-Infrastructure for Community Detection, Extraction, and Search in Large Networks

Community detection methods enable an understanding of the structure of networks at multiple scales. While many methods exist, only a few are able to scale to large networks and/or are implemented on large-scale computational infrastructure.

High Performance Algorithms for Interactive Data Science at Scale

A real-world challenge in data science is to develop interactive methods for quickly analyzing novel data sets that are potentially of massive scale. This project will design and implement fundamental high-performance computing algorithms that enable interactive analysis of massive data sets.

NVIDIA AI Lab (NVAIL) for Scalable Graph Algorithms

Graph algorithms represent some of the most challenging known problems in computer science for modern processors. These algorithms perform far more memory accesses per unit of computation than traditional scientific computing.

Facebook Research

Facebook AI Systems Hardware/Software Co-Design research award on Scalable Graph Learning Algorithms (https://research.fb.com/blog/2019/05/announcing-the-winners-of-the-ai-system-hardware-software-co-design-research-awards/). Deep learning has boosted the machine learning field at large and has driven significant performance gains in tasks including speech recognition, image classification, object detection, and recommendation.

HORNET

High-Performance Streaming Graph Analytics on GPUs

STINGER

Dynamic graphs are all around us: social networks capturing interpersonal relationships and communication patterns; information on the Internet, Wikipedia, and other data sources; disease-spread networks and bioinformatics problems; and business intelligence and consumer behavior.

cuSTINGER

Dynamic graph data structures and streaming algorithms for GPUs

GTfold

Scalable Multicore Code for RNA Secondary Structure Prediction

GraphBLAS

The GraphBLAS Forum is an open effort to define standard building blocks for graph algorithms in the language of linear algebra. We believe that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. We believe that it is critical to move quickly and define such a standard, thereby freeing researchers to innovate and diversify at the level of higher-level algorithms and graph analytics applications. This effort was inspired by the Basic Linear Algebra Subprograms (BLAS) of dense linear algebra, and hence our working name for this standard is “the GraphBLAS”.
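To illustrate what "graph algorithms in the language of linear algebra" means, here is a minimal Python sketch of breadth-first search written as repeated sparse vector-matrix products over the Boolean (OR, AND) semiring, with already-visited vertices filtered out by a complemented mask. It is not the GraphBLAS API: the sets-and-dicts representation and the names `vxm_or_and` and `bfs_levels` are hypothetical stand-ins chosen for readability. In the GraphBLAS C API, each step would roughly correspond to a masked `GrB_vxm` call with a Boolean semiring.

```python
from typing import Dict, List, Set

# Hypothetical representation: sparse Boolean adjacency matrix stored by rows,
# where A[i] lists the columns j with A[i][j] = 1 (an edge i -> j).
SparseBoolMatrix = Dict[int, List[int]]


def vxm_or_and(frontier: Set[int], A: SparseBoolMatrix, mask: Set[int]) -> Set[int]:
    """Compute q = frontier (OR.AND) A, keeping only entries NOT in `mask`.

    Under the Boolean semiring, q[j] = OR over i of (frontier[i] AND A[i][j]),
    so q[j] is true when some frontier vertex i has an edge i -> j.
    The complemented mask drops vertices that were already visited.
    """
    result: Set[int] = set()
    for i in frontier:              # only nonzero entries of the sparse vector contribute
        for j in A.get(i, []):      # "AND": edge i -> j is present
            if j not in mask:       # complemented mask: skip visited vertices
                result.add(j)       # "OR": any single contribution makes q[j] true
    return result


def bfs_levels(A: SparseBoolMatrix, source: int) -> Dict[int, int]:
    """Breadth-first search expressed as repeated vector-matrix products."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        frontier = vxm_or_and(frontier, A, mask=set(levels))
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    # Small directed example graph: 0->1, 0->2, 1->3, 2->3, 3->4
    A = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
    print(bfs_levels(A, 0))
```

Each loop iteration advances the frontier by one level. A GraphBLAS-based implementation would hand the same product to an optimized sparse kernel instead of the nested Python loops, which is the separation of concerns the standard aims for.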

Graph500

GraphCT: Graph Characterization Toolkit

Cray XMT software developed in collaboration with PNNL

Multicore SWARM: Software and Algorithms for Running on Multicore Processors

An open-source library for developing efficient and portable implementations that make use of multicore processors

GRAPPA: Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms

High-performance software for computational phylogeny

Books

Massive Graph Analytics (Chapman & Hall / CRC Press), 2022

Expertise in massive scale graph analytics is key for solving real-world grand challenges from health to sustainability to detecting insider threats, cyber defense, and more. Massive Graph Analytics provides a comprehensive introduction to massive graph analytics, featuring contributions from thought leaders across academia, industry, and government.

Graph Partitioning and Graph Clustering (American Mathematical Society), 2013

10th DIMACS Implementation Challenge Workshop

Scientific Computing with Multicore and Accelerators (Chapman & Hall / CRC Press), 2011

The hybrid/heterogeneous nature of future microprocessors and large high-performance computing systems will result in a reliance on two major types of components: multicore/manycore central processing units and special purpose hardware/massively parallel accelerators.

Petascale Computing: Algorithms and Applications (Chapman & Hall / CRC Press), 2008

Although the highly anticipated petascale computers of the near future will be an order of magnitude faster than today’s fastest supercomputer, scaling up algorithms and applications for this class of computers remains a tough challenge.

Featured Publications

David Bader

August, 2021 IEEE Annals of the History of Computing

Linux and Supercomputing: How my passion for building COTS systems led to an HPC revolution

David A. Bader built the first Linux supercomputer.

Adam McLaughlin, David A. Bader

January, 2014 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, 2014

Scalable and High Performance Betweenness Centrality on the GPU (Best Student Paper Finalist)

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near-linear speedup and performance exceeding tens of GTEPS when running betweenness centrality on 192 GPUs.
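For readers unfamiliar with the metric: the betweenness centrality of a vertex v sums, over all source/target pairs (s, t), the fraction of shortest s-to-t paths that pass through v. A standard way to compute it on unweighted graphs is Brandes' algorithm, which runs one BFS per source to count shortest paths and then sweeps back through the visited vertices to accumulate dependencies. The sketch below is a plain sequential Python version of that textbook algorithm, assuming an adjacency-list dict as input; it is not the hybrid GPU implementation described in the paper, although GPU implementations of this metric typically parallelize these same two phases.

```python
from collections import deque
from typing import Dict, List


def betweenness_centrality(adj: Dict[int, List[int]]) -> Dict[int, float]:
    """Brandes' algorithm for unweighted graphs given as adjacency lists.

    Edges are treated as directed; for an undirected graph, list each edge in
    both directions. Scores are unnormalized, so in an undirected graph each
    unordered pair is counted once from each endpoint.
    """
    vertices = set(adj)
    for nbrs in adj.values():
        vertices.update(nbrs)
    bc = {v: 0.0 for v in vertices}

    for s in vertices:
        # Forward phase: BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in vertices}
        sigma = {v: 0 for v in vertices}
        dist = {v: -1 for v in vertices}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, []):
                if dist[w] < 0:               # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Backward phase: pop vertices in order of non-increasing distance from s
        # and accumulate each vertex's dependency onto its predecessors.
        delta = {v: 0.0 for v in vertices}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the undirected path 0-1-2 (each edge listed in both directions), this returns a score of 2.0 for vertex 1 and 0.0 for the endpoints, because the single pair (0, 2) is counted once from each endpoint. The per-source BFS is also why the computation is expensive on large graphs: the total work grows with the number of vertices times the number of edges.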

David Ediger, Robert McColl, E. Jason Riedy, David A. Bader

January, 2012 IEEE Conference on High Performance Extreme Computing, HPEC 2012, Waltham, MA, USA, September 10-12, 2012

STINGER: High performance data structure for streaming graphs (Best Paper Award)

The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, Spatio-Temporal Interaction Networks and Graphs Extensible Representation (STINGER), that includes a graph data structure that enables these applications. Key attributes of STINGER are fast insertions, deletions, and updates on semantic graphs with skewed degree distributions. We demonstrate a process of algorithmic and architectural optimizations that enable high performance on the Cray XMT family and Intel multicore servers. Our implementation of STINGER on the Cray XMT processes over 3 million updates per second on a scale-free graph with 537 million edges.

Virat Agarwal, David A. Bader, Lin Dan, Lurng-Kuo Liu, Davide Pasetto, Michael Perrone, Fabrizio Petrini

January, 2009 24th International Supercomputing Conference (ISC), Hamburg, Germany, June 23-26, 2009

Faster FAST: Multicore Acceleration of Streaming Financial Data (Best Paper Award)

David A. Bader, Kamesh Madduri

January, 2005 High Performance Computing - HiPC 2005, 12th International Conference, Goa, India, December 18-21, 2005, Proceedings

Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors (HiPC Most Impactful Papers Award)

Graph-theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and implementation of a graph theory application that supports the kernels from the Scalable Synthetic Compact Applications (SSCA) benchmark suite, developed under the DARPA High Productivity Computing Systems (HPCS) program. This synthetic benchmark consists of four kernels that require irregular access to a large, directed, weighted multi-graph. We have developed a parallel implementation of this benchmark in C using the POSIX thread library for commodity symmetric multiprocessors (SMPs). In this paper, we primarily discuss the data layout choices and algorithmic design issues for each kernel, and also present execution time and benchmark validation results.