Recent & Upcoming Talks

NSA Invited Talk: Arachne: An Open-Source Framework for Interactive Massive-Scale Graph Analytics

A real-world challenge in data science is to develop interactive methods for quickly analyzing novel data sets that are potentially of massive scale. In this talk, Bader will discuss his development of graph algorithms in the context of Arkouda, an open-source NumPy-like replacement for interactive data science on tens of terabytes of data. Massive-scale analytics is an emerging field that integrates the power of high-performance computing and mathematical modeling to extract key insights and information from large-scale data sets. Productivity in massive-scale analytics entails quick interpretation of results through easy-to-use frameworks, while also adhering to design principles that combine high-performance computing and user-friendly simplicity. However, data scientists often encounter challenges, especially with graph analytics, which requires the analysis of complex data from various domains, such as cybersecurity and the natural and social sciences. To address this issue, we introduce Arachne, an open-source framework that enhances accessibility and usability in massive-scale graph analytics. Arachne offers novel algorithms and implementations of graph kernels for efficient data analysis, including connected components, breadth-first search, triangle counting, and k-truss, among others. The high-performance algorithms are integrated into a back-end server written in HPE/Cray’s Chapel language and can be accessed through a Python application programming interface (API). Arachne’s back-end server is compatible with Linux supercomputers, is easy to set up, and can be used through either Python scripts or Jupyter notebooks, which makes it a desirable tool for data scientists who have access to high-performance computers. In this talk, Bader presents an overview of the algorithms his research group has implemented in Arachne and, where applicable, the algorithmic innovations of each.
Further, Bader will discuss improvements to our graph data structure to store extra information such as node labels, edge relationships, and node and edge properties. Arachne is built as an extension to the open-source Arkouda framework and allows graphs to be generated from Arkouda dataframes. The open-source code for Arachne can be found at https://github.com/Bears-R-Us/arkouda-njit. This is joint work with Oliver Alvarado Rodriguez, Zhihui Du, Joseph Patchett, Naren Khatwani, and Fuhuan Li. Bader is supported in part by the National Science Foundation award CCF-2109988.
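
As a rough illustration of one of the graph kernels named in this abstract (triangle counting), the following is a minimal single-machine Python sketch. It does not use the Arachne or Arkouda API and says nothing about how Arachne implements the kernel; the function name count_triangles and the edge-list input format are assumptions made only for this example.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs.

    Hypothetical helper for illustration only; not part of Arachne.
    """
    # Keep only the higher-numbered neighbors of each vertex, so that a
    # triangle u < v < w is discovered exactly once (from its edge (u, v)).
    higher = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        higher[a].add(b)

    total = 0
    for u, nbrs in higher.items():
        for v in nbrs:
            # Every common higher neighbor w of u and v closes (u, v, w).
            total += len(nbrs & higher.get(v, set()))
    return total
```

Orienting each edge from its lower to its higher endpoint avoids counting the same triangle several times; a distributed implementation would additionally need to partition the edge set across compute nodes, which this sketch leaves out.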

IEEE CS SYP Talk: Solving Global Grand Challenges with High Performance Data Analytics

Meet the speaker for the event Solving Global Grand Challenges with High-Performance Data Analytics: Dr. David A. Bader, a distinguished professor and director. David A. Bader is a Professor in the Department of Computer Science, founder of the Department of Data Science, and inaugural Director of the Institute for Data Science at the New Jersey Institute of Technology. Dr. Bader is also a Fellow of the IEEE, AAAS, and SIAM, and received the IEEE CS Sidney Fernbach Award and best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. Dr. Bader has also served as a lead scientist in several DARPA programs and has received awards including the NVIDIA AI Lab (NVAIL) award and the Facebook Research AI Hardware/Software Co-Design award. Dr. Bader is Editor-in-Chief of the ACM Trans. on Parallel Computing, and previously served as EIC of the IEEE Trans. on Parallel and Distributed Systems. Read more about him here: https://lnkd.in/drtzsT_r

NJBDA Talk: Large-Scale Graph Analytics in Arkouda

Oliver Alvarado Rodriguez (presenter), Zhihui Du and David Bader; New Jersey Institute of Technology

Exploratory graph analytics is a much sought-after approach to extracting useful information from graphs. One of its main challenges arises when the size of the graph exceeds the memory capacity of a typical computer. Solutions must then be developed to allow data scientists to efficiently handle and analyze large graphs in a short period of time, using machines that have the capacity to handle massive file sizes. Arkouda is a software package in early development, created to bridge the gap between massively parallel computation and data scientists wishing to perform exploratory data analysis (EDA). The communication system between the Chapel back-end and the Python front-end provides an easy-to-use interface that does not require knowledge of the underlying Chapel code; data scientists can instead use the simple Python front-end for all their large-file and graph EDA needs. In this work, a graph data structure is designed and implemented in the Arkouda framework at both the Chapel back-end and the Python front-end. The main attraction of this data structure is its ability to occupy less memory space and perform efficient adjacency edge searching. A parallel breadth-first search (BFS) algorithm is also presented to demonstrate how easily one can implement parallel algorithms in Arkouda to increase EDA productivity with graphs. Lastly, real-world graphs from different domains, such as biology and social networks, are used to evaluate the efficiency of the graph data structure and the BFS algorithm. The benchmarking results show that the Arkouda overhead is almost negligible, and that data scientists can use Arkouda for large-scale graph analytics. This work can help further bridge the gap between high-performance computing (HPC) software and data science to create a framework that is straightforward for all data scientists to use.
All of the code in this project and in Arkouda is open source and can be found on GitHub. This is joint work with Mike Merrill and William Reus. We acknowledge the support of National Science Foundation grant award CCF-2109988.
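
To make the idea of a compact graph layout with fast adjacency lookups and a frontier-based BFS more concrete, here is a small single-machine Python sketch. It assumes a generic compressed sparse row (CSR) layout and is not the data structure or the BFS implementation from this work, which the abstract does not specify; the names build_csr and bfs_levels are hypothetical.

```python
def build_csr(num_vertices, edges):
    """Return (offsets, neighbors) arrays for an undirected edge list.

    The neighbors of vertex u are neighbors[offsets[u]:offsets[u + 1]].
    Hypothetical helper for illustration; not Arkouda's structure.
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Prefix sum over the degrees gives each vertex's slice start.
    offsets = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        offsets[i + 1] = offsets[i] + degree[i]

    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def bfs_levels(offsets, neighbors, source):
    """Level-synchronous BFS; returns depth per vertex (-1 if unreachable)."""
    depth = [-1] * (len(offsets) - 1)
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # In a parallel setting, the vertices of one frontier could be
        # expanded concurrently; here they are processed sequentially.
        for u in frontier:
            for v in neighbors[offsets[u]:offsets[u + 1]]:
                if depth[v] == -1:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```

For example, build_csr(5, [(0, 1), (1, 2), (2, 3)]) followed by bfs_levels(offsets, neighbors, 0) assigns depths 0 through 3 to the path vertices and -1 to the isolated vertex 4. Storing the graph as two flat arrays keeps memory use proportional to the number of vertices plus edges and makes each adjacency lookup a contiguous slice.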

HCW 2012 Keynote Talk: Analyzing Massive Data on Heterogeneous Computing

Brief Biography

David A. Bader is a Full Professor in the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology, and Executive Director for High Performance Computing. Dr. Bader is a lead scientist in the DARPA Ubiquitous High Performance Computing (UHPC) program. He received his Ph.D. in 1996 from The University of Maryland, and his research is supported through highly competitive research awards, primarily from NSF, NIH, DARPA, and DOE. Dr. Bader serves on the Research Advisory Council for Internet2 and the Steering Committees of the IPDPS and HiPC conferences, and has served as General Chair of IPDPS 2010 and Chair of SIAM PP12. He is an associate editor for several high-impact publications including the Journal of Parallel and Distributed Computing (JPDC), ACM Journal of Experimental Algorithmics (JEA), IEEE DSOnline, Parallel Computing, and Journal of Computational Science, and has been an associate editor for the IEEE Transactions on Parallel and Distributed Systems (TPDS). He was elected as chair of the IEEE Computer Society Technical Committee on Parallel Processing (TCPP) and as chair of the SIAM Activity Group in Supercomputing (SIAG/SC). Dr. Bader’s interests are at the intersection of high-performance computing and real-world applications, including computational biology and genomics and massive-scale data analytics. He has co-chaired a series of meetings, the IEEE International Workshop on High-Performance Computational Biology (HiCOMB), co-organized the NSF Workshop on Petascale Computing in the Biological Sciences, written several book chapters, and co-edited special issues of the Journal of Parallel and Distributed Computing (JPDC) and IEEE TPDS on high-performance computational biology. He is also a leading expert on multicore, manycore, and multithreaded computing for data-intensive applications such as those in massive-scale graph analytics.
He has co-authored over 100 articles in peer-reviewed journals and conferences, and his main areas of research are in parallel algorithms, combinatorial optimization, massive-scale social networks, and computational biology and genomics. Prof. Bader is a Fellow of the IEEE and AAAS, a National Science Foundation CAREER Award recipient, and has received numerous industrial awards from IBM, NVIDIA, Intel, Cray, Oracle/Sun Microsystems, and Microsoft Research. He served as a member of the IBM PERCS team for the DARPA High Productivity Computing Systems program, was a distinguished speaker in the IEEE Computer Society Distinguished Visitors Program, and has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor. Bader is recognized as a “RockStar” of High Performance Computing by InsideHPC and as HPCwire’s People to Watch in 2012.

NCSA Chautauqua Talk 1: SuperClusters: A New Approach for High-Performance Computing

https://web.archive.org/web/19991126000736/http://chautauqua.ahpcc.unm.edu/

CLUSTERS - The Most Rapidly Growing Architecture of High-End Computing

Dr. David A. Bader, University of New Mexico, and lead for the UNM Roadrunner Linux-based Supercluster, will talk about the “next wave” in high performance computing.
