1

Large Scale String Analytics In Arkouda

Large scale data sets from the web, social networks, and bioinformatics are widely available and can often be represented by strings and suffix arrays are highly efficient data structures enabling string analysis. But, our personal devices and …

Enabling Exploratory Large Scale Graph Analytics through Arkouda

Exploratory graph analytics helps maximize the informational value from a graph. However, the increasing graph size makes it impossible for existing popular exploratory data analysis tools to handle dozens-of-terabytes or even larger data sets in the …

A GraphBLAS implementation of Triangle Centrality

Identifying key members in large social network graphs is an important graph analytic. Recently, a new centrality measure called triangle centrality finds members based on the triangle support of vertices in graph. In this paper, we …

K-Truss Implementation in Arkouda (Extended Abstract)

K-Truss Implementation in Arkouda, a data science framework for graph analytics. Arkouda is an open source framework.

Exploratory Large Scale Graph Analytics in Arkouda

Exploratory graph analytics helps maximize the informational value for a graph. However, the increasing graph size makes it impossible for existing popular exploratory data analysis tools to handle dozens-of-terabytes or even larger data sets in the …

LAGraph: Linear Algebra, Network Analysis Libraries, and the Study of Graph Algorithms

Graph algorithms can be expressed in terms of linear algebra. GraphBLAS is a library of low-level building blocks for such algorithms that targets algorithm *developers*. LAGraph builds on top of the GraphBLAS to target users of graph algorithms with …

An Efficient LP Rounding Scheme for Replica Placement

Large fault-tolerant network systems with high Quality of Service (QoS) guarantee are critical in many real world applications and entail diverse replica placement problems. In this paper, the replica placement problem in terms of minimizing the …

GPU Accelerated Anomaly Detection of Large Scale Light Curves

Identifying anomalies in millions of stars in real time is a great challenge. In this paper, we develop a matched filtering based algorithm to detect a typical anomaly, microlensing. The algorithm can detect short timescale microlensing events with …

Using RAPIDS AI to Accelerate Graph Data Science Workflows

This Scale free networks are abundant in many natural, social, and engineering phenomena for which there exists a substantial corpus of theory able to elucidate many of their underlying properties. In this paper we study the scalability of some …

Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS

The lines between data science (DS), machine learning (ML), deep learning (DL), and data mining continue to be blurred and removed. This is great as it ushers in vast amounts of capabilities, but it brings increased complexity and a vast number of …

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we show that …

Skip the Intersection: Quickly Counting Common Neighbors on Shared-Memory Systems

Counting common neighbors between all vertex pairs in a graph is a fundamental operation, with uses in similarity measures, link prediction, graph compression, community detection, and more. Current shared-memory approaches either rely on set …

A New Algorithmic Model for Graph Analysis of Streaming Data

The constant and massive influx of new data into analysis systems needs to be addressed without assuming we can pause the onslaught. Here we consider one aspect: non-stop graph analysis of streaming data. We formalize a new and practical algorithm …

Fast and Adaptive List Intersections on the GPU

List intersections are ubiquitous and can be found in a wide range of applications, including triangle counting and finding the maximal k-truss, both of which are part of the HPEC Static Graph Challenge. For many graph based problems it is necessary …

Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices on GPUs

Sparse data computations are ubiquitous in science and engineering. Unlike their dense data counterparts, sparse data computations have less locality and more irregularity in their execution, making them significantly more challenging to parallelize …

Introduction to HiCOMB 2018

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Logarithmic Radix Binning and Vectorized Triangle Counting

Triangle counting is a building block for numerous graph applications and given the fact that graphs continue to grow in size, its scalability is important. As such, numerous algorithms have been designed for triangle counting - some of which are …

Massive-scale Streaming Analytics: Models, Parallelism, & Real-world Applications

Emerging real-world graph problems include: detecting and preventing disease in human populations; revealing community structure in large social networks; and improving the resilience of the electric power grid. Unlike traditional applications in …

Ranking in Dynamic Graphs Using Exponential Centrality

Many large datasets from several fields of research such as biology or society can be represented as graphs. Additionally in many real applications, data is constantly being produced, leading to the notion of dynamic graphs. A heavily studied problem …

Scalable Katz Ranking Computation in Large Static and Dynamic Graphs

Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality …

A Dynamic Algorithm for Updating Katz Centrality in Graphs

Many large datasets from a variety of fields of research can be represented as graphs. A common query is to identify the most important, or highly ranked, vertices in a graph. Centrality metrics are used to obtain numerical scores for each vertex in …

Approximating Personalized Katz Centrality in Dynamic Graphs

Dynamic graphs can capture changing relationships in many real datasets that evolve over time. One of the most basic questions about networks is the identification of the “most important” vertices in a network. Measures of vertex importance called …

Design and implementation of parallel PageRank on multicore platforms

PageRank is a fundamental graph algorithm to evaluate the importance of vertices in a graph. In this paper, we present an efficient parallel PageRank design based on an edge-centric scatter-gather model. To overcome the poor locality of PageRank and …

Exact and Parallel Triangle Counting in Dynamic Graphs

Triangle counting is an important building block for finding key players in a graph. It is an integral part of the popular clustering coefficient analytic and can be used for pattern matching in social networks. A triangle, which is also a 3-clique, …

Introduction to EMBRACE Workshop

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Quickly finding a truss in a haystack

The k-truss of a graph is a subgraph such that each edge is tightly connected to the remaining elements in the k-truss. The k-truss of a graph can also represent an important community in the graph. Finding the k-truss of a graph can be done in a …

Streaming Graph Sampling with Size Restrictions

Many graph datasets originating from online social network, financial or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most …

When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers

Power is a primary concern for mobile, cloud, and high-performance computing applications. Approximate computing refers to running applications to obtain results with tolerable errors under resource constraints, and it can be applied to balance …

A local measure of community change in dynamic graphs

In this work we present a new local, vertex-level measure of community change. Our measure detects vertices that change community membership due to the actions (edges) of a vertex itself and not only due to global community shifts. The local nature …

A Memory and Time Scalable Parallelization of the Reptile Error-Correction Code

This paper details a distributed memory implementation of Reptile, a scalable and accurate spectrum based error-correction method. Reptile uses both k-mer and adjoining k-mers (called tiles) information along with the quality scores of bases to …

Aging data in dynamic graphs: A comparative study

Dynamic graphs are used to represent changing relational data. In order to create a dynamic graph representing relationships or interactions over time, it is necessary to choose a method of adding new data and removing, or otherwise de-emphasizing, …

cuSTINGER: Supporting dynamic graph algorithms for GPUs

cuSTINGER, a new graph data structure targeting NVIDIA GPUs is designed for streaming graphs that evolve over time. cuSTINGER enables algorithm designers greater productivity and efficiency for implementing GPU-based analytics, relieving programmers …

GABB 2016 Keynote

Emerging real-world graph problems include: detecting community structure in large social networks; improving the resilience of the electric power grid; and detecting and preventing disease in human populations. Unlike traditional applications in …

HiCOMB Introduction and Committees

Provides a listing of current committee members and society officers.

Mathematical foundations of the GraphBLAS

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to …

New stopping criteria for spectral partitioning

Spectral partitioning (clustering) algorithms use eigenvectors to solve network analysis problems. The relationship between numerical accuracy and network mining quality is insufficiently understood. We show that analyzing numerical accuracy and …

Semantic database applications at the Samtavro Cemetery, Georgia

In 2013 a paper was offered to the CAA concerning archaeological legacy data and semantic database applications, with some preliminary results for a study conducted into the Samtavro cemetery, situated in the South Caucasus in the modern republic of …

A Dynamic Algorithm for Local Community Detection in Graphs

A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. Its …

A fast, energy-efficient abstraction for simultaneous breadth-first searches

Optimized GPU kernels are sufficiently complicated to write that they often are specialized to input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel …

A Methodology for Co-Location Aware Application Performance Modeling in Multicore Computing

As multicore processor architectures are now prevalent in server nodes of parallel and distributed computing systems, it has become important to characterize the performance of applications run on these architectures. This study investigates the …

Fast Execution of Simultaneous Breadth-First Searches on Sparse Graphs

The construction of efficient parallel graph algorithms is important for quickly solving problems in areas such as urban planning, social network analysis, and hardware verification. Existing GPU implementations of graph algorithms tend to be …

Fast Incremental Community Detection on Dynamic Graphs

Community detection, or graph clustering, is the problem of finding dense groups in a graph. This is important for a variety of applications, from social network analysis to biological interactions. While most work in community detection has focused …

HiCOMB Introduction and Committees

Provides a listing of current committee members and society officers. The conference also offers a note of thanks and lists its reviewers.

Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures

Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an …

ParLearning Keynotes

Keynote Abstracts

A Lin-Kernighan Heuristic for the DCJ Median Problem of Genomes with Unequal Contents

In this paper, we designed a distance metric as DCJ-Indel-Exemplar distance to estimate the dissimilarity between two genomes with unequal contents (with gene insertions/deletions (Indels) and duplications). Based on the aforementioned distance …

A performance evaluation of open source graph databases

With the proliferation of large, irregular, and sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query …

Designing a Heuristic Cross-Architecture Combination for Breadth-First Search

Breadth-First Search (BFS) is widely used in real-world applications including computational biology, social networks, and electronic design automation. The most effective BFS approach has been shown to be a combination of top-down and bottom-up …

GABB Introduction

The Basic Linear Algebra Subprograms (BLAS), introduced over 30 years ago, had a transformative effect on linear algebra. By building Linear Algebra algorithms from a common set of highly optimized building blocks, researchers spend less time mapping …

HiCOMB Introduction and Committees

Provides a listing of current committee members and society officers.

Load balanced clustering coefficients

Clustering coefficients is a building block in network sciences that offers insights on how tightly bound vertices are in a network. Effective and scalable parallelization of clustering coefficients requires load balancing amongst the cores. This …

Optimizing energy consumption and parallel performance for static and dynamic betweenness centrality using GPUs

Applications of high-performance graph analysis range from computational biology to network security and even transportation. These applications often consider graphs under rapid change and are moving beyond HPC platforms into energy-constrained …

Revisiting Edge and Node Parallelism for Dynamic GPU Graph Analytics

Betweenness Centrality is a widely used graph analytic that has applications such as finding influential people in social networks, analyzing power grids, and studying protein interactions. However, its complexity makes its exact computation …

Scalable and High Performance Betweenness Centrality on the GPU (Best Student Paper Finalist)

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is between ness centrality, which has applications in community …

A new parallel algorithm for connected components in dynamic graphs

Social networks, communication networks, business intelligence databases, and large scientific data sources now contain hundreds of millions elements with billions of relationships. The relationships in these massive datasets are changing at …

A statistical framework for streaming graph analysis

In this paper we propose a new methodology for gaining insight into the temporal aspects of social networks. In order to develop higher-level, large-scale data analysis methods for classification, prediction, and anomaly detection, a solid foundation …

Designing Hybrid Architectures for Massive-Scale Graph Analysis

Turning large volumes of data into actionable knowledge is a top challenge in high performance computing. Our previous work in this area demonstrated algorithmic techniques for massively parallel graph analysis on multithreaded systems. This work led …

Detecting insider threats in a real corporate database of computer usage activity

This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on …

Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality

High response quality is critical for many best-effort interactive services, and at the same time, reducing energy consumption can directly reduce the operational cost of service providers. In this paper, we study the quality-energy tradeoff for such …

Faster Clustering Coefficient Using Vertex Covers

Clustering coefficients, also called triangle counting, is a widely-used graph analytic for measuring the closeness in which vertices cluster together. Intuitively, clustering coefficients can be thought of as the ratio of common friends versus all …

HiCOMB Introduction

Presents the introductory welcome message from the conference proceedings.

Investigating Graph Algorithms in the BSP Model on the Cray XMT

Implementing parallel graph algorithms in large, shared memory machines, such as the Cray XMT, can be challenging for programmers. Synchronization, deadlock, hot spotting, and others can be barriers to obtaining linear scalability. Alternative …

Measuring the Sensitivity of Graph Metrics to Missing Data

The increasing energy consumption of high performance computing has resulted in rising operational and environmental costs. Therefore, reducing the energy consumption of computation is an emerging area of interest. We study the approach of data …

Multithreaded Community Monitoring for Massive Streaming Graph Data

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. Current state-of-the-art industrial methods analyze these streaming sources …

Standards for graph algorithm primitives

It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position …

A Fast Algorithm for Streaming Betweenness Centrality

Analysis of social networks is challenging due to the rapid changes of its members and their relationships. For many cases it impractical to recompute the metric of interest, therefore, streaming algorithms are used to reduce the total runtime …

Analysis of streaming social networks and graphs on multicore architectures

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. We introduce a framework, STING (Spatio-Temporal Interaction Networks and …

Enhancing Cache Coherent Architectures with access patterns for embedded manycore systems

One of the key challenges in advanced micro-architecture is to provide high performance hardware-components that work as application accelerators. In this paper, we present a Cache Coherent Architecture that optimizes memory accesses to patterns …

GPU Merge Path: A GPU Merging Algorithm

Graphics Processing Units (GPUs) have become ideal candidates for the development of fine-grain parallel algorithms as the number of processing elements per GPU increases. In addition to the increase in cores per system, new memory hierarchies and …

HCW 2012 Keynote Talk: Analyzing massive data using heterogeneous computing

HiCOMB Introduction

Presents the introductory welcome message from the conference proceedings.

Scalable Multi-threaded Community Detection in Social Networks

The volume of existing graph-structured data requires improved parallel tools and algorithms. Finding communities, smaller sub graphs densely connected within the sub graph than to the rest of the graph, plays a role both in developing new parallel …

STINGER: High performance data structure for streaming graphs (Best Paper Award)

The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, …

Task-based parallel breadth-first search in heterogeneous environments

Breadth-first search (BFS) is an essential graph traversal strategy widely used in many computing applications. Because of its irregular data access patterns, BFS has become a non-trivial problem hard to parallelize efficiently. In this paper, we …

A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters

High energy consumption has become a critical problem for supercomputer systems. GPU clusters are becoming an increasingly popular architecture for building supercomputers because of its great improvement in performance. In this paper, we first …

HiCOMB Introduction

Presents the introductory welcome message from the conference proceedings.

Parallel Community Detection for Massive Graphs

Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to …

Semantic Databases and Supercomputers

Tracking Structure of Streaming Social Networks

Current online social networks are massive and still growing. For example, Face book has over 500 million active users sharing over 30 billion items per month. The scale within these data streams has outstripped traditional graph analysis methods. …

Modularity and Graph Algorithms

A number of graph partitioning algorithms are based on the concept of modularity. In particular Clauset, Newman and Moore (CNM) have developed a greedy agglomerative graph partitioning algorithm that scales well but is known to have several flaws. …

Analyzing Massive Social Networks Using Multicore and Multithreaded Architectures

Emerging real-world graph problems include detecting community structure in large social networks, improving the resilience of the electric power grid, and detecting and preventing disease in human populations. Unlike traditional applications in …

Evaluating Cell/B.E software cache for ClustalW

This paper evaluates the performance of the bioinformatics application ClustalW developed on Cell Broadband Engine(TM) (Cell/B.E.) using a software data cache for SPEs, instead of explicit DMA transfers. The software cache of the SPEs, once it has …

GPUMemSort: A High Performance Graphic Co-processors Sorting Algorithm for Large Scale In-Memory Data

HiCOMB 2010: Message from the workshop chairs

Welcome to the 9th International Workshop on High Performance Computational Biology (HiCOMB). Computational Biology and related disciplines are fast emerging as an important area for academic research and industrial application. The large size of …

Large scale complex network analysis using the hybrid combination of a MapReduce cluster and a highly multithreaded system

Complex networks capture interactions among entities in various application areas in a graph representation. Analyzing large scale complex networks often answers important questions-e.g. estimate the spread of epidemic diseases-but also imposes …

Massive Social Network Analysis: Mining Twitter for Social Good

Social networks produce an enormous quantity of data. Facebook consists of over 400 million active users sharing over 5 billion pieces of information each month. Analyzing this vast quantity of unstructured data presents challenges for software and …

Massive streaming data analytics: A case study with clustering coefficients

We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications …

Message from general chair

Welcome to Atlanta, Georgia and to the 24th International Parallel and Distributed Processing Symposium. It has been my honor to spend several years of planning for Georgia Institute of Technology to host this event and to work with the Computer …

On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

The accuracy of Conditional Random Fields (CRF) is achieved at the cost of huge amount of computation to train model. In this paper we designed the parallelized algorithm for the Gradient Ascent based CRF training methods for biological sequence …

Scalable Graph Exploration on Multicore Processors

Many important problems in computational sciences, social network analysis, security, and business analytics, are data-intensive and lend themselves to graph-theoretical analyses. In this paper we investigate the challenges involved in exploring very …

A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets

We present a new lock-free parallel algorithm for computing betweenness centrality of massive complex networks that achieves better spatial locality compared with previous approaches. Betweenness centrality is a key kernel in analyzing the importance …

A Partition-Merge Based Cache-Conscious Parallel Sorting Algorithm for CMP with Shared Cache

To explore chip-level parallelism, the PSC (Parallel Shared Cache) model is provided in this paper to describe high performance shared cache of Chip Multi-Processors (CMP). Then for a specific application, parallel sorting, a cache-conscious parallel …

An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs

Due to power wall, memory wall, and ILP wall, we are facing the end of ever increasing single-threaded performance. For this reason, multicore and manycore processors are arising as a new paradigm to pursue. However, to fully exploit all the cores in …

Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, social networking Web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. …

Faster FAST: Multicore Acceleration of Streaming Financial Data (Best Paper Award)

Generalizing k-Betweenness Centrality Using Short Paths and a Parallel Multithreaded Implementation

We present a new parallel algorithm that extends and generalizes the traditional graph analysis metric of betweenness centrality to include additional non-shortest paths according to an input parameter k. Betweenness centrality is a useful kernel for …

GTfold: a scalable multicore code for RNA secondary structure prediction

The prediction of the correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n4), so the computational requirements …

Simulating Individual-Based Models of Epidemics in Hierarchical Networks

Current mathematical modeling methods for the spreading of infectious diseases are too simplified and do not scale well. We present the Simulator of Epidemic Evolution in Complex Networks (SEECN), an efficient simulator of detailed individual-based …

Understanding the design trade-offs among current multicore systems for numerical computations

In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application designers in better mapping their software to the most suitable …

A Prediction Based CMP Cache Migration Policy

The large L2 cache's access latency, which is mainly caused by wire delay, is a critical problem to improve the performance of CMP (Chip Multi-Processor) in NUCA (Non-Uniform Cache Architecture). A CMP L2 cache accessing performance model is provided …

DOSA: design optimizer for scientific applications

In this paper we briefly introduce our new framework, called "design optimizer for scientific applications" (DOSA) which allows the programmer or compiler writer to explore alternative designs and optimize for speed (or power) at design-time and use …

Financial modeling on the cell broadband engine

High performance computing is critical for financial markets where analysts seek to accelerate complex optimizations such as pricing engines to maintain a competitive edge. In this paper we investigate the performance of financial workloads on the …

High performance MPEG-2 software decoder on the cell broadband engine

The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is designed with …

On the Design of Fast Pseudo-Random Number Generators for the Cell Broadband Engine and an Application to Risk Analysis

Numerical simulations in computational physics, biology, and finance, often require the use of high quality and efficient parallel random number generators. We design and optimize several parallel pseudo random number generators on the cell broadband …

Optimizing JPEG2000 Still Image Encoding on the Cell Broadband Engine

JPEG2000 is the latest still image coding standard from the JPEG committee, which adopts new algorithms such as embedded block coding with optimized truncation (EBCOT) and discrete wavelet transform (DWT). These algorithms enable superior coding …

Petascale Computing for Large-Scale Graph Problems

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

SNAP, Small-world Network Analysis and Partitioning: An open-source parallel graph framework for the exploration of large-scale networks

We present SNAP (small-world network analysis and partitioning), an open-source graph framework for exploratory study and partitioning of large-scale networks. To illustrate the capability of SNAP, we discuss the design, implementation, and …

Lecture on Progress toward Petascale Applications in Bioinformatics and Computational Biology

Over the past several years there have been repeated analyses of the potential value of petascale bioinformatics and computational biology applications, as well as analyses of the system engineering steps required to implement applications and …

A Graph-Theoretic Analysis of the Human Protein-Interaction Network Using Multicore Parallel Algorithms

Protein-interaction network (PIN) analysis provides valuable insight into an organism's functional organization and evolutionary behavior. In this paper, we study a PIN formed by high-confidence human protein interactions obtained from various public …

Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture

We present a study of multithreaded implementations of Thorup's algorithm for solving the single source shortest path (SSSP) problem for undirected graphs. Our implementations leverage the fledgling multithreaded graph library (MTGL) to perform …

An Experimental Study of A Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances

We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the Δ-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel …

Approximating Betweenness Centrality

Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally-expensive to exactly determine betweenness; currently the fastest-known algorithm by Brandes requires O(nm) time for …

DOSA: Design Optimizer for Scientific Applications

In this work, we propose an application composition system (ACS) that allows design-time exploration and automatic run-time optimizations so that we relieve application programmers and compiler writers from the challenging task of optimizing the …

FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine

The Fast Fourier Transform (FFT) is of primary importance and a fundamental kernel in many computationally intensive scientific applications. In this paper we investigate its performance on the Sony-Toshiba-IBM Cell Broadband Engine, a heterogeneous …

High-Performance Combinatorial Techniques for Analyzing Massive Dynamic Interaction Networks

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, the world-wide web, communication networks, and scientific computing can be intuitively modeled as graphs. In this …

On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking

The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE), with eight SIMD coprocessing units (SPEs) integrated on-chip. We present a complexity model for designing …

Petascale Computing for Large-Scale Graph Problems

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

Petascale Computing for Large-Scale Graph Problems

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

SWARM: A Parallel Programming Framework for Multicore Processors

Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued performance on multicore processors now requires the exploitation of …

Symposium Evening Tutorial: High-performance Computing Methods for Computational Genomics

Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors

Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. …

Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

Graph abstractions are extensively used to understand and solve challenging computational problems in various scientific and engineering domains. They have particularly gained prominence for applications involving large-scale networks. In this paper, …

Efficient Implementation of Irregular Algorithms on Cell Multi-core Architecture. Poster Session.

ExactMP: An Efficient Parallel Exact Solver for Phylogenetic Tree Reconstruction Using Maximum Parsimony

Constructing phylogenetic trees in the study of the evolutionary history of a group organisms is an extremely challenging problem in computational biology. The problem becomes intractable with growing number of organisms. In this paper, we design and …

High-performance computing methods for computational genomics

The high computational requirements of several applications in computational genomics are aggravated by an exponential growth in biological databases. This tutorial will provide a detailed introduction to high-performance computing methods designed …

Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks

This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large scale networks, …

Parallel Shortest Path Algorithms for Solving Large-Scale Instances

We present an experimental study of parallel algorithms for solving the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs. We implement Meyer and Sander’s ∆-stepping algorithm and report performance …

Performance analysis of parallel programs via message-passing graph traversal

The ability to understand the factors contributing to parallel program performance are vital for understanding the impact of machine parameters on the performance of specific applications. We propose a methodology for analyzing the performance …

A Cache-Aware Parallel Implementation of the Push-Relabel Network Flow Algorithm and Experimental Evaluation of the Gap Relabeling Heuristic

The maximum flow problem is a combinatorial problem of significant importance in a wide variety of research and commercial applications. It has been extensively studied and implemented over the past 40 years. The push-relabel method has been shown to …

An Empirical Analysis of Parallel Random Permutation Algorithms on SMPs

We compare parallel algorithms for random permutation generation on symmetric multiprocessors (SMPs). Algorithms considered are the sorting-based algorithm, Anderson's shuffling algorithm, the dart-throwing algorithm, and Sanders' algorithm. We …

An Experimental Study of Parallel Biconnected Components Algorithms on Symmetric Multiprocessors (SMPs)

We present an experimental study of parallel biconnected components algorithms employing several fundamental parallel primitives, e.g., prefix sum, list ranking, sorting, connectivity, spanning tree, and tree computations. Previous experimental …

BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications

The exponential growth in the amount of genomic data has spurred growing interest in large scale analysis of genetic information. Bioinformatics applications, which explore computational methods to allow researchers to sift through the massive …

BioSPLASH: A sample workload from bioinformatics and computational biology for optimizing next-generation high-performance computer systems. Poster Session.

Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors (HiPC Most Impactful Papers Award)

Graph theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and …

High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology

Many large-scale optimization problems rely on graph theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degrees of locality. We describe our novel methodology for designing and implementing …

Incorporating life sciences applications in the architectural optimizations of next-generation petaflop-system

Advances in experimental techniques have transformed biology into a data-intensive science, with a rapid explosion of data at the genomic and proteomic level. Few comprehensive suites of computationally-intensive life science applications are …

On the Architectural Requirements for Efficient Execution of Graph Algorithms

Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric …

A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs)

We focus on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of …

A Novel FDTD Application Featuring OpenMP-MPI Hybrid Parallelization

We have developed a high performance hybridized parallel finite difference time domain (FDTD) algorithm featuring both OpenMP shared memory programming and MPl message passing. Our goal is to effectively model the optical characteristics of a novel …

A Parallel State Assignment Algorithm for Finite State Machines

This paper summarizes the design and implementation of a parallel algorithm for state assignment of large Finite State Machines (FSMs). High performance CAD tools are necessary to overcome the computational complexity involved in the optimization of …

Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs

Minimum spanning tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical …

High Performance Bioinformatics

Bioinformatics is the science of managing, mining, and interpreting information from biological sequences and structures. Genome sequencing projects have contributed to an exponential growth in complete and partial sequence databases. Similarly, the …

Lock-Free Parallel Algorithms: An Experimental Study

Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include increasing fault tolerance of a (possibly heterogeneous) system and alleviating …

The Euler Tour Technique and Parallel Rooted Spanning Tree

Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find …

Broadcast on Clusters of SMPs with Optimal Concurrency

Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory …

HiCOMB 2002: Workshop Introduction

New approaches for using gene order data in phylogeny reconstruction

The rapid accumulation of whole genome sequences for a wide diversity of taxa is generating a huge amount of comparative data for biologists. The availability of whole genome sequences is providing a new set of molecular characters for phylogenetic …

A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study

Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation …

A New Implmentation and Detailed Study of Breakpoint Analysis

Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, …

High-Performance Algorithm Engineering for Computational Phylogenetics

Phylogeny reconstruction from molecular data poses complex optimization problems: almost all optimization models are NP-hard and thus computationally intractable. Yet approximations must be of very high quality in order to avoid outright biological …

Industrial applications of high-performance computing for phylogeny reconstruction

Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest is strong in determining these relationships. For example, …

Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory …

Variation in vegetation growth rates: Implications for the evolution of semi-arid landscapes

An Improved Randomized Selection Algorithm With an Experimental Study

High-Performance Algorithms and Applications for SMP Clusters

The future of high-performance computing relies on the efficient and scalable use of clusters with symmetric multiprocessor (SMP) nodes and low-latency, high-bandwidth interconnection networks. Current examples of such platforms include Sun Ultra HPC …

Tutorial A: Design and Analysis of High Performance Clusters

A hierarchical data archiving and processing system to generate custom tailored products from AVHRR data

A novel indexing scheme is described to catalogue satellite data on a pixel basis. The objective of this research is to develop an efficient methodology to archive, retrieve and process satellite data, so that data products can be generated to meet …

Design and Analysis of the Alliance/University of New Mexico Roadrunner Linux SMP SuperCluster

This paper discusses high performance clustering from a series of critical topics: architectural design, system software infrastructure, and programming environment. This is accomplished through an overview of a large scale, high performance …

Parallel Algorithms for Image Enhancement and Segmentation by Region Growing with an Experimental Study

Presents efficient and portable implementations of a useful image enhancement process, the symmetric neighborhood filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components …

Parallel Algorithms for Personalized Communication and Sorting with an Experimental Study (Extended Abstract)

A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which execute efficiently on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become …

Practical Parallel Algorithms for Dynamic Data Redistribution, Median Finding, and Selection

A common statistical problem is that of finding the median element in a set of data. This paper presents a fast and portable parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our …

Parallel Algorithms for Image Histogramming and Connected Components with an Experimental Study

This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use …