2

Dynamics signature based anomaly detection

Identifying anomalies, especially weak anomalies in constantly changing targets, is more difficult than in stable targets. In this article, we borrow the dynamics metrics and propose the concept of dynamics signature (DS) in multi-dimensional feature …

Linux and Supercomputing: How my passion for building COTS systems led to an HPC revolution

David A. Bader built the first Linux Supercomputer.

Interactive Graph Stream Analytics in Arkouda

Data from emerging applications, such as cybersecurity and social networking, can be abstracted as graphs whose edges are updated sequentially in the form of a stream. The challenging problem of interactive graph stream analytics is the quick …

Traversing Large Graphs on GPUs with Unified Memory

Due to the limited capacity of GPU memory, the majority of prior work on graph applications on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware and software advances make it possible to address much larger host …

Editorial from the Editor-in-Chief

Tailoring parallel alternating criteria search for domain specific MIPs: Application to maritime inventory routing

Parallel Alternating Criteria Search (PACS) relies on the combination of computer parallelism and Large Neighborhood Searches to attempt to deliver high quality solutions to any generic Mixed-Integer Program (MIP) quickly. While general-purpose …

Accelerating GPU betweenness centrality

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in …

Alternating criteria search: a parallel large neighborhood search algorithm for mixed integer programs

We present a parallel large neighborhood search framework for finding high quality primal solutions for general mixed-integer programs (MIPs). The approach simultaneously solves a large number of sub-MIPs with the dual objective of reducing …

Incrementally updating Katz centrality in dynamic graphs

A variety of large datasets, such as social networks or biological data, can be represented as graphs. A common query in graph analysis is to identify the most important vertices in a graph. Centrality metrics are used to obtain numerical scores for …

Numerically approximating centrality for graph ranking guarantees

Many real-world datasets can be represented as graphs. Using iterative solvers to approximate graph centrality measures allows us to obtain a ranking vector on the nodes of the graph, consisting of a number for each vertex in the graph identifying …

A parallel local search framework for the Fixed-Charge Multicommodity Network Flow problem

We present a parallel local search approach for obtaining high quality solutions to the Fixed Charge Multicommodity Network Flow problem (FCMNF). The approach proceeds by improving a given feasible solution by solving restricted instances of the …

Designing and implementing a heuristic cross-architecture combination for graph traversal

Breadth-First Search (BFS) is widely used in real-world applications including computational biology, social networks, and electronic design automation. The most effective BFS approach has been shown to be a combination of top-down and bottom-up …

Editor's Note

Presents the introductory editorial for this issue of the publication.

Editor's Note

Presents the introductory editorial for this issue of the publication.

Exemplar or Matching: Modeling DCJ Problems with Unequal Content Genome Data

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels are considered. In this …

Graph Ranking Guarantees for Numerical Approximations to Katz Centrality

Graphs and networks are prevalent in modeling relational datasets from many fields of research. By using iterative solvers to approximate graph measures (specifically Katz Centrality), we can obtain a ranking vector consisting of a number for each …

Local Community Detection in Dynamic Graphs Using Personalized Centrality

Analyzing massive graphs poses challenges due to the vast amount of data available. Extracting smaller relevant subgraphs allows for further visualization and analysis that would otherwise be too computationally intensive. Furthermore, many real data …

Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems

We describe a family of power models that can capture the nonuniform power effects of speed scaling among homogeneous cores on multicore processors. These models depart from traditional ones, which assume that individual cores contribute to power …

Spectral partitioning with blends of eigenvectors

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy in the context of spectral graph partitioning. We provide pointwise convergence guarantees so that spectral …

A New Parallel Method for Binary Black Hole Simulations

Simulating binary black hole (BBH) systems are a computationally intensive problem and it can lead to great scientific discovery. How to explore more parallelism to take advantage of the large number of computing resources of modern supercomputers is …

Editor's Note

Welcome to this year's first issue of IEEE Transactions on Parallel and Distributed Systems (TPDS). The author is privileged to serve our community proceeding into his third year as the editor-in-chief (EiC) of TPDS and looks forward to continuing …

Evolving MPI+X Toward Exascale

This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Parallel and Distributed Systems.

Exemplar or matching: modeling DCJ problems with unequal content genome data

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels (i.e., insertions and …

HPC node performance and energy modeling with the co-location of applications

Multicore processors have become an integral part of modern large-scale and high-performance parallel and distributed computing systems. Unfortunately, applications co-located on multicore processors can suffer from decreased performance and …

Tracking local communities in streaming graphs with a dynamic algorithm

A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. Its …

Behavioral clusters in dynamic graphs

This paper contributes a method for combining sparse parallel graph algorithms with dense parallel linear algebra algorithms in order to understand dynamic graphs including the temporal behavior of vertices. Our method is the first to cluster …

Editor's Note

Presents the introductory editorial for this issue of the publication

Graphs, Matrices, and the GraphBLAS: Seven Good Reasons

The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, …

Spectral Partitioning with Blends of Eigenvectors

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy, which leads to the first meaningful stopping criterion for two way spectral partitioning. More generally, we …

State of the Journal

Reports on the current state of the IEEE Transactions on Computers.

WEC: Improving Durability of SSD Cache Drives by Caching Write-Efficient Data

Serving as cache disks, flash-based solid-state drives (SSDs) can significantly boost the performance of read-intensive applications. However, frequent data updating, the necessary condition for classical replacement algorithms (e.g., LRU, MQ, LIRS, …

Introduction to Special Issue ALENEX'12

Standards for Graph Algorithm Primitives

It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive …

State of the Journal

I am excited with my new role as incoming Editor-in-Chief (EiC) of IEEE Transactions on Parallel and Distributed Systems (TPDS) and look forward to serving the community over the next several years. As a brief introduction, I am a full professor at …

A Brief Study of Open Source Graph Databases

With the proliferation of large irregular sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query …

Faster Betweenness Centrality Based on Data Structure Experimentation

Betweenness centrality is a graph analytic that states the importance of a vertex based on the number of shortest paths that it is on. As such, betweenness centrality is a building block for graph analysis tools and is used by many applications, …

GraphCT: Multithreaded Algorithms for Massive Graph Analysis

The digital world has given rise to massive quantities of data that include rich semantic and complex networks. A social graph, for example, containing hundreds of millions of actors and tens of billions of relationships is not uncommon. Analyzing …

Massive streaming data analytics: a graph-based approach

PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly

The study of genomes has been revolutionized by sequencing machines that output many short overlapping substrings (called reads). The task of sequence assembly in practice is to reconstruct long contiguous genome subsequences from the reads. With …

Streaming Breakpoint Graph Analytics for Accelerating and Parallelizing the Computation of DCJ Median of Three Genomes

The problem of finding the median of three genomes is the key process in building the most parsimonious phylogenetic trees from genome rearrangement data. The median problem using Double-Cut-and-Join (DCJ) distance is NP-hard and the best exact …

Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms

DNA sequence analysis is fundamental to life science research. The rapid development of next generation sequencing (NGS) technologies, and the richness and diversity of applications it makes feasible, have created an enormous gulf between the …

ACM journal on experimental algorithmics special issue on multicore algorithms

The recent switch to multicore processors brought a dramatic change that affects a large spectrum of systems from embedded and general-purpose to high-end computing systems. Parallelism is forcing major changes in software development. The aim of …

Efficient Data Migration to Conserve Energy in Streaming Media Storage Systems

Reducing energy consumption has been an important design issue for large-scale streaming media storage systems. Existing energy conservation techniques are inadequate to achieve high energy efficiency for streaming media computing environments due to …

GTfold: Enabling parallel RNA secondary structure prediction on multi-core desktops

Background Accurate and efficient RNA secondary structure prediction remains an important open problem in computational molecular biology. Historically, advances in computing technology have enabled faster and more accurate RNA secondary structure …

Rec-DCM-Eigen: Reconstructing a Less Parsimonious but More Accurate Tree in Shorter Time

Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally …

Algorithm Engineering Challenges in Multicore and Manycore Systems

Modern multicore and manycore systems have the strong potential to deliver both high performance and high power efficiency. The large variance in memory access latency, resource sharing, and the heterogeneity of processor architectures in modern …

GPUMemSort: A High Performance Graphic Co-processors Sorting Algorithm for Large Scale In-Memory Data

In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance in sorting large-scale in-memory data by exploiting high-parallel GPU processors. It consists of two algorithms: an in-core algorithm, which is …

Guest Editor's Introduction: Special Issue on High-Performance Computing with Accelerators

The 12 papers in this special issue on high-performance computing with accelerators discuss a range of different accelerator architectures and applications.

On the random access performance of Cell Broadband Engine with graph analysis application

The Cell Broad Engine (BE) Processor has unique memory access architecture besides its powerful computing engines. Many computing-intensive applications have been ported to Cell/BE successfully. But memory-intensive applications are rarely …

Computing discrete transforms on the Cell Broadband Engine

Discrete transforms are of primary importance and fundamental kernels in many computationally intensive scientific applications. In this paper, we investigate the performance of two such algorithms; Fast Fourier Transform (FFT) and Discrete Wavelet …

Faster FAST: multicore acceleration of streaming financial data

By 2010, the global options and equity markets will average over 128 billion messages per day, amounting to trillions of dollars in trades. Trading systems, the backbone of the low-latency high-frequency business, need fundamental research and …

A graph-theoretic analysis of the human protein-interaction network using multicore parallel algorithms

Due to fundamental physical limitations and power constraints, we are witnessing a paradigm shift in commodity microprocessor architecture to multicore designs. Continued performance now requires the exploitation of concurrency at the algorithm …

Guest editorial: High-performance computational biology

Dynamic Load Balancing in Distributed Systems in the Presence of Delays: A Regeneration-Theory Approach

A regeneration-theory approach is undertaken to analytically characterize the average overall completion time in a distributed system. The approach considers the heterogeneity in the processing rates of the nodes as well as the randomness in the …

High performance combinatorial algorithm design on the Cell Broadband Engine processor

The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is …

On the design of high-performance algorithms for aligning multiple protein sequences on mesh-based multiprocessor architectures

In this paper, we address the problem of multiple sequence alignment (MSA) for handling very large number of proteins sequences on mesh-based multiprocessor architectures. As the problem has been conclusively shown to be computationally complex, we …

Computational Grand Challenges in Assembling the Tree of Life: Problems and Solutions

The computation of ever larger as well as more accurate phylogenetic (evolutionary) trees with the ultimate goal to compute the tree of life represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the …

Designing irregular parallel algorithms with mutual exclusion and lock-free protocols

Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique …

Designing Scalable Synthetic Compact Applications for Benchmarking High Productivity Computing Systems

One of the main objectives of the DARPA High Productivity Computing Systems (HPCS) program is to reassess the way we define and measure performance, programmability, portability, robustness and ultimately productivity in the High Performance …

Editorial: Special Section on High-Performance Computational Biology

Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs

Minimum spanning tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical …

A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there …

A Framework for Measuring Supercomputer Productivity

We propose a framework for measuring the productivity of high performance computing (HPC) systems, based on common economic definitions of productivity and on utility theory. We discuss how these definitions can capture essential aspects of HPC …

An improved, randomized algorithm for parallel selection with an experimental study

A common statistical problem is that of finding the median element in a set of data. This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In …

Computational biology and high-performance computing

Understanding evolution and the basic structure and function of proteins are two grand challenge problems in biology that can be solved only through the use of high-performance computing.

Special Issue: High Performance Computational Biology

Guest Editor's Introduction: Special issue on high-performance computational biology

Srinivas Aluru is an associate professor and Associate Chair for Graduate Education in the Department of Electrical and Computer Engineering at Iowa State University. He is affiliated with the Laurence H. Baker Center for Bioinformatics and …

Generalized block shift network for clusters

In this work, a generalized topology of block shift networks (BSNs), named generalized block shift network (GBSN), is propose or interconnection networks in clusters. The BSNs possess many desirable topological features, such as flexibility in node …

High-Performance Algorithm Engineering for Computational Phylogenetics

A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the …

A Linear-Time Algorithm for Computing Inversion Distance Between Two Signed Permutations with an Experimental Study

Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation …

Applications

High performance computing algorithms for land cover dynamics using remote sensing data

Global and regional land cover studies need to apply complex models on selected subsets of large volumes of multi-sensor and multi-temporal data sets that have been derived from raw instrument measurements using widely accepted pre-processing …

Kronos: A software system for the processing and retrieval of large-scale AVHRR data sets

Raw remotely sensed satellite data have to be processed andmapped into a standard projection in order to produce a multi-temporal data set which can then be used for regional or global Earth science studies. However, traditional methods of processing …

A New, Architectural Paradigm for High-performance Computing

At first thought, one may not realize the need for a special issue of Parallel and Distributed Computing Practices focused on Cluster Computing when our community has already invested years of research and spent a wealth of resources on traditional …

SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs)

We describe a methodology for developing high performance programs running on clusters of SMP nodes. The SMP cluster programming methodology is based on a small prototype kernel (SIMPLE) of collective communication primitives that make efficient use …

A New Deterministic Parallel Sorting Algorithm with an Experimental Evaluation

We introduce a new deterministic parallel sorting algorithm for distributed memory machines based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good …

A Randomized Parallel Sorting Algorithm with an Experimental Study

Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on …

Parallel algorithms for image enhancement and segmentation by region growing, with an experimental study

This paper presents efficient and portable implementations of a powerful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique that makes use of the SNF and a variant of the conventional connected …

Parallel Algorithms for Image Histogramming and Connected Components with an Experimental Study

This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use …

Practical Parallel Algorithms for Personalized Communication and Integer Sorting

A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which efficiently execute on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become …

Scalable data parallel algorithms for texture synthesis using Gibbs random fields

This article introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov random field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood …