Due to the limited capacity of GPU memory, the majority of prior work on graph applications on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware and software advances make it possible to address much larger host …

Parallel Alternating Criteria Search (PACS) relies on the combination of computer parallelism and Large Neighborhood Searches to attempt to deliver high quality solutions to any generic Mixed-Integer Program (MIP) quickly. While general-purpose …

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in …

We present a parallel large neighborhood search framework for finding high quality primal solutions for general mixed-integer programs (MIPs). The approach simultaneously solves a large number of sub-MIPs with the dual objective of reducing …

A variety of large datasets, such as social networks or biological data, can be represented as graphs. A common query in graph analysis is to identify the most important vertices in a graph. Centrality metrics are used to obtain numerical scores for …

Many real-world datasets can be represented as graphs. Using iterative solvers to approximate graph centrality measures allows us to obtain a ranking vector on the nodes of the graph, consisting of a number for each vertex in the graph identifying …

We present a parallel local search approach for obtaining high quality solutions to the Fixed Charge Multicommodity Network Flow problem (FCMNF). The approach proceeds by improving a given feasible solution by solving restricted instances of the …

Breadth-First Search (BFS) is widely used in real-world applications including computational biology, social networks, and electronic design automation. The most effective BFS approach has been shown to be a combination of top-down and bottom-up …

Presents the introductory editorial for this issue of the publication.

Presents the introductory editorial for this issue of the publication.

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels are considered. In this …

Graphs and networks are prevalent in modeling relational datasets from many fields of research. By using iterative solvers to approximate graph measures (specifically Katz Centrality), we can obtain a ranking vector consisting of a number for each …

Analyzing massive graphs poses challenges due to the vast amount of data available. Extracting smaller relevant subgraphs allows for further visualization and analysis that would otherwise be too computationally intensive. Furthermore, many real data …

We describe a family of power models that can capture the nonuniform power effects of speed scaling among homogeneous cores on multicore processors. These models depart from traditional ones, which assume that individual cores contribute to power …

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy in the context of spectral graph partitioning. We provide pointwise convergence guarantees so that spectral …

Simulating binary black hole (BBH) systems are a computationally intensive problem and it can lead to great scientific discovery. How to explore more parallelism to take advantage of the large number of computing resources of modern supercomputers is …

Welcome to this year's first issue of IEEE Transactions on Parallel and Distributed Systems (TPDS). The author is privileged to serve our community proceeding into his third year as the editor-in-chief (EiC) of TPDS and looks forward to continuing …

This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Parallel and Distributed Systems.

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels (i.e., insertions and …

Multicore processors have become an integral part of modern large-scale and high-performance parallel and distributed computing systems. Unfortunately, applications co-located on multicore processors can suffer from decreased performance and …

A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. Its …

This paper contributes a method for combining sparse parallel graph algorithms with dense parallel linear algebra algorithms in order to understand dynamic graphs including the temporal behavior of vertices. Our method is the first to cluster …

Presents the introductory editorial for this issue of the publication

The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, …

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy, which leads to the first meaningful stopping criterion for two way spectral partitioning. More generally, we …

Reports on the current state of the IEEE Transactions on Computers.

Serving as cache disks, flash-based solid-state drives (SSDs) can significantly boost the performance of read-intensive applications. However, frequent data updating, the necessary condition for classical replacement algorithms (e.g., LRU, MQ, LIRS, …

It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive …

I am excited with my new role as incoming Editor-in-Chief (EiC) of IEEE Transactions on Parallel and Distributed Systems (TPDS) and look forward to serving the community over the next several years. As a brief introduction, I am a full professor at …

With the proliferation of large irregular sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query …

Betweenness centrality is a graph analytic that states the importance of a vertex based on the number of shortest paths that it is on. As such, betweenness centrality is a building block for graph analysis tools and is used by many applications, …

The digital world has given rise to massive quantities of data that include rich semantic and complex networks. A social graph, for example, containing hundreds of millions of actors and tens of billions of relationships is not uncommon. Analyzing …

The study of genomes has been revolutionized by sequencing machines that output many short overlapping substrings (called reads). The task of sequence assembly in practice is to reconstruct long contiguous genome subsequences from the reads. With …

The problem of finding the median of three genomes is the key process in building the most parsimonious phylogenetic trees from genome rearrangement data. The median problem using Double-Cut-and-Join (DCJ) distance is NP-hard and the best exact …

DNA sequence analysis is fundamental to life science research. The rapid development of next generation sequencing (NGS) technologies, and the richness and diversity of applications it makes feasible, have created an enormous gulf between the …

The recent switch to multicore processors brought a dramatic change that affects a large spectrum of systems from embedded and general-purpose to high-end computing systems. Parallelism is forcing major changes in software development. The aim of …

Reducing energy consumption has been an important design issue for large-scale streaming media storage systems. Existing energy conservation techniques are inadequate to achieve high energy efficiency for streaming media computing environments due to …

Background Accurate and efficient RNA secondary structure prediction remains an important open problem in computational molecular biology. Historically, advances in computing technology have enabled faster and more accurate RNA secondary structure …

Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally …

Modern multicore and manycore systems have the strong potential to deliver both high performance and high power efficiency. The large variance in memory access latency, resource sharing, and the heterogeneity of processor architectures in modern …

In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance in sorting large-scale in-memory data by exploiting high-parallel GPU processors. It consists of two algorithms: an in-core algorithm, which is …

The 12 papers in this special issue on high-performance computing with accelerators discuss a range of different accelerator architectures and applications.

The Cell Broad Engine (BE) Processor has unique memory access architecture besides its powerful computing engines. Many computing-intensive applications have been ported to Cell/BE successfully. But memory-intensive applications are rarely …

Discrete transforms are of primary importance and fundamental kernels in many computationally intensive scientific applications. In this paper, we investigate the performance of two such algorithms; Fast Fourier Transform (FFT) and Discrete Wavelet …

By 2010, the global options and equity markets will average over 128 billion messages per day, amounting to trillions of dollars in trades. Trading systems, the backbone of the low-latency high-frequency business, need fundamental research and …

Due to fundamental physical limitations and power constraints, we are witnessing a paradigm shift in commodity microprocessor architecture to multicore designs. Continued performance now requires the exploitation of concurrency at the algorithm …

A regeneration-theory approach is undertaken to analytically characterize the average overall completion time in a distributed system. The approach considers the heterogeneity in the processing rates of the nodes as well as the randomness in the …

The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is …

In this paper, we address the problem of multiple sequence alignment (MSA) for handling very large number of proteins sequences on mesh-based multiprocessor architectures. As the problem has been conclusively shown to be computationally complex, we …

The computation of ever larger as well as more accurate phylogenetic (evolutionary) trees with the ultimate goal to compute the tree of life represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the …

Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique …

One of the main objectives of the DARPA High Productivity Computing Systems (HPCS) program is to reassess the way we define and measure performance, programmability, portability, robustness and ultimately productivity in the High Performance …

Minimum spanning tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical …

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there …

We propose a framework for measuring the productivity of high performance computing (HPC) systems, based on common economic definitions of productivity and on utility theory. We discuss how these definitions can capture essential aspects of HPC …

A common statistical problem is that of finding the median element in a set of data. This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In …

Understanding evolution and the basic structure and function of proteins are two grand challenge problems in biology that can be solved only through the use of high-performance computing.

Srinivas Aluru is an associate professor and Associate Chair for Graduate Education in the Department of Electrical and Computer Engineering at Iowa State University. He is affiliated with the Laurence H. Baker Center for Bioinformatics and …

In this work, a generalized topology of block shift networks (BSNs), named generalized block shift network (GBSN), is propose or interconnection networks in clusters. The BSNs possess many desirable topological features, such as flexibility in node …

A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the …

Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation …

Global and regional land cover studies need to apply complex models on selected subsets of large volumes of multi-sensor and multi-temporal data sets that have been derived from raw instrument measurements using widely accepted pre-processing …

Raw remotely sensed satellite data have to be processed andmapped into a standard projection in order to produce a multi-temporal data set which can then be used for regional or global Earth science studies. However, traditional methods of processing …

At first thought, one may not realize the need for a special issue of Parallel and Distributed Computing Practices focused on Cluster Computing when our community has already invested years of research and spent a wealth of resources on traditional …

We describe a methodology for developing high performance programs running on clusters of SMP nodes. The SMP cluster programming methodology is based on a small prototype kernel (SIMPLE) of collective communication primitives that make efficient use …

We introduce a new deterministic parallel sorting algorithm for distributed memory machines based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good …

Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on …

This paper presents efficient and portable implementations of a powerful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique that makes use of the SNF and a variant of the conventional connected …

This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use …

A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which efficiently execute on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become …

This article introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov random field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood …