Large fault-tolerant network systems with high Quality of Service (QoS) guarantee are critical in many real world applications and entail diverse replica placement problems. In this paper, the replica placement problem in terms of minimizing the …

Identifying anomalies in millions of stars in real time is a great challenge. In this paper, we develop a matched filtering based algorithm to detect a typical anomaly, microlensing. The algorithm can detect short timescale microlensing events with …

This Scale free networks are abundant in many natural, social, and engineering phenomena for which there exists a substantial corpus of theory able to elucidate many of their underlying properties. In this paper we study the scalability of some …

The lines between data science (DS), machine learning (ML), deep learning (DL), and data mining continue to be blurred and removed. This is great as it ushers in vast amounts of capabilities, but it brings increased complexity and a vast number of …

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we show that …

Counting common neighbors between all vertex pairs in a graph is a fundamental operation, with uses in similarity measures, link prediction, graph compression, community detection, and more. Current shared-memory approaches either rely on set …

The constant and massive influx of new data into analysis systems needs to be addressed without assuming we can pause the onslaught. Here we consider one aspect: non-stop graph analysis of streaming data. We formalize a new and practical algorithm …

List intersections are ubiquitous and can be found in a wide range of applications, including triangle counting and finding the maximal k-truss, both of which are part of the HPEC Static Graph Challenge. For many graph based problems it is necessary …

Sparse data computations are ubiquitous in science and engineering. Unlike their dense data counterparts, sparse data computations have less locality and more irregularity in their execution, making them significantly more challenging to parallelize …

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Triangle counting is a building block for numerous graph applications and given the fact that graphs continue to grow in size, its scalability is important. As such, numerous algorithms have been designed for triangle counting - some of which are …

Emerging real-world graph problems include: detecting and preventing disease in human populations; revealing community structure in large social networks; and improving the resilience of the electric power grid. Unlike traditional applications in …

Many large datasets from several fields of research such as biology or society can be represented as graphs. Additionally in many real applications, data is constantly being produced, leading to the notion of dynamic graphs. A heavily studied problem …

Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality …

Many large datasets from a variety of fields of research can be represented as graphs. A common query is to identify the most important, or highly ranked, vertices in a graph. Centrality metrics are used to obtain numerical scores for each vertex in …

Dynamic graphs can capture changing relationships in many real datasets that evolve over time. One of the most basic questions about networks is the identification of the “most important” vertices in a network. Measures of vertex importance called …

PageRank is a fundamental graph algorithm to evaluate the importance of vertices in a graph. In this paper, we present an efficient parallel PageRank design based on an edge-centric scatter-gather model. To overcome the poor locality of PageRank and …

Triangle counting is an important building block for finding key players in a graph. It is an integral part of the popular clustering coefficient analytic and can be used for pattern matching in social networks. A triangle, which is also a 3-clique, …

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

The k-truss of a graph is a subgraph such that each edge is tightly connected to the remaining elements in the k-truss. The k-truss of a graph can also represent an important community in the graph. Finding the k-truss of a graph can be done in a …

Many graph datasets originating from online social network, financial or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most …

Power is a primary concern for mobile, cloud, and high-performance computing applications. Approximate computing refers to running applications to obtain results with tolerable errors under resource constraints, and it can be applied to balance …

In this work we present a new local, vertex-level measure of community change. Our measure detects vertices that change community membership due to the actions (edges) of a vertex itself and not only due to global community shifts. The local nature …

This paper details a distributed memory implementation of Reptile, a scalable and accurate spectrum based error-correction method. Reptile uses both k-mer and adjoining k-mers (called tiles) information along with the quality scores of bases to …

Dynamic graphs are used to represent changing relational data. In order to create a dynamic graph representing relationships or interactions over time, it is necessary to choose a method of adding new data and removing, or otherwise de-emphasizing, …

cuSTINGER, a new graph data structure targeting NVIDIA GPUs is designed for streaming graphs that evolve over time. cuSTINGER enables algorithm designers greater productivity and efficiency for implementing GPU-based analytics, relieving programmers …

Emerging real-world graph problems include: detecting community structure in large social networks; improving the resilience of the electric power grid; and detecting and preventing disease in human populations. Unlike traditional applications in …

Provides a listing of current committee members and society officers.

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to …

Spectral partitioning (clustering) algorithms use eigenvectors to solve network analysis problems. The relationship between numerical accuracy and network mining quality is insufficiently understood. We show that analyzing numerical accuracy and …

In 2013 a paper was offered to the CAA concerning archaeological legacy data and semantic database applications, with some preliminary results for a study conducted into the Samtavro cemetery, situated in the South Caucasus in the modern republic of …

A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. Its …

Optimized GPU kernels are sufficiently complicated to write that they often are specialized to input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel …

As multicore processor architectures are now prevalent in server nodes of parallel and distributed computing systems, it has become important to characterize the performance of applications run on these architectures. This study investigates the …

The construction of efficient parallel graph algorithms is important for quickly solving problems in areas such as urban planning, social network analysis, and hardware verification. Existing GPU implementations of graph algorithms tend to be …

Community detection, or graph clustering, is the problem of finding dense groups in a graph. This is important for a variety of applications, from social network analysis to biological interactions. While most work in community detection has focused …

Provides a listing of current committee members and society officers. The conference also offers a note of thanks and lists its reviewers.

Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an …

Keynote Abstracts

In this paper, we designed a distance metric as DCJ-Indel-Exemplar distance to estimate the dissimilarity between two genomes with unequal contents (with gene insertions/deletions (Indels) and duplications). Based on the aforementioned distance …

With the proliferation of large, irregular, and sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query …

Breadth-First Search (BFS) is widely used in real-world applications including computational biology, social networks, and electronic design automation. The most effective BFS approach has been shown to be a combination of top-down and bottom-up …

The Basic Linear Algebra Subprograms (BLAS), introduced over 30 years ago, had a transformative effect on linear algebra. By building Linear Algebra algorithms from a common set of highly optimized building blocks, researchers spend less time mapping …

Provides a listing of current committee members and society officers.

Clustering coefficients is a building block in network sciences that offers insights on how tightly bound vertices are in a network. Effective and scalable parallelization of clustering coefficients requires load balancing amongst the cores. This …

Applications of high-performance graph analysis range from computational biology to network security and even transportation. These applications often consider graphs under rapid change and are moving beyond HPC platforms into energy-constrained …

Betweenness Centrality is a widely used graph analytic that has applications such as finding influential people in social networks, analyzing power grids, and studying protein interactions. However, its complexity makes its exact computation …

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is between ness centrality, which has applications in community …

Social networks, communication networks, business intelligence databases, and large scientific data sources now contain hundreds of millions elements with billions of relationships. The relationships in these massive datasets are changing at …

In this paper we propose a new methodology for gaining insight into the temporal aspects of social networks. In order to develop higher-level, large-scale data analysis methods for classification, prediction, and anomaly detection, a solid foundation …

Turning large volumes of data into actionable knowledge is a top challenge in high performance computing. Our previous work in this area demonstrated algorithmic techniques for massively parallel graph analysis on multithreaded systems. This work led …

This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on …

High response quality is critical for many best-effort interactive services, and at the same time, reducing energy consumption can directly reduce the operational cost of service providers. In this paper, we study the quality-energy tradeoff for such …

Clustering coefficients, also called triangle counting, is a widely-used graph analytic for measuring the closeness in which vertices cluster together. Intuitively, clustering coefficients can be thought of as the ratio of common friends versus all …

Presents the introductory welcome message from the conference proceedings.

Implementing parallel graph algorithms in large, shared memory machines, such as the Cray XMT, can be challenging for programmers. Synchronization, deadlock, hot spotting, and others can be barriers to obtaining linear scalability. Alternative …

The increasing energy consumption of high performance computing has resulted in rising operational and environmental costs. Therefore, reducing the energy consumption of computation is an emerging area of interest. We study the approach of data …

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. Current state-of-the-art industrial methods analyze these streaming sources …

It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position …

Analysis of social networks is challenging due to the rapid changes of its members and their relationships. For many cases it impractical to recompute the metric of interest, therefore, streaming algorithms are used to reduce the total runtime …

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. We introduce a framework, STING (Spatio-Temporal Interaction Networks and …

One of the key challenges in advanced micro-architecture is to provide high performance hardware-components that work as application accelerators. In this paper, we present a Cache Coherent Architecture that optimizes memory accesses to patterns …

Graphics Processing Units (GPUs) have become ideal candidates for the development of fine-grain parallel algorithms as the number of processing elements per GPU increases. In addition to the increase in cores per system, new memory hierarchies and …

Presents the introductory welcome message from the conference proceedings.

The volume of existing graph-structured data requires improved parallel tools and algorithms. Finding communities, smaller sub graphs densely connected within the sub graph than to the rest of the graph, plays a role both in developing new parallel …

The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, …

Breadth-first search (BFS) is an essential graph traversal strategy widely used in many computing applications. Because of its irregular data access patterns, BFS has become a non-trivial problem hard to parallelize efficiently. In this paper, we …

High energy consumption has become a critical problem for supercomputer systems. GPU clusters are becoming an increasingly popular architecture for building supercomputers because of its great improvement in performance. In this paper, we first …

Presents the introductory welcome message from the conference proceedings.

Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to …

Current online social networks are massive and still growing. For example, Face book has over 500 million active users sharing over 30 billion items per month. The scale within these data streams has outstripped traditional graph analysis methods. …

A number of graph partitioning algorithms are based on the concept of modularity. In particular Clauset, Newman and Moore (CNM) have developed a greedy agglomerative graph partitioning algorithm that scales well but is known to have several flaws. …

Emerging real-world graph problems include detecting community structure in large social networks, improving the resilience of the electric power grid, and detecting and preventing disease in human populations. Unlike traditional applications in …

This paper evaluates the performance of the bioinformatics application ClustalW developed on Cell Broadband Engine(TM) (Cell/B.E.) using a software data cache for SPEs, instead of explicit DMA transfers. The software cache of the SPEs, once it has …

Welcome to the 9th International Workshop on High Performance Computational Biology (HiCOMB). Computational Biology and related disciplines are fast emerging as an important area for academic research and industrial application. The large size of …

Complex networks capture interactions among entities in various application areas in a graph representation. Analyzing large scale complex networks often answers important questions-e.g. estimate the spread of epidemic diseases-but also imposes …

Social networks produce an enormous quantity of data. Facebook consists of over 400 million active users sharing over 5 billion pieces of information each month. Analyzing this vast quantity of unstructured data presents challenges for software and …

We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications …

Welcome to Atlanta, Georgia and to the 24th International Parallel and Distributed Processing Symposium. It has been my honor to spend several years of planning for Georgia Institute of Technology to host this event and to work with the Computer …

The accuracy of Conditional Random Fields (CRF) is achieved at the cost of huge amount of computation to train model. In this paper we designed the parallelized algorithm for the Gradient Ascent based CRF training methods for biological sequence …

Many important problems in computational sciences, social network analysis, security, and business analytics, are data-intensive and lend themselves to graph-theoretical analyses. In this paper we investigate the challenges involved in exploring very …

We present a new lock-free parallel algorithm for computing betweenness centrality of massive complex networks that achieves better spatial locality compared with previous approaches. Betweenness centrality is a key kernel in analyzing the importance …

To explore chip-level parallelism, the PSC (Parallel Shared Cache) model is provided in this paper to describe high performance shared cache of Chip Multi-Processors (CMP). Then for a specific application, parallel sorting, a cache-conscious parallel …

Due to power wall, memory wall, and ILP wall, we are facing the end of ever increasing single-threaded performance. For this reason, multicore and manycore processors are arising as a new paradigm to pursue. However, to fully exploit all the cores in …

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, social networking Web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. …

We present a new parallel algorithm that extends and generalizes the traditional graph analysis metric of betweenness centrality to include additional non-shortest paths according to an input parameter k. Betweenness centrality is a useful kernel for …

The prediction of the correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n4), so the computational requirements …

Current mathematical modeling methods for the spreading of infectious diseases are too simplified and do not scale well. We present the Simulator of Epidemic Evolution in Complex Networks (SEECN), an efficient simulator of detailed individual-based …

In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application designers in better mapping their software to the most suitable …

The large L2 cache's access latency, which is mainly caused by wire delay, is a critical problem to improve the performance of CMP (Chip Multi-Processor) in NUCA (Non-Uniform Cache Architecture). A CMP L2 cache accessing performance model is provided …

In this paper we briefly introduce our new framework, called "design optimizer for scientific applications" (DOSA) which allows the programmer or compiler writer to explore alternative designs and optimize for speed (or power) at design-time and use …

High performance computing is critical for financial markets where analysts seek to accelerate complex optimizations such as pricing engines to maintain a competitive edge. In this paper we investigate the performance of financial workloads on the …

The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is designed with …

Numerical simulations in computational physics, biology, and finance, often require the use of high quality and efficient parallel random number generators. We design and optimize several parallel pseudo random number generators on the cell broadband …

JPEG2000 is the latest still image coding standard from the JPEG committee, which adopts new algorithms such as embedded block coding with optimized truncation (EBCOT) and discrete wavelet transform (DWT). These algorithms enable superior coding …

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

We present SNAP (small-world network analysis and partitioning), an open-source graph framework for exploratory study and partitioning of large-scale networks. To illustrate the capability of SNAP, we discuss the design, implementation, and …

Over the past several years there have been repeated analyses of the potential value of petascale bioinformatics and computational biology applications, as well as analyses of the system engineering steps required to implement applications and …

Protein-interaction network (PIN) analysis provides valuable insight into an organism's functional organization and evolutionary behavior. In this paper, we study a PIN formed by high-confidence human protein interactions obtained from various public …

We present a study of multithreaded implementations of Thorup's algorithm for solving the single source shortest path (SSSP) problem for undirected graphs. Our implementations leverage the fledgling multithreaded graph library (MTGL) to perform …

We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the Δ-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel …

Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally-expensive to exactly determine betweenness; currently the fastest-known algorithm by Brandes requires O(nm) time for …

In this work, we propose an application composition system (ACS) that allows design-time exploration and automatic run-time optimizations so that we relieve application programmers and compiler writers from the challenging task of optimizing the …

The Fast Fourier Transform (FFT) is of primary importance and a fundamental kernel in many computationally intensive scientific applications. In this paper we investigate its performance on the Sony-Toshiba-IBM Cell Broadband Engine, a heterogeneous …

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, the world-wide web, communication networks, and scientific computing can be intuitively modeled as graphs. In this …

The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE), with eight SIMD coprocessing units (SPEs) integrated on-chip. We present a complexity model for designing …

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

Graph theoretic problems are representative of fundamental kernels in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Yet they pose serious challenges for …

Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued performance on multicore processors now requires the exploitation of …

Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. …

Graph abstractions are extensively used to understand and solve challenging computational problems in various scientific and engineering domains. They have particularly gained prominence for applications involving large-scale networks. In this paper, …

Constructing phylogenetic trees in the study of the evolutionary history of a group organisms is an extremely challenging problem in computational biology. The problem becomes intractable with growing number of organisms. In this paper, we design and …

The high computational requirements of several applications in computational genomics are aggravated by an exponential growth in biological databases. This tutorial will provide a detailed introduction to high-performance computing methods designed …

This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large scale networks, …

We present an experimental study of parallel algorithms for solving the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs. We implement Meyer and Sander’s ∆-stepping algorithm and report performance …

The ability to understand the factors contributing to parallel program performance are vital for understanding the impact of machine parameters on the performance of specific applications. We propose a methodology for analyzing the performance …

The maximum flow problem is a combinatorial problem of significant importance in a wide variety of research and commercial applications. It has been extensively studied and implemented over the past 40 years. The push-relabel method has been shown to …

We compare parallel algorithms for random permutation generation on symmetric multiprocessors (SMPs). Algorithms considered are the sorting-based algorithm, Anderson's shuffling algorithm, the dart-throwing algorithm, and Sanders' algorithm. We …

We present an experimental study of parallel biconnected components algorithms employing several fundamental parallel primitives, e.g., prefix sum, list ranking, sorting, connectivity, spanning tree, and tree computations. Previous experimental …

The exponential growth in the amount of genomic data has spurred growing interest in large scale analysis of genetic information. Bioinformatics applications, which explore computational methods to allow researchers to sift through the massive …

Graph theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and …

Many large-scale optimization problems rely on graph theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degrees of locality. We describe our novel methodology for designing and implementing …

Advances in experimental techniques have transformed biology into a data-intensive science, with a rapid explosion of data at the genomic and proteomic level. Few comprehensive suites of computationally-intensive life science applications are …

Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric …

We focus on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of …

We have developed a high performance hybridized parallel finite difference time domain (FDTD) algorithm featuring both OpenMP shared memory programming and MPl message passing. Our goal is to effectively model the optical characteristics of a novel …

This paper summarizes the design and implementation of a parallel algorithm for state assignment of large Finite State Machines (FSMs). High performance CAD tools are necessary to overcome the computational complexity involved in the optimization of …

Minimum spanning tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical …

Bioinformatics is the science of managing, mining, and interpreting information from biological sequences and structures. Genome sequencing projects have contributed to an exponential growth in complete and partial sequence databases. Similarly, the …

Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include increasing fault tolerance of a (possibly heterogeneous) system and alleviating …

Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find …

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory …

Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation …

Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, …

Phylogeny reconstruction from molecular data poses complex optimization problems: almost all optimization models are NP-hard and thus computationally intractable. Yet approximations must be of very high quality in order to avoid outright biological …

Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest is strong in determining these relationships. For example, …

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory …

The future of high-performance computing relies on the efficient and scalable use of clusters with symmetric multiprocessor (SMP) nodes and low-latency, high-bandwidth interconnection networks. Current examples of such platforms include Sun Ultra HPC …

A novel indexing scheme is described to catalogue satellite data on a pixel basis. The objective of this research is to develop an efficient methodology to archive, retrieve and process satellite data, so that data products can be generated to meet …

This paper discusses high performance clustering from a series of critical topics: architectural design, system software infrastructure, and programming environment. This is accomplished through an overview of a large scale, high performance …

Presents efficient and portable implementations of a useful image enhancement process, the symmetric neighborhood filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components …

A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which exe- cute efficiently on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become …

A common statistical problem is that of finding the median element in a set of data. This paper presents a fast and portable parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our …

This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use …