High-performance computing methods for computational genomics

Srinivas Aluru, David A. Bader, Anantharaman Kalyanaraman

January, 2006

Abstract

The high computational requirements of several applications in computational genomics are aggravated by an exponential growth in biological databases. This tutorial will provide a detailed introduction to high-performance computing methods designed to address various large-scale problems in computational genomics. First, we will describe mpiBLAST and ScalaBLAST, which are parallelizations of the NCBI BLAST suite of programs used for querying against large sequence databases. Next, we will describe PaCE, a parallel DNA sequence clustering algorithm with applications to clustering Expressed Sequence Tags and to whole genome assembly. Finally, we will describe GRAPPA, a high-performance software suite developed for phylogenetic reconstruction of a collection of organisms or genes. Throughout the tutorial, the emphasis will be on scalability and effectiveness in exploiting large-scale, state-of-the-art supercomputing technologies. The intended audience is academic and industry researchers, educators, and commercial application developers with a computational background. No background in biology is assumed.

Type

Conference paper

Publication

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, November 11-17, 2006, Tampa, FL, USA
