High-performance computing methods for computational genomics

Abstract

The high computational requirements of several applications in computational genomics are aggravated by the exponential growth of biological databases. This tutorial will provide a detailed introduction to high-performance computing methods designed to address various large-scale problems in computational genomics. First, we will describe mpiBLAST and ScalaBLAST, which are parallelizations of the NCBI BLAST suite of programs used for querying large sequence databases. Next, we will describe PaCE, a parallel DNA sequence clustering algorithm with applications to clustering Expressed Sequence Tags and to whole-genome assembly. Finally, we will describe GRAPPA, a high-performance software suite developed for phylogenetic reconstruction of a collection of organisms or genes. Throughout the tutorial, the emphasis will be on scalability and on effectively exploiting large-scale, state-of-the-art supercomputing technologies. The intended audience is academic and industry researchers, educators, and commercial application developers with a computational background. No background in biology is assumed.

Publication
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, November 11-17, 2006, Tampa, FL, USA