High-Performance Algorithms for Phylogeny Reconstruction with Maximum Parsimony

Abstract

All biological disciplines are united by the idea that species share a common history. These relationships are crucial to the understanding of biological evolution and biological mechanisms in medical and pharmaceutical research. The evolutionary history is usually represented by a phylogeny, an unrooted, binary tree where each leaf represents a species. Phylogeny reconstruction, the problem of inferring the evolutionary relationships from the set of leaves by using sequence (e.g., from the DNA of nuclear or organelle genomes), morphological, or gene-order data, and a plausible model of evolution, is a fundamental problem in computational biology and is increasingly used in drug discovery, epidemiology, and genetic engineering [4]. Unfortunately, most problems in phylogeny reconstruction are proven to be NP-hard problems that can take years to solve on realistic datasets [8, 32]. Despite the large number of available tools and approaches, even moderate-sized datasets can require months or years of computation. Many biologists throughout the world compute phylogenies involving weeks or years of computation without necessarily finding global optima. Certainly more such computational analyses will be needed for larger datasets. The enormous computational demands in terms of time and storage for solving phylogenetic problems can only be met through high-performance computing.

Publication
Handbook of Computational Molecular Biology