A Lin-Kernighan Heuristic for the DCJ Median Problem of Genomes with Unequal Contents

Abstract

In this paper, we define a distance metric, the DCJ-Indel-Exemplar distance, to estimate the dissimilarity between two genomes with unequal contents, that is, genomes that differ by gene insertions/deletions (Indels) and duplications. Based on this distance metric, we propose the DCJ-Indel-Exemplar median problem: given three genomes, find a median genome that minimizes the DCJ-Indel-Exemplar distance between it and the three given genomes. We adapt the Lin-Kernighan (LK) heuristic to compute the median quickly, exploiting adequate sub-graph decomposition and search space reduction techniques. Experimental results on simulated gene order data indicate that our distance estimator closely estimates the actual number of rearrangement events, and that, when compared with an exact solver on genomes with equal content, our median solver obtains very accurate results. More importantly, our median solver can handle Indels and duplications and generates results very close to the cumulative number of simulated evolutionary events.
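For readers unfamiliar with DCJ, the sketch below shows the classical DCJ distance for genomes with equal gene content and no duplicates, d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph. This is the equal-content baseline that the DCJ-Indel-Exemplar distance extends; it is not the paper's distance and does not handle Indels or duplications. The genome representation and the names `dcj_distance` and `median_score` are hypothetical, and `median_score` assumes the usual sum-of-distances objective for a median candidate.

```python
from typing import Dict, List, Optional, Set, Tuple

Extremity = Tuple[int, str]          # (gene id, "t" for tail or "h" for head)
Chromosome = Tuple[List[int], bool]  # (signed gene order, is_circular)
Genome = List[Chromosome]


def _ends(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene read left to right."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome: Genome) -> Dict[Extremity, Extremity]:
    """Map each extremity to its adjacent partner; telomeres get no entry."""
    partner: Dict[Extremity, Extremity] = {}
    for genes, circular in genome:
        pairs = list(zip(genes, genes[1:]))
        if circular and genes:
            pairs.append((genes[-1], genes[0]))  # close the circle
        for a, b in pairs:
            x, y = _ends(a)[1], _ends(b)[0]
            partner[x] = y
            partner[y] = x
    return partner


def dcj_distance(a: Genome, b: Genome) -> float:
    """Classical DCJ distance N - (C + I/2); assumes equal content, no duplicates."""
    genes = {abs(g) for chrom, _ in a for g in chrom}
    pa, pb = adjacency_map(a), adjacency_map(b)
    extremities = [(g, s) for g in genes for s in ("t", "h")]
    visited: Set[Extremity] = set()
    cycles = odd_paths = 0

    def walk(start: Extremity, use_a: bool) -> int:
        """Follow alternating A/B adjacencies; return number of extremities seen."""
        count, node = 0, start  # type: int, Optional[Extremity]
        while node is not None and node not in visited:
            visited.add(node)
            count += 1
            node = (pa if use_a else pb).get(node)
            use_a = not use_a
        return count

    # Paths begin at extremities that are telomeres in A or in B.
    for x in extremities:
        if x not in visited and (x not in pa or x not in pb):
            # A path with an odd number of extremities is an odd path.
            if walk(x, use_a=x in pa) % 2 == 1:
                odd_paths += 1
    # Every extremity not on a path lies on a cycle.
    for x in extremities:
        if x not in visited:
            walk(x, use_a=True)
            cycles += 1
    return len(genes) - (cycles + odd_paths / 2)


def median_score(candidate: Genome, given: List[Genome]) -> float:
    """Assumed median objective: total distance from a candidate to each genome."""
    return sum(dcj_distance(candidate, g) for g in given)


if __name__ == "__main__":
    g1 = [([1, 2, 3], False)]
    g2 = [([1, -2, 3], False)]  # g1 with gene 2 inverted
    print(dcj_distance(g1, g2))            # 1.0
    print(median_score(g1, [g1, g2, g1]))  # 1.0
```

A single inversion of gene 2 yields distance 1, as expected. An LK-style median solver would repeatedly evaluate an objective like `median_score` while modifying the candidate's adjacencies; that search is not sketched here.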

Publication
Computing and Combinatorics - 20th International Conference, COCOON 2014, Atlanta, GA, USA, August 4-6, 2014. Proceedings