Distance Computation in the Space of Phylogenetic Trees
A phylogenetic tree represents the evolutionary history of a set of organisms. There are many different methods for constructing phylogenetic trees from biological data. To compare one such method with another, or to find the likelihood that a certain tree is generated from the data, researchers need to be able to compute the distance between trees. In 2001, Billera, Holmes, and Vogtmann introduced a space of phylogenetic trees and defined the distance between two trees to be the length of the shortest path between them in that space. We use the combinatorial and geometric properties of the tree space to develop two algorithms for computing this geodesic distance. In doing so, we show that the possible shortest paths between two trees can be compactly represented by a partially ordered set. For each candidate path, we compute the shortest distance between the start and target trees by reducing the problem to finding the shortest path through a certain subspace of Euclidean space. In particular, we show that there is a linear-time algorithm for finding the shortest path between a point in the all-positive orthant and a point in the all-negative orthant of R^k, where the path is contained in the subspace of R^k consisting of all orthants in which, for some 0 <= i <= k, the first i coordinates are non-positive and the remaining coordinates are non-negative. This case is of interest because the general problem of finding a shortest path through higher-dimensional Euclidean space with obstacles is NP-hard. The resulting algorithms for computing the geodesic distance appear to be the best available to date.
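To make the orthant subspace described above concrete, the following is a minimal Python sketch. It is not the linear-time algorithm of this work. It only checks the easiest case, namely whether the straight segment from a point p in the all-positive orthant to a point q in the all-negative orthant stays inside the allowed orthants, which happens exactly when the coordinates cross zero in index order. The function names and the returned (lower, upper) pair are illustrative assumptions. When the segment leaves the subspace, the sketch reports only bounds: the Euclidean distance from below and the length of the path through the origin from above.

```python
import math


def crossing_times(p, q):
    """Parameter t in (0, 1) at which each coordinate of the segment
    x(t) = (1 - t) * p + t * q reaches zero (p positive, q negative)."""
    return [pj / (pj - qj) for pj, qj in zip(p, q)]


def straight_segment_stays_inside(p, q):
    """True if the segment only visits orthants whose first i coordinates
    are non-positive and whose remaining coordinates are non-negative.
    This holds exactly when coordinate 1 reaches zero no later than
    coordinate 2, coordinate 2 no later than coordinate 3, and so on."""
    t = crossing_times(p, q)
    return all(t[j] <= t[j + 1] for j in range(len(t) - 1))


def distance_bounds(p, q):
    """Return (lower, upper) bounds on the shortest path length from p to q
    inside the subspace. The bounds coincide when the straight segment is
    feasible. Hypothetical helper, for illustration only."""
    if not (all(x > 0 for x in p) and all(x < 0 for x in q)):
        raise ValueError("p must be strictly positive and q strictly negative")
    straight = math.dist(p, q)                        # unconstrained lower bound
    through_origin = math.hypot(*p) + math.hypot(*q)  # always a feasible path
    if straight_segment_stays_inside(p, q):
        return straight, straight
    return straight, through_origin


if __name__ == "__main__":
    print(distance_bounds((1.0, 2.0), (-1.0, -1.0)))  # segment is feasible
    print(distance_bounds((2.0, 1.0), (-1.0, -1.0)))  # segment leaves the subspace
```

In the second call the first coordinate stays positive after the second has turned negative, so the segment enters an excluded orthant and only the two bounds are returned. Closing that gap efficiently is the problem the linear-time algorithm addresses.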
phylogenetic trees; tree space; geodesic distance; Euclidean shortest path; combinatorics; computational geometry
dissertation or thesis