Reducing the effect of the data order in algorithms for constructing phylogenetic trees
Abstract
A major objective of biological systematics is to infer phylogenies (evolutionary trees) from the available data on the species under investigation. This involves searching for an unknown tree using data generated by a stochastic process that operates along the tree. These evolutionary trees are binary trees in which the tips are occupied by the species or OTUs (Operational Taxonomic Units), in our case represented by DNA or RNA sequences, and the internal nodes joining them down to the root are occupied by their hypothetical ancestors or HTUs (Hypothetical Taxonomic Units). Because the number of possible topologies for such a tree is too great to examine exhaustively in a reasonable time, methods for inferring tree topologies are only approximate. A common problem with these algorithms is their dependence on the order of the input data. The program presented here implements a way to reduce this dependence in the Camin-Sokal parsimony method, although it may also be used with other methods. The program was written in Pascal using the Turbo Pascal compiler (version 3.01A) from Borland International, to run on an IBM PC or true compatible. Lists of pointers, rather than arrays, were used to store sequences. The array is a data structure more commonly