Phylogenetics

In biology, phylogenetics is the science and methodology of inferring the evolutionary relationships among species, that is, of reconstructing the tree of life.

Phylogeneticists select an appropriate source of data (morphology, anatomy, biochemistry, DNA sequences, etc.), generate data for the members of the study group, and then establish homology. This step is crucial to ensure that only comparable traits are actually compared in the analysis; insect wings and bird wings, for example, are not homologous organs and should therefore not be scored as the same morphological character in a phylogenetic analysis. For DNA sequences, software is available to establish homology between nucleotide positions, a process known as alignment building. The result of these efforts is a two-dimensional data matrix of characters by tree terminals (usually samples or species) that can then be analysed, today nearly always with the help of specialised software.
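Real alignment programs build multiple alignments with more elaborate scoring schemes, but the core idea of establishing positional homology can be sketched with the classic Needleman-Wunsch algorithm for globally aligning two sequences. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not the defaults of any particular program.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for placing a[i-1] and b[j-1] in the same column
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two bases
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(best)  # 4
```

Each column of the result is a hypothesis of homology between two nucleotide positions; stacking many aligned sequences gives the characters-by-terminals matrix described above.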

The major phylogenetic methods
The simplest tree-building method is based on minimum distance. First, a distance matrix is calculated from some measure of distance between each pair of terminals in the data matrix, such as genetic distance or the number of morphological differences. The two least distant terminals are then joined, and the distance of the resulting cluster from all other terminals is recalculated. This is repeated until all terminals are joined in a single tree. The method is by far the fastest, but it has no explicit evolutionary logic: it can just as well be used to group soil types or sample plots. For this reason, distance trees are generally called phenograms rather than phylogenies, and they are rarely presented as the final inference of phylogenetic relationships in serious scientific publications. They are, however, often used to generate "good enough" starting trees for the more sophisticated methods below.
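As a minimal sketch of this procedure, the code below implements UPGMA, one common distance-based clustering method in which the distance from a newly joined cluster to every other cluster is the size-weighted average of its members' distances. The helper names, the use of uncorrected p-distances (the proportion of differing aligned sites) and the toy sequences are all assumptions for illustration; the output is a topology in Newick notation without branch lengths.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def upgma(names, dist):
    """Join the two least distant clusters until one remains; return a Newick topology.

    names: terminal labels
    dist:  dict mapping frozenset({x, y}) -> distance between terminals x and y
    """
    size = {name: 1 for name in names}  # current clusters -> number of terminals
    d = dict(dist)
    while len(size) > 1:
        # Pick the least distant pair among the current clusters
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        joined = f"({a},{b})"
        # Distance from the new cluster = size-weighted mean of its members' distances
        for c in size:
            if c not in (a, b):
                d[frozenset((joined, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[joined] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC", "C": "AAAAAACCCC", "D": "AAAAACCCCC"}
dist = {frozenset((x, y)): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
tree = upgma(list(seqs), dist)
print(tree)  # ((A,B),(C,D));
```

Nothing in the loop refers to evolution: any table of pairwise distances, for soil samples or anything else, would be clustered the same way.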

Parsimony analyses follow a very different logic. Assuming that the simplest explanation is, all else being equal, probably the correct one, they prefer the phylogeny that requires the fewest character changes along its branches (the "shortest" tree). This method is a cladistic analysis in the strict sense of the term. It rests on minimal assumptions and is reasonably fast, but it is not very sophisticated.
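Counting the changes a tree requires can be sketched with Fitch's algorithm, which computes the minimum number of changes for one character on a given binary tree; summing over all characters gives the tree length. The nested-tuple tree format, the function names and the toy alignment are hypothetical, and a real analysis would search the vast space of possible trees rather than compare three hand-picked candidates.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character: return (possible states, minimum changes).

    tree:   a terminal name (str) or a 2-tuple (left_subtree, right_subtree)
    states: dict mapping terminal name -> observed state of this character
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree: no extra change needed at this node
        return shared, left_changes + right_changes
    # The children disagree: one change is required on one of the two branches
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Total minimum number of changes over all characters (columns) of the alignment."""
    n_chars = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_chars)
    )


alignment = {"A": "AAG", "B": "AAT", "C": "CCT", "D": "CCG"}
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
lengths = [tree_length(t, alignment) for t in candidates]
print(lengths)  # [4, 6, 5]
best = candidates[lengths.index(min(lengths))]
```

With this toy alignment, the tree grouping A with B and C with D needs only four changes and would be preferred under parsimony.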

Maximum Likelihood analyses share this logic of searching for the best tree, but here the criterion for choosing the preferred tree is not the minimum number of changes but the highest likelihood: the tree under which the observed data are most probable, given a model of character evolution. Calculating likelihood scores across large phylogenies is computationally intensive, so this method is generally slower than parsimony. It also requires the a priori selection of an appropriate model.
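How a tree's likelihood is computed can be sketched with Felsenstein's pruning algorithm under the simplest substitution model, Jukes-Cantor (JC69), which assumes equal base frequencies and equal rates for all substitutions. The tree format, the function names, the toy alignment and the fixed branch length of 0.1 are assumptions; a genuine Maximum Likelihood analysis would also optimise branch lengths and model parameters and search over topologies.

```python
import math

BASES = "ACGT"


def jc69(same: bool, t: float) -> float:
    """JC69 probability of observing a particular base at the end of a branch of
    length t (expected substitutions per site), given the base at its start."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(tree, site):
    """Felsenstein's pruning: for each base at this node, the probability of the data below.

    tree: a terminal name (str) or a tuple (left, left_length, right, right_length)
    site: dict mapping terminal name -> base at one alignment column
    """
    if isinstance(tree, str):
        return [1.0 if base == site[tree] else 0.0 for base in BASES]
    left, t_left, right, t_right = tree
    p_left = partial_likelihoods(left, site)
    p_right = partial_likelihoods(right, site)
    result = []
    for x in BASES:
        # Sum over the unknown base y at each child node
        from_left = sum(jc69(x == y, t_left) * p_left[k] for k, y in enumerate(BASES))
        from_right = sum(jc69(x == y, t_right) * p_right[k] for k, y in enumerate(BASES))
        result.append(from_left * from_right)
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods; JC69 assumes base frequencies of 1/4 at the root."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += math.log(sum(0.25 * p for p in partial_likelihoods(tree, site)))
    return total


alignment = {"A": "AAG", "B": "AAT", "C": "CCT", "D": "CCG"}
t = 0.1  # fixed, hypothetical branch length used for every branch
tree_ab_cd = (("A", t, "B", t), t, ("C", t, "D", t), t)
tree_ac_bd = (("A", t, "C", t), t, ("B", t, "D", t), t)
print(log_likelihood(tree_ab_cd, alignment), log_likelihood(tree_ac_bd, alignment))
```

The first tree receives the higher log-likelihood, because identical states in sister terminals are more probable on short branches than the pattern the second tree implies. Note that each site requires a sum over all possible ancestral states, which is why likelihood scoring is slower than counting changes.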

Finally, Bayesian phylogenetics is also model-based, but instead of searching for the single best tree, it uses a Markov chain Monte Carlo (MCMC) approach to estimate the posterior probability of each possible group of terminals forming a branch (a clade). MCMC methods draw a very large number of samples of trees and model parameters from the posterior distribution and are extremely time-consuming, making Bayesian phylogenetics the slowest of these methods. It also has the steepest learning curve and involves the largest number of assumptions: in addition to model selection, it requires setting numerous priors. Despite these drawbacks, and thanks to increasing computing power, this methodology is increasingly popular.
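The core MCMC idea can be illustrated on a much smaller problem than a full tree search: a Metropolis-Hastings sampler for the posterior distribution of a single parameter, the JC69 distance d between two aligned sequences. The exponential prior (rate 1), the step size, the burn-in fraction and the counts of identical and differing sites are illustrative assumptions; real Bayesian phylogenetics programs also propose changes to the tree topology and to many other parameters.

```python
import math
import random


def log_posterior(d, n_same, n_diff, prior_rate=1.0):
    """Unnormalised log posterior of the JC69 distance d between two sequences."""
    if d <= 0:
        return -math.inf  # distances must be positive
    e = math.exp(-4.0 * d / 3.0)
    p_diff = 0.25 - 0.25 * e
    if p_diff <= 0.0:
        return -math.inf  # d so small that differing sites have zero probability
    log_lik = n_same * math.log(0.25 + 0.75 * e) + n_diff * math.log(p_diff)
    log_prior = math.log(prior_rate) - prior_rate * d  # exponential prior on d
    return log_lik + log_prior


def metropolis(n_same, n_diff, steps=20000, step_size=0.05, seed=1):
    """Random-walk Metropolis sampler; returns posterior samples of d after burn-in."""
    rng = random.Random(seed)
    d = 0.1  # arbitrary starting value
    current = log_posterior(d, n_same, n_diff)
    samples = []
    for _ in range(steps):
        proposal = d + rng.gauss(0.0, step_size)  # symmetric proposal
        candidate = log_posterior(proposal, n_same, n_diff)
        # Accept with probability min(1, posterior(proposal) / posterior(current))
        if math.log(1.0 - rng.random()) < candidate - current:
            d, current = proposal, candidate
        samples.append(d)
    return samples[steps // 10:]  # discard the first 10% as burn-in


samples = metropolis(n_same=80, n_diff=20)
mean = sum(samples) / len(samples)
p_below = sum(s < 0.3 for s in samples) / len(samples)  # posterior P(d < 0.3)
print(f"posterior mean of d: {mean:.3f}, P(d < 0.3): {p_below:.2f}")
```

As in a phylogenetic MCMC run, a posterior probability is estimated simply as the fraction of samples that satisfy a condition; there, the condition would be "this group of terminals forms a clade".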

Which method to choose depends on the study a phylogeneticist wants to conduct. Likelihood and Bayesian methods are today generally favoured for DNA sequence data. For fossils, only morphological data are available, and if no appropriate model of character evolution exists, a parsimony analysis may be preferred. In practice, phylogeneticists tend to use at least two different methods to test whether the results are congruent.

Polarising the tree
Most phylogenetic analyses return only an unrooted tree: it shows which terminals are close together and which are far apart, but not the order in which lineages split during evolutionary history. Phylogeneticists use various approaches to "root" or polarise the tree. The most widespread is outgroup rooting. Here, at least one additional species that has been independently inferred to lie outside the ingroup (the actual study group) is included in the analysis, and the tree is then drawn so that the split between outgroup and ingroup is at its base. Some phylogenetic methods, however, such as those assuming a molecular clock, do return a rooted tree.
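The rooting step itself is simple once an outgroup has been chosen. A minimal sketch, assuming the unrooted tree is stored as a hypothetical adjacency list in which each terminal has exactly one neighbour, places the root on the branch leading to the outgroup and writes the result in Newick notation (nested parentheses, a widely used text format for trees):

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`; return Newick text.

    adjacency: dict mapping every node to the list of nodes it is connected to;
               terminals have exactly one neighbour, internal nodes three or more.
    """
    (anchor,) = adjacency[outgroup]  # the internal node the outgroup attaches to

    def to_newick(node, parent):
        # Children are all neighbours except the one we arrived from
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # a terminal
        return "(" + ",".join(to_newick(c, node) for c in children) + ")"

    # The root separates the outgroup from the whole ingroup
    return f"({outgroup},{to_newick(anchor, outgroup)});"


# Unrooted tree ((A,B),C,O) with hypothetical internal nodes x and y
adjacency = {
    "A": ["x"], "B": ["x"], "C": ["y"], "O": ["y"],
    "x": ["A", "B", "y"], "y": ["x", "C", "O"],
}
print(root_with_outgroup(adjacency, "O"))  # (O,((A,B),C));
```

The same unrooted tree yields a different rooted tree for each choice of outgroup, which is why the outgroup must be known independently to lie outside the study group.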

Types of phylogenetic trees
When reading research papers, it is important to be aware that phylogenetic trees come in several types, which convey different information.

Cladograms show only the branching relationships of the terminals; their branch lengths carry no information.

Phylograms have branch lengths that show the number of changes inferred to have taken place in the lineage (either morphological character shifts or genetic mutations, depending on the data).

Chronograms have branch lengths proportional to time, showing the inferred times of lineage splits, usually with generous error bars. They are produced under the assumption of at least a rough "molecular clock" (the more time passes, the more mutations should accumulate). They are usually calibrated with fossils, which provide minimum ages for branches, and with geological events such as the emergence of volcanic islands, which provide maximum ages. Mutation rates estimated in other studies of the same group of organisms are sometimes used for calibration instead.
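Under a strict clock, the calibration arithmetic reduces to two divisions: both lineages descending from a split accumulate changes, so the distance between them is roughly twice the rate times the age. The sketch below assumes a strict clock and a single calibration treated as an exact age; the numbers are invented for illustration. If the calibration is a fossil's minimum age, the true rate can only be lower, so the other ages derived this way are themselves minimum estimates. Real dating software uses relaxed clocks and calibration intervals rather than exact ages.

```python
def clock_rate(distance: float, age: float) -> float:
    """Substitutions per site per million years under a strict clock.

    Both lineages descending from a split accumulate changes, hence the factor 2.
    """
    return distance / (2.0 * age)


def divergence_time(distance: float, rate: float) -> float:
    """Age of a split (million years) given the distance between its descendants."""
    return distance / (2.0 * rate)


# Hypothetical calibration: two lineages 0.10 substitutions/site apart whose split
# is assumed to be 25 million years old (e.g. dated with a fossil)
rate = clock_rate(0.10, 25.0)
print(round(divergence_time(0.04, rate), 2))  # 10.0
```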

Phylogenetic trees can also be drawn in various orientations, e.g. left to right, right to left, top to bottom or centre to periphery (circular).

The underlying assumption and the charge of circularity
Based on the theory of common descent, phylogenetic methods assume that relationships have a tree-like structure. In practice, this means that they will generally return a bifurcating or multifurcating tree diagram even if there is no tree structure in reality, for example because the phylogeneticist has included species of hybrid origin in the analysis. Researchers need to take such cases into account.

Creationists sometimes criticise evolutionary biologists for making the above assumption and argue that phylogenetics is a case of circular reasoning: the tree of life reconstructed by phylogenetic studies lends no support to the theory of common descent because the methodology forces all data into a tree shape. This, however, is comparable to saying that the successful construction of a working airplane does not lend support to the science of aerodynamics because it was constructed using aerodynamic principles; if the assumptions were totally wrong, the plane would not fly.

Similarly, phylogenetics has successfully recovered largely congruent relationships using many independent data sources ranging from anatomy and morphology to all parts of the genome, and the results are consistent with those of palaeontology and biogeography. Such congruence between several independent types of data would be very implausible under any theory except common descent.