Cladistics

In biology, cladistics (from ancient Greek κλάδος, klados, "branch"; originally called phylogenetic systematics) is a taxonomical technique for arranging organisms according to how they branch in the evolutionary tree of life.

A group of organisms is analysed and classified into a tree-like diagram called a cladogram, showing hypothesised lines of descent. The analysis may use morphological similarity (per earlier taxonomic methods), but most often DNA differences (molecular data) and biochemical data.

Cladistics has all but taken over from the older Linnaean taxonomy, which originated before the theories of evolution and common descent.

Cladistic techniques do not assume any particular theory of evolution, only the concept of descent with modification. For this reason, cladistic methods have also been applied to non-biological systems, including historical linguistics and textual criticism. Creationists have devised their own counterpart to cladistics, called baraminology.

History
The school of thought now known as cladistics took inspiration from the work of Willi Hennig, though he did not use the word, calling his approach phylogenetic systematics. Hennig's work systematised techniques biologists had been using for decades.

The term "clade" was introduced in 1958 by Julian Huxley, "cladistic" by Cain and Harrison in 1960 and "cladist" (for an adherent of Hennig's school) by Mayr in 1965.

Cladograms


In a cladogram, all organisms lie at the leaves, and each inner node is ideally binary (two-way). The taxa on either side of a split are called sister taxa or sister groups. Each subtree (whether one item or a hundred thousand items) is called a clade. A natural group consists of all the organisms in a single clade, that is, all the organisms that share an ancestor which they do not share with any other organism on the diagram. All of life forms a single clade.
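The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library format: the nested-tuple representation, the function names and the four example taxa are assumptions made for this sketch.

```python
# A cladogram as nested tuples: a string is a leaf (a taxon),
# a tuple is an inner node. Taxa and topology are illustrative only.
tree = (("lizard", ("crocodile", "bird")), "mammal")

def leaves(node):
    """Return the set of taxa at the leaves below this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Return the leaf set of every subtree (every clade), single leaves included."""
    result = [leaves(node)]
    if not isinstance(node, str):
        for child in node:
            result.extend(clades(child))
    return result

def sister_groups(node):
    """Yield the two leaf sets on either side of each binary split."""
    if isinstance(node, str):
        return
    if len(node) == 2:
        yield leaves(node[0]), leaves(node[1])
    for child in node:
        yield from sister_groups(child)
```

With four leaves and three binary inner nodes, `clades(tree)` returns seven leaf sets, and `sister_groups(tree)` yields one pair per split, for example crocodile and bird.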

Each clade is set off by a series of characteristics that appear in its members, but not in the other forms from which it diverged. These identifying characteristics of a clade are called synapomorphies (shared, derived characters). For example, hardened front wings (elytra) are a synapomorphy of beetles, while circinate vernation, or the unrolling of new fronds, is a synapomorphy of ferns.

Figure: An example of a cladogram, based on a 2008 DNA and protein analysis.

Cladistic classification


A monophyletic group is a clade, comprising an ancestral form and all of its descendants, and so forming one (and only one) evolutionary group.

A paraphyletic group is similar, but excludes some of the descendants that have undergone significant changes. For instance, the traditional class Reptilia excludes birds even though they evolved from the ancestral reptile. (The current clade Dinosauria includes birds.) Similarly, the traditional invertebrates are paraphyletic because vertebrates are excluded, although the latter evolved from an invertebrate. Cladists do not consider paraphyletic groups to be proper groups.

A group with members from separate evolutionary lines is called polyphyletic. For instance, the once-recognized Pachydermata was found to be polyphyletic because elephants and rhinoceroses arose from non-pachyderms separately. Evolutionary taxonomists consider polyphyletic groups to be errors in classification, often occurring because convergence or other homoplasy was misinterpreted as homology.
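The three kinds of group can be told apart mechanically once a tree is given. The sketch below is a simplification under stated assumptions: trees are nested tuples of taxon names, and because a leaf-only tree does not record ancestors, a non-monophyletic group is called paraphyletic when the taxa it leaves out of its smallest enclosing clade themselves form a single clade, and polyphyletic otherwise. That rule, the function names and the example trees are assumptions of this sketch, not a standard definition.

```python
def leaves(node):
    """Return the set of taxa at the leaves below this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def all_clades(node):
    """Return the set of leaf sets of every subtree."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= all_clades(child)
    return found

def classify(tree, group):
    """Classify a group of leaf names (all must occur in the tree)."""
    group = frozenset(group)
    clade_sets = all_clades(tree)
    if group in clade_sets:
        return "monophyletic"
    # Smallest clade that contains every member of the group.
    smallest = min((c for c in clade_sets if group <= c), key=len)
    excluded = smallest - group
    # Assumed rule: one excluded clade -> paraphyletic, otherwise polyphyletic.
    return "paraphyletic" if excluded in clade_sets else "polyphyletic"

# Illustrative topologies only.
amniotes = (("lizard", ("crocodile", "bird")), "mammal")
hoofed = (("elephant", "manatee"), ("rhinoceros", "horse"))

print(classify(amniotes, {"crocodile", "bird"}))      # monophyletic
print(classify(amniotes, {"lizard", "crocodile"}))    # paraphyletic
print(classify(hoofed, {"elephant", "rhinoceros"}))   # polyphyletic
```

The second call mirrors the Reptilia example: lizard and crocodile without bird form a clade minus one excluded sub-clade.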

Cladistics v. Linnaean taxonomy
Since the 1960s, there has been a trend in biology called cladism or cladistic taxonomy that requires taxa to be clades. Cladists argue that the classification system should be reformed to eliminate all non-clades. Other taxonomists insist that groups reflect phylogenies and often make use of cladistic techniques, but allow both monophyletic and paraphyletic groups as taxa.

Following Hennig, cladists argue that paraphyly is as harmful as polyphyly. The idea is that monophyletic groups can be defined objectively, in terms of common ancestors or the presence of synapomorphies. In contrast, paraphyletic and polyphyletic groups are both defined based on key characters, and the decision of which characters are of taxonomic import is inherently subjective. Many argue that such groups lead to "gradistic" thinking, where groups advance from "lowly" grades to "advanced" grades, which can in turn lead to teleology. In evolutionary studies, teleology is usually avoided because it implies a plan that cannot be empirically demonstrated.

Going further, some cladists argue that ranks for groups above species are too subjective to present any meaningful information, and so argue that they should be abandoned. Thus they have moved away from Linnaean taxonomy towards a simple hierarchy of clades. The validity of this argument hinges crucially on how often in evolution gradualist near-equilibria are punctuated. A quasi-stable state will result in phylogenies which may be all but unmappable onto the Linnaean hierarchy, whereas a punctuation event that pushes a taxon out of its ecological equilibrium is likely to lead to a split between clades that occurs in a comparatively short time and thus lends itself readily to classification according to the Linnaean system.

Other evolutionary systematists argue that all taxa are inherently subjective, even when they reflect evolutionary relationships, since living things form an essentially continuous tree. Any dividing line is artificial, and creates both a monophyletic section above and a paraphyletic section below. Paraphyletic taxa are necessary for classifying earlier sections of the tree: for instance, the early vertebrates that would someday evolve into the family Hominidae cannot be placed in any other monophyletic family. They also argue that paraphyletic taxa provide information about significant changes in organisms' morphology, ecology, or life history; in short, that both taxa and clades are valuable but distinct notions, with separate purposes. Many use the term monophyly in its older sense, where it includes paraphyly, and use the alternate term holophyly to describe clades (monophyly in Hennig's sense).

As an unscientific rule of thumb, if a distinct lineage that renders the containing group paraphyletic has undergone marked adaptive radiation and accumulated many synapomorphies, especially radical or unprecedented ones, the paraphyly is usually not considered sufficient reason to deny the lineage recognition as a distinct taxon under the Linnaean system (although it is by definition sufficient in phylogenetic nomenclature). For example, as touched upon briefly above, the Sauropsida ("reptiles") and the Aves (birds) are each ranked as a Linnaean class, although the latter are a highly derived offshoot of some forms of the former which were themselves already quite derived.

A formal code of phylogenetic nomenclature, the PhyloCode, is currently under development for cladistic taxonomy. It is intended for use by both those who would like to abandon Linnaean taxonomy and those who would like to use taxa and clades side by side. In several instances (see for example Hesperornithes) it has been employed to clarify uncertainties in Linnaean systematics, so that in combination they yield a taxonomy that unambiguously places the group in the evolutionary tree in a way that is consistent with current knowledge.

How to do cladistics
A cladistic analysis is applied to a certain set of information. The information is organised by characters, which have character states. For example, if one species has red feathers and another has blue feathers, then we have the character "colour of feathers", which has the character states "red feathers" and "blue feathers".
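A character matrix of this kind is easy to hold in a plain dictionary. The following is only a sketch: the species names and the second character ("beak shape") are hypothetical and serve to show the layout, with taxa as rows and characters as columns.

```python
# Hypothetical taxa and scores, for illustration only.
matrix = {
    "species_A": {"colour of feathers": "red feathers", "beak shape": "hooked"},
    "species_B": {"colour of feathers": "blue feathers", "beak shape": "hooked"},
    "species_C": {"colour of feathers": "blue feathers", "beak shape": "straight"},
}

def character_states(matrix):
    """Map each character to the set of states observed across all taxa."""
    states = {}
    for scores in matrix.values():
        for character, state in scores.items():
            states.setdefault(character, set()).add(state)
    return states

for character, observed in character_states(matrix).items():
    print(character, "->", sorted(observed))
```

Keeping the state labels as strings makes the matrix readable; numeric codes such as 0 and 1 are a common alternative when the matrix is passed to analysis software.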

The researcher decides which character states were present before the last common ancestor of the species group (plesiomorphies) and which were first present in the last common ancestor (synapomorphies) by considering one or more outgroups: organisms considered not to be part of the group in question, but to be closely related to it. (This makes the choice of an outgroup an important task, since this choice can profoundly change the topology of a tree.) Only synapomorphies are of use in determining clades.
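Outgroup comparison can be sketched as follows. The example assumes a single outgroup and treats any ingroup state that matches the outgroup as plesiomorphic; real analyses may use several outgroups and more careful reasoning. All names and scores here are hypothetical.

```python
# Hypothetical scores: each taxon maps character name -> state.
outgroup = {"c1": "0", "c2": "0"}
ingroup = {
    "A": {"c1": "1", "c2": "0"},
    "B": {"c1": "1", "c2": "1"},
    "C": {"c1": "0", "c2": "0"},
}

def polarise(ingroup, outgroup):
    """Label each ingroup state as plesiomorphic (matches the outgroup) or apomorphic."""
    return {
        taxon: {
            ch: "plesiomorphic" if state == outgroup[ch] else "apomorphic"
            for ch, state in scores.items()
        }
        for taxon, scores in ingroup.items()
    }

def candidate_synapomorphies(ingroup, outgroup):
    """Return {(character, state): taxa} for derived states shared by two or more taxa."""
    sharing = {}
    for taxon, scores in ingroup.items():
        for ch, state in scores.items():
            if state != outgroup[ch]:
                sharing.setdefault((ch, state), []).append(taxon)
    return {key: sorted(taxa) for key, taxa in sharing.items() if len(taxa) >= 2}

print(candidate_synapomorphies(ingroup, outgroup))  # {('c1', '1'): ['A', 'B']}
```

State "1" of character c2 occurs only in taxon B, so it is derived but not shared and says nothing about grouping. Swapping in a different outgroup changes which states count as derived, which is why the choice of outgroup matters so much.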

Possible cladograms are then drawn up and evaluated. Ideally, clades have many "agreeing" synapomorphies, with a sufficient number of true synapomorphies to overwhelm homoplasies caused by convergent evolution - characters that resemble each other because of environmental conditions or function, not because of common ancestry. A character "presence of wings" is an example - though the wings of birds and insects serve the same function, each evolved independently, as can be seen by their anatomy. If a bird and a winged insect were scored for the character "presence of wings", a homoplasy would be introduced into the dataset and confound the analysis, possibly resulting in an erroneous picture of evolution. Homoplasies can often be avoided by defining characters more precisely and increasing their number, e.g., using "wings supported by bony endoskeleton" and "wings supported by chitinous exoskeleton" as characters.
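The effect of defining characters more precisely can be shown with a toy scoring. The four taxa and the 0/1 scores below are illustrative assumptions; the character names are those used in the text.

```python
# Hypothetical scoring: 1 = state present, 0 = absent.
coarse = {
    "presence of wings": {
        "sparrow": 1, "lizard": 0, "dragonfly": 1, "silverfish": 0,
    },
}
refined = {
    "wings supported by bony endoskeleton": {
        "sparrow": 1, "lizard": 0, "dragonfly": 0, "silverfish": 0,
    },
    "wings supported by chitinous exoskeleton": {
        "sparrow": 0, "lizard": 0, "dragonfly": 1, "silverfish": 0,
    },
}

def groups_suggested(characters):
    """For each character, list the taxa that share the 'present' state."""
    return {
        name: sorted(taxon for taxon, state in scores.items() if state == 1)
        for name, scores in characters.items()
    }

print(groups_suggested(coarse))
print(groups_suggested(refined))
```

The coarse character pairs the sparrow with the dragonfly, a grouping produced purely by homoplasy, whereas each refined character is present in only one taxon and therefore suggests no false grouping.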

When analyzing "supertrees" (datasets incorporating as many taxa of a suspected clade as possible), it may be unavoidable to introduce character definitions that are imprecise, as otherwise the characters might not apply at all to a large number of taxa. The "wings" example would hardly be useful in attempting a phylogeny of all Metazoa, as most of these do not have wings at all. Careful choice and definition of characters is thus another important element in cladistic analyses. With a faulty outgroup and/or character set, no method of evaluation is likely to produce a phylogeny representing the evolutionary reality.

Many cladograms are possible for any given set of taxa, but one is chosen based on the principle of parsimony: the most compact arrangement, that is, the one requiring the fewest character state changes, is the hypothesis of relationship accepted (see Occam's razor for a discussion of the principle of parsimony and possible complications). Though at one time this analysis was done by hand, computers are now used to evaluate much larger data sets. Sophisticated software packages such as PAUP allow the statistical evaluation of the confidence we can put in the veracity of the nodes of a cladogram.
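Counting character state changes on a candidate tree can be sketched with Fitch's algorithm, a standard method for unordered characters that the text does not name and that is used here as an assumption. The four taxa, the binary matrix and the three candidate trees are hypothetical; real software searches far larger tree spaces.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a binary tree."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1          # the two subtrees disagree: one change needed
        return left | right

    visit(tree)
    return changes

def tree_length(tree, matrix):
    """Total changes over all characters; matrix maps taxon -> string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )

# Hypothetical data: "O" is the outgroup, one column per character.
matrix = {"O": "0000", "A": "1101", "B": "1100", "C": "0011"}
candidates = [
    ("O", (("A", "B"), "C")),
    ("O", (("A", "C"), "B")),
    ("O", (("B", "C"), "A")),
]

lengths = {tree: tree_length(tree, matrix) for tree in candidates}
best = min(lengths, key=lengths.get)
print(lengths[best], best)  # 5 ('O', (('A', 'B'), 'C'))
```

The three trees need 5, 6 and 7 changes respectively. The fourth character, shared by A and C, requires two changes on the shortest tree, so parsimony treats it as homoplasy rather than as evidence for grouping A with C.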

Note that the nodes of cladograms do not represent divergences of evolutionary lineages per se, but divergences of character states between evolutionary lineages. DNA sequence characters can only diverge after gene flow between (sub)populations has been reduced below some threshold, whereas comprehensive morphological alterations, being typically epistatic (the product of interactions of several genes), usually occur only after lineages have already evolved separately for quite some time. Biological subspecies can usually be distinguished genetically but often not by internal anatomy.

As DNA sequencing has become cheaper and easier, molecular systematics has become more popular. In addition to the parsimony criterion, non-Hennigian methods such as maximum likelihood and Bayesian inference, which incorporate explicit models of sequence evolution, can be used. Another powerful method is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random, although it now appears that this is sometimes not the case.
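To give a sense of what an explicit model of sequence evolution looks like, the sketch below uses the Jukes-Cantor (JC69) model. The text does not name a model; JC69 is chosen here only because it is the simplest standard one, assuming equal base frequencies and equal rates for all substitutions. The two short sequences are made up.

```python
import math

def jc69_prob(same, d):
    """Probability of ending on the same base, or on one particular other base,
    after d expected substitutions per site under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by distance d."""
    return sum(
        math.log(0.25 * jc69_prob(a == b, d))  # 0.25 = assumed base frequency
        for a, b in zip(seq1, seq2)
    )

def jc69_distance(seq1, seq2):
    """Closed-form maximum-likelihood estimate of d from the fraction of differing sites."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences differing at one site in ten.
s1 = "ACGTACGTAC"
s2 = "ACGTACGTTC"
d_hat = jc69_distance(s1, s2)
print(round(d_hat, 4), round(log_likelihood(s1, s2, d_hat), 3))
```

The estimated distance is slightly larger than the observed fraction of differing sites, because the model allows for repeated substitutions at the same site. Maximum likelihood tree inference extends the same idea from a pair of sequences to whole trees.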

Ideally, morphological, molecular and possibly other (behavioral etc.) phylogenies should be combined into an analysis of total evidence: none of the methods is "superior", but all have different intrinsic sources of error. For example, character convergence (homoplasy) is much more common in morphological data than in molecular sequence data, but character state reversions that cannot be noticed as being such are more common in the DNA. Morphological homoplasies can usually be recognized as such if character states are defined with enough attention to detail.

Definitions
A character state that is present in both the outgroups and in the ancestors is called a plesiomorphy (meaning "close form", also called an ancestral state). A character state that occurs only in later descendants is called an apomorphy (meaning "separate form", also called a "derived" state) for that group. The adjectives plesiomorphic and apomorphic are used instead of "primitive" and "advanced" to avoid placing value-judgments on the evolution of the character states, since both may be advantageous in different circumstances. It is not uncommon to refer informally to a collective set of plesiomorphies as a ground plan for the clade or clades they refer to.

A species or clade is basal to another clade if it holds more plesiomorphic characters than that other clade. Usually a basal group is very species-poor as compared to a more derived group. It is not a requirement that a basal group be present. For example, when considering birds and mammals together, neither is basal to the other: both have many derived characters.

A clade or species located within another clade is nested within that clade.