Comparative method

The comparative method is a procedure used in historical linguistics to determine whether two or more languages are related and to reconstruct their common ancestor. It works by establishing regular sound correspondences, in which particular sounds in one language systematically match particular sounds in the others. Though the comparative method is the foundation of historical linguistics, many pseudolinguists have never even heard of it.

Methodology
The application of the comparative method in the reconstruction of a proto-language entails several steps.

First, gathering a list of possible cognates from the languages in question (usually basic vocabulary, since such words are less likely to be borrowed).

Second, analyzing the words to determine whether there are any systematic sound correspondences (or sound laws) between the languages.

Third, reconstructing the original forms (or proto-sounds) of the sounds being compared. This may take several factors into consideration, including directionality (the idea that some sound changes tend to occur in a given direction, but not vice versa), the number of languages in which the sounds are found (a sound found in most of the languages in question is more likely to be the original proto-sound), and economy (postulating a reconstruction that would involve the smallest number of phonetic changes in its descendants).

Fourth, analyzing any similar correspondence sets. For instance, one correspondence set may reflect a proto-sound *p that remains /p/ in three languages and becomes /b/ in the remaining one, while another may reflect a proto-sound *p that remains /p/ in two languages, becomes /b/ in a third, and /v/ in the fourth. In such cases, it must be determined whether the difference is due to conditioning (e.g., *p changing into /v/ only before certain vowels) or whether the proto-language in fact had two separate phonemes.

Fifth, determining whether the phoneme inventory of the reconstructed proto-language is plausible. Languages tend to have symmetrical phonologies, and a symmetrical reconstruction would be preferred. For instance, it would be unlikely for a language to possess the voiceless stops /p/, /t/, and /k/, but only two voiced counterparts /b/ and /g/. Additionally, some types of phoneme inventories never occur (for instance, there are no languages without vowels).

Sixth, using the reconstructed proto-sounds to reconstruct morphemes or words in the proto-language.
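The second and third steps can be sketched in a few lines of Python. This is a toy under strong assumptions: the four "languages" A-D and their cognate sets are invented, the words are already aligned segment by segment (real alignment is much harder), and the hypothetical helper majority_proto_sound picks a proto-sound only by the "most languages" criterion, ignoring directionality and economy.

```python
from collections import Counter

# Hypothetical, pre-aligned cognate sets from four imaginary languages A-D.
# Each character is one segment, and all words in a set have the same length.
COGNATE_SETS = [
    ("pata", "pata", "pata", "bata"),
    ("pilu", "pilu", "pilu", "bilu"),
    ("tapa", "tapa", "tapa", "taba"),
]


def correspondence_sets(cognate_sets):
    """Count how often each cross-language correspondence occurs (step two)."""
    counts = Counter()
    for words in cognate_sets:
        # zip(*words) yields one tuple per segment position, e.g. ('p', 'p', 'p', 'b')
        for segments in zip(*words):
            counts[segments] += 1
    return counts


def majority_proto_sound(correspondence):
    """Guess a proto-sound using only the majority criterion (part of step three)."""
    sound, _ = Counter(correspondence).most_common(1)[0]
    return sound


counts = correspondence_sets(COGNATE_SETS)
for corr, n in counts.most_common():
    print(corr, n, "-> *" + majority_proto_sound(corr))
```

On this invented data the set (p, p, p, b) recurs three times and points to *p, like the hypothetical example in the fourth step. A real reconstruction would go on to ask whether the /b/ in language D is conditioned and whether the resulting inventory is plausible.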

The regularity of sound change
The comparative method is based on the assumption that sound change is regular. That is, when a sound changes at some point in a language's history, the change affects every word in which that sound occurs in the relevant environment (the influence of the surrounding sounds is known as conditioning). An example might be every instance of a proto-sound *k before /a/ becoming /tʃ/ in a daughter language. Such regularity allows historical linguists to determine whether certain languages are related, as well as to establish etymological connections between words in different languages. For instance, the Latin consonant cluster ct (/kt/) regularly corresponds to /tʃ/ (spelled ch) in Spanish, /pt/ in Romanian, and /tt/ in Italian. Thus, Latin noctem is noche in Spanish, noapte in Romanian, and notte in Italian; octo corresponds to ocho, opt, otto; coctus to cocho, copt, cotto; factus to hecho, fapt, fatto; and pectus to pecho, piept, petto.
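The cognate sets above can be checked mechanically against the stated correspondence. The sketch below is a simplification: it tests spelling with substring matches rather than aligning sounds, and the names CT_REFLEXES and fits_ct_correspondence are hypothetical, introduced only for illustration.

```python
# Expected reflexes of the Latin cluster ct in each daughter language (as spelled).
CT_REFLEXES = {"Spanish": "ch", "Romanian": "pt", "Italian": "tt"}

# The cognate sets listed in the text.
COGNATES = {
    "noctem": {"Spanish": "noche", "Romanian": "noapte", "Italian": "notte"},
    "octo": {"Spanish": "ocho", "Romanian": "opt", "Italian": "otto"},
    "coctus": {"Spanish": "cocho", "Romanian": "copt", "Italian": "cotto"},
    "factus": {"Spanish": "hecho", "Romanian": "fapt", "Italian": "fatto"},
    "pectus": {"Spanish": "pecho", "Romanian": "piept", "Italian": "petto"},
}


def fits_ct_correspondence(latin, daughters):
    """Return True if the Latin word contains ct and every daughter form
    contains the expected reflex. This is a crude substring test on spelling,
    not a phonological alignment."""
    if "ct" not in latin:
        return False
    return all(CT_REFLEXES[lang] in form for lang, form in daughters.items())


for latin, daughters in COGNATES.items():
    print(latin, fits_ct_correspondence(latin, daughters))
```

Every set passes, which is the point: one correspondence accounts for all five words in all three languages at once.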

Borrowing versus inheritance
Such correspondences are useful for determining whether a word is inherited or borrowed. For instance, the Spanish words directo and contactar preserve the original Latin cluster ct because they were borrowed directly from Latin and so did not undergo the sound shift seen in inherited words. (Cf. derecho, which shows the expected change.) Indeed, a language often has both an inherited and a borrowed word from the same source, as with Spanish dictado/dechado and causa/cosa. The connection may not be immediately apparent from either the meaning or the form of the words: English "do" descends from the same Proto-Indo-European root as "fact", a borrowing from Latin (ultimately from facere, "to make"). It was through the comparative method that the linguist Heinrich Hübschmann demonstrated that Classical Armenian was not an Iranian language, as commonly thought at the time, but rather a separate, non-Iranian branch of Indo-European with many Iranian loanwords.
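The same reasoning can be turned into a rough classifier for Spanish words. This is a toy heuristic, not a real etymological tool: it covers only the two patterns mentioned above (ct and au), works on spelling, and the Latin source forms directus and dictatum are supplied from standard etymologies rather than from the text. The function name classify_spanish is hypothetical.

```python
# Diagnostic patterns: (Latin sequence, its usual reflex in inherited Spanish words).
# Only the two patterns discussed in the text are included.
DIAGNOSTICS = [("ct", "ch"), ("au", "o")]


def classify_spanish(latin, spanish):
    """Label a Spanish word as 'inherited', 'borrowed', or 'undetermined'."""
    for latin_seq, inherited_reflex in DIAGNOSTICS:
        if latin_seq not in latin:
            continue
        if latin_seq in spanish:
            return "borrowed"   # the Latin sequence survived unchanged
        if inherited_reflex in spanish:
            return "inherited"  # the regular sound change applied
    return "undetermined"


PAIRS = [
    ("directus", "derecho"),
    ("directus", "directo"),
    ("dictatum", "dechado"),
    ("dictatum", "dictado"),
    ("causa", "cosa"),
    ("causa", "causa"),
]

for latin, spanish in PAIRS:
    print(f"{latin} -> {spanish}: {classify_spanish(latin, spanish)}")
```

A word whose Latin source lacks both patterns comes back "undetermined", which is the honest answer: a single correspondence can only diagnose words that contain it.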

Nevertheless, such patterns can also arise not from historical sound changes acting on an inherited word, but from phonological adaptation of a loanword at the time of borrowing. When a language borrows words containing phonemes it lacks, it may substitute the closest sounds in its own inventory. For instance, Maori lacks many of the sounds of English and predictably replaces them with their closest Maori equivalents. Maori also allows neither consonant clusters nor syllable-final consonants, and may therefore insert vowels into loanwords. As a result, a Maori loanword can differ radically in form from the English original, as in hāhi (church), haina (China, sign), hanara (sandal), haki (flag), hiraka (silk), and huka (sugar). Similar phonological factors turned English "Christmas" into Hawaiian Kalikimaka.
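The two mechanisms described above, sound substitution and vowel insertion, can be sketched as follows. The substitution table is a hypothetical simplification covering only a handful of English phonemes (in IPA), and the choice of a as the inserted vowel is an assumption; real Maori loans also use other vowels (compare hāhi).

```python
# Hypothetical, simplified substitutions for a few English phonemes Maori lacks.
SUBSTITUTIONS = {
    "s": "h", "ʃ": "h", "tʃ": "h",  # sibilants -> h
    "l": "r",                         # Maori has r but no l
    "g": "k",                         # Maori has no voiced stops
    "ɪ": "i", "ʊ": "u", "ə": "a", "aɪ": "ai",
}
VOWEL_LETTERS = set("aeiouāēīōū")
EPENTHETIC_VOWEL = "a"  # assumption: real loanwords vary in the vowel inserted


def is_vowel(segment):
    return segment[0] in VOWEL_LETTERS


def adapt_to_maori(phonemes):
    """Substitute Maori sounds, then insert a vowel after any consonant that
    is not followed by a vowel (breaking up clusters and final consonants)."""
    mapped = [SUBSTITUTIONS.get(p, p) for p in phonemes]
    out = []
    for i, seg in enumerate(mapped):
        out.append(seg)
        following = mapped[i + 1] if i + 1 < len(mapped) else None
        if not is_vowel(seg) and (following is None or not is_vowel(following)):
            out.append(EPENTHETIC_VOWEL)
    return "".join(out)


for word, phonemes in [("silk", ["s", "ɪ", "l", "k"]),
                       ("sugar", ["ʃ", "ʊ", "g", "ə"]),
                       ("China", ["tʃ", "aɪ", "n", "ə"]),
                       ("sign", ["s", "aɪ", "n"])]:
    print(word, "->", adapt_to_maori(phonemes))
```

Even these two crude rules reproduce hiraka, huka, and haina (for both China and sign), showing how borrowing alone can produce forms that look nothing like their source.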

The comparative method versus pseudolinguistics
Many pseudolinguistic theories are created by amateurs who not only have no idea what the comparative method entails but are often unaware that it exists at all. These pseudoscientists imagine that genetic relationships can be established merely by listing similar words. This methodology, however, is flawed.

For one thing, lexical similarities are statistically certain to occur by chance. Because every language has many thousands of words, any two languages, even completely unrelated ones, will share a number of similar words purely by coincidence. Likewise, if you take a given word (like "or"), it is to be expected that among the thousands of languages in the world, some or even many will express the same concept with a coincidentally similar pronunciation. Moreover, as languages evolve, a single word may split into different and often unrecognizable forms, while unrelated words may come to resemble each other by accident. Chance resemblances are therefore unavoidable. For instance, in the Australian language Mbabaram, the word for 'dog' is dog. This is not evidence of a close relationship between Mbabaram and English. Mbabaram belongs to a language family in which the original word for 'dog' is gudaga, and dog is its expected outcome in Mbabaram. By contrast, English dog originally referred to a specific kind of dog; the original word lives on in English as hound, which is cognate with German Hund and Latin canis.
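A back-of-the-envelope model makes "statistically certain" concrete. Everything here is an assumption for illustration: each word is a random consonant-vowel-consonant syllable drawn uniformly from 20 consonants and 5 vowels, and two words count as "similar" when both consonants match. Under that model a single comparison matches with probability (1/20)² = 1/400, and the simulation checks the analytic figure.

```python
import random

# Toy model (not real data): random CVC words from 20 consonants and 5 vowels;
# two words are "similar" if their consonants match, vowels ignored.
CONSONANTS = "bcdfghjklmnprstvwxyz"
VOWELS = "aeiou"
P_MATCH = (1 / len(CONSONANTS)) ** 2  # 1/400


def random_word(rng):
    return rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CONSONANTS)


def similar(a, b):
    return a[0] == b[0] and a[2] == b[2]


def simulate(n_meanings, trials, seed=0):
    """Average number of 'similar' pairs when two unrelated random vocabularies
    of n_meanings words are compared meaning by meaning."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(similar(random_word(rng), random_word(rng))
                     for _ in range(n_meanings))
    return total / trials


n = 1000
print("expected matches:", round(n * P_MATCH, 3))          # 2.5
print("P(at least one):", round(1 - (1 - P_MATCH) ** n, 3))  # about 0.918
print("simulated mean:", simulate(n, trials=200))
```

Comparing just 1,000 meanings between two unrelated random "languages" already yields about 2.5 look-alikes and better than a 90% chance of at least one. Looser criteria, such as allowing shifts in meaning or counting partial sound matches, would raise the per-comparison probability and the expected number of coincidences.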

In addition, borrowing (either between the two languages or from a common third one) is another possible reason for lexical overlap. While listing large numbers of similarities in vocabulary, pseudolinguists simply assume that they must all be due to genetic relatedness, and more often than not completely ignore the possibility of borrowing.

Pseudolinguists who know nothing of historical linguistics are apt to dismiss etymologies involving dissimilar words as implausible, but in reality common origin does not necessarily lead to similarity of form, and similarity of form is not necessarily due to common origin. Through the comparative method, relationships can be demonstrated between words that have no sounds in common. Consider the Portuguese word chão (meaning "flat" or "floor"). If you look up its etymology in a dictionary, you will see that it comes from the Latin planus (meaning "flat"). A pseudolinguist would likely scoff at such a derivation, objecting that the words are completely different and coincide only in meaning; to the pseudolinguist, this proves that linguists are just grasping at straws and that "mainstream" linguistics is a sham. But there is method to the apparent madness. Take the initial consonants. Is pl > ch a known Latin-Portuguese sound correspondence? Why, yes, it is. Examples of this sound change include pluvia > chuva (rain), plorare > chorar (to cry), plumbum > chumbo (lead), plaga > chaga (sore, wound), plicare > chegar (to fold; to arrive), plūmācium > chumaço (cushion, stuffing), and plenum > cheio (full). There are also correspondences by which Latin -anus and -anis become -ão in Portuguese, as in sanus > são (healthy), vanus > vão (empty, vain), manus > mão (hand), canis > cão (dog), and panis > pão (bread). Hence, once Latin-Portuguese sound correspondences are taken into account, chão turns out to be exactly the form we would expect if Latin planus had been passed down to Portuguese.
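The derivation of chão can be replayed by applying the two correspondences above as ordered rewrite rules. This is a minimal sketch that operates on spelling and models only these two changes; the names RULES and derive_portuguese are hypothetical.

```python
import re

# Two Latin -> Portuguese correspondences from the text, as ordered rewrite rules.
RULES = [
    (r"^pl", "ch"),          # initial pl- > ch-
    (r"an(?:us|is)$", "ão"),  # final -anus / -anis > -ão
]


def derive_portuguese(latin):
    """Apply each rule in order to a Latin word; all other changes are ignored."""
    form = latin
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


for latin in ["planus", "sanus", "vanus", "manus", "canis", "panis"]:
    print(latin, "->", derive_portuguese(latin))
```

The same two rules that account for são, vão, mão, cão, and pão also turn planus into chão. Words such as pluvia > chuva or plenum > cheio involve further changes that these two rules do not cover.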

Language change follows certain rules, and the comparative method provides a way to determine what those rules are. To base etymologies simply on superficial similarity, as many pseudolinguists do, is not enough.