Correlation does not imply causation



"I used to think I could control ducks with my mind but it turns out ducks & I just have very similar ideas about what stuff ducks should do."

"Correlation does not imply causation" is the logically valid idea that events which coincide with each other are not necessarily caused by each other. The fallacy it addresses is known as cum hoc, ergo propter hoc ("with this, therefore because of this"); the closely related post hoc, ergo propter hoc ("after this, therefore because of this") covers the case where one event merely follows another. For example, both vaccination rates and autism rates are rising (and perhaps even correlated), but that does not mean that vaccines cause autism any more than it means that autism causes vaccines. In reality, the link between two correlated events can be indirect, produced by a third factor known as a confounding variable, or the causality can be the reverse of what is assumed.

Assuming causation is unjustified when the only evidence available is simple correlation. However, it is not true that correlation can never imply causation. In a controlled scientific experiment, causation can be teased out of a correlation by controlling for as many confounding variables as possible. (Often, several otherwise identical experiments are run with different levels of the experimental variable, such as temperature or drug dosage.) If there is still a correlation, then causation is very likely.
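
The difference between merely observing a correlation and running a controlled experiment can be sketched with a small simulation. Everything here is an illustrative assumption rather than real data: the hidden trait `healthy_habits`, the probabilities, the outcome scale and the helper name `outcome_gap` are all made up. The treatment is given no effect at all. When people choose it themselves, the hidden trait makes it look effective; when it is assigned by coin flip, the apparent effect disappears.

```python
import random


def mean(values):
    return sum(values) / len(values)


def outcome_gap(randomized, n=20000, seed=1):
    """Mean outcome of treated people minus mean outcome of untreated people.

    The treatment has NO real effect in this simulation; only the hidden
    trait changes the outcome. All numbers are invented for illustration.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        healthy_habits = rng.random() < 0.5  # hidden confounding variable
        if randomized:
            # Controlled experiment: a coin flip decides who is treated.
            takes_treatment = rng.random() < 0.5
        else:
            # Observation only: people with healthy habits opt in more often.
            takes_treatment = rng.random() < (0.8 if healthy_habits else 0.2)
        outcome = (10.0 if healthy_habits else 0.0) + rng.gauss(0, 5)
        (treated if takes_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)


observed_gap = outcome_gap(randomized=False)
randomized_gap = outcome_gap(randomized=True)
print(f"gap when people choose for themselves: {observed_gap:.2f}")
print(f"gap under random assignment:           {randomized_gap:.2f}")
```

Random assignment works here because it makes the treated and untreated groups alike, on average, in every trait, including ones nobody thought to measure.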

Explanations and examples
For any given correlation, there are four basic possibilities. Let's assume that autism rates are correlated with vaccination rates:
 * 1) It's genuinely causal: [X] causes [Z]. (Vaccination causes autism.)
 * 2) The causality is reversed: [Z] causes [X]. (Autism causes vaccination.)
 * 3) A confounding variable is the cause: [X] is correlated with [Z], but [Y] (a confounding variable) is the true cause of [Z]. Because [Y] is highly correlated with [X], it appears as though [X] is the cause. (Both vaccination rates and the willingness of doctors to diagnose autism have risen over the years. This willingness explains most of the rise in autism rates. Autism was not even named until 1910, at first only as a symptom of schizophrenia, which was after Louis Pasteur had introduced the second generation of vaccines.)
 * 4) It is simply coincidence or a statistical fluke. (This is rare, but it does happen. The Moon is just far enough away to appear the same size as the Sun in the sky because it had to be somewhere.)

Simple example
100% of people who drink water die. Two events can consistently correlate with each other without having any causal relationship. An example is the relationship between reading ability and shoe size across the whole population of the United States. Anyone who performed such a survey would find that larger shoe sizes correlate with better reading ability, but this does not mean that large shoes cause good reading skills. Instead, the correlation arises because young children have small feet and have not yet (or have only recently) been taught to read. In this case, both variables are more accurately correlated with a third: age.

The part age plays in this example is known as a "confounding variable" or "confounding factor": something that influences the outcome but is not being controlled for in the study. In this case, age influences both reading ability and shoe size quite directly. A confounding variable can be the actual cause of a correlation, so any study must take such variables into account and find ways of dealing with them, usually by identifying them and then either holding them constant or varying them directly.
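
The shoe-size example can be reproduced with a small simulation. The sketch below is illustrative only: the age range, the growth coefficients and the noise levels are invented numbers, and `pearson` and `simulate_children` are hypothetical helper names. Shoe size and reading score are each generated from age alone, never from each other, yet they still end up strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_children(n=5000, seed=0):
    """Return (ages, shoe_sizes, reading_scores) for n simulated children."""
    rng = random.Random(seed)
    ages, shoe_sizes, reading_scores = [], [], []
    for _ in range(n):
        age = rng.uniform(3, 18)                  # assumed age range, in years
        shoe = 0.8 * age + rng.gauss(0, 1.5)      # depends on age only
        reading = 5.0 * age + rng.gauss(0, 10.0)  # depends on age only
        ages.append(age)
        shoe_sizes.append(shoe)
        reading_scores.append(reading)
    return ages, shoe_sizes, reading_scores


ages, shoe_sizes, reading_scores = simulate_children()
r_shoe_reading = pearson(shoe_sizes, reading_scores)
print(f"shoe size vs reading score: r = {r_shoe_reading:.2f}")
```

Nothing in the code lets shoe size influence reading score, so the correlation it prints comes entirely from the shared dependence on age.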

The most common way to control for confounding variables is with controlled studies. In these studies, the differences between the experimental group and the control group are minimised as far as possible, so that one can be more confident that a correlation is a valid indicator of causation. This is extremely important in compensating for the placebo effect in medical trials, but it is also important in other branches of science. In the age/shoe-size/reading-ability example, a controlled study would look for a correlation between reading ability and shoe size in a sample of people who are all the same age. Alternatively, the hypothesis could be tested further by correlating age and reading ability in a sample of people with similar shoe sizes.
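
Holding the confounding variable fixed can also be sketched in code. This is a minimal illustration under invented assumptions (the coefficients, the noise levels and the half-year age band are arbitrary, and the helper names are hypothetical): shoe size and reading score both depend only on age, and the correlation between them is computed once for everyone and once for a sample of people of nearly the same age.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
people = []  # (age, shoe_size, reading_score) tuples, all simulated
for _ in range(20000):
    age = rng.uniform(3, 18)
    shoe = 0.8 * age + rng.gauss(0, 1.5)      # depends on age only
    reading = 5.0 * age + rng.gauss(0, 10.0)  # depends on age only
    people.append((age, shoe, reading))


def shoe_reading_correlation(sample):
    """Correlation between shoe size and reading score within a sample."""
    return pearson([p[1] for p in sample], [p[2] for p in sample])


# "Control" for age by keeping only people within one half-year age band.
same_age = [p for p in people if 10.0 <= p[0] < 10.5]

r_all = shoe_reading_correlation(people)
r_same_age = shoe_reading_correlation(same_age)
print(f"whole population:  r = {r_all:.2f}")
print(f"ten-year-olds only: r = {r_same_age:.2f}")
```

Restricting the sample to one age band is the simplest form of control. With real data, several bands would normally be checked, since a single narrow slice contains far fewer people.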

Risk factor
The term "risk factor" is used in medicine to mean "something that is positively correlated with an outcome." For instance, obesity is a risk factor for Type 2 diabetes. The term is often incorrectly understood to mean "cause" (e.g. "I'm at risk for diabetes? But I'm not fat!"). Alternatively, a clear risk factor can be disputed on the basis that it's not a definitive cause, a classic use of the uncertainty tactic (e.g. "I smoke three packs a day and I don't have cancer!").

Even when there is a correlation, the risk factor does not necessarily go both ways: obesity is a risk factor for Type 2 diabetes, but untreated diabetes can, in certain situations, cause weight loss, as excess glucose is excreted in urine. In other words, a correlation combined with a specific temporal order can indicate which way any causation runs. Taking insulin can cause weight gain, as the previously wasted sugars can now be metabolized, but that risk factor correlates with the treatment, not the disease.

In science
"Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there.'"

In science, correlation studies are often used to test for the existence of interesting patterns, but on their own they are not enough to establish a cause. To make a causal claim, you must run an experiment or a series of experiments and further studies using the scientific method: alter parameters, make predictions and test them to see whether the proposed cause really is one. This validates that one event is indeed directly influencing the other and is the reason behind the detected correlation.

Many woo and pseudoscience pushers conflate correlation with causation in order to claim validity, but skip the later scientific steps of compensating for confounding variables and thoroughly testing the causal relationship. For example, if someone gets a cold and takes Vitamin C, their cold will go away. The claim is then made that the Vitamin C caused the cold to go away. However, the cold would have gone away anyway, whether or not the Vitamin C was taken, so the claim is unsupported. The placebo effect produces another correlation with "treatment" that quacks use to create false validity.

Correlations seem to tap into a deep part of human psychology. As pattern-recognition machines, we are hyper-responsive to any potential signal in our environment. People will often take two completely unrelated events and decide that one must cause the other because they seem to correlate. Someone may decide that she has good luck whenever she wears a given shirt; this is often combined with a powerful confirmation bias to create magical thinking.

In parody
In the "Church of the Flying Spaghetti Monster", a key "belief" is that global warming is caused by a lack of pirates (not the modern kind, like in Somalia, but the old-timey swashbuckling kind) sailing the oceans. This is shown by a graph correlating increasing surface temperatures of the Earth with a decline in the number of pirates. While it is certainly true that piracy has decreased and temperatures have gone up, there is nothing directly connecting the two trends. Or is there?

Fallacy engineering
Care should be taken not to assume the opposite extreme, that correlation never implies causation. Causal hypotheses suggested by correlations are successfully postulated and tested every day. Most scientific theories would not exist without this part of the scientific method.

Woo spinners, when cornered and on the defensive, may try to claim that correlation never implies causation in order to avoid statistical analysis (where the correlation's implication of causation is demonstrated). With an effortless handwave, they dismiss the scientific analysis by claiming that correlation doesn't imply causation. By arguing this, they are arguing that they know the truth without any evidential support, and your number magic be damned!

For example, when presented with evidence that use of a vaccine is followed by an almost complete reduction in infection rate, anti-vaccine proponents often discount this change as mere correlation. This ignores the sound methodology, the control of variables, the extremely low probability that the result is merely a chance fluctuation (low p-values), and the generally clear-cut evidence that the correlation reflects a causal chain. Ironically, after casting doubt on this method, anti-vaxxers often use that very method themselves (only without the hard work necessary to support their case). They may cite their own statistics that show an increase in cases of autism (or some other terror du jour) when a vaccine is introduced. If they search long enough, they may find some place at some time where this happened and then use this example as representative of everywhere, always.
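
Why searching long enough always turns up a supporting example can be illustrated with a short simulation. This is a sketch under made-up assumptions, not real epidemiological data: 500 hypothetical towns each have 200 children, and the chance of a diagnosis is fixed at the same invented base rate in the year before and the year after an intervention that does nothing at all.

```python
import random

rng = random.Random(7)
CHILDREN_PER_TOWN = 200
BASE_RATE = 0.05  # identical before and after: the intervention has no effect


def diagnoses():
    """Number of diagnoses in one town-year, every child at BASE_RATE."""
    return sum(rng.random() < BASE_RATE for _ in range(CHILDREN_PER_TOWN))


changes = []  # (diagnoses after) - (diagnoses before), one entry per town
for _ in range(500):
    before, after = diagnoses(), diagnoses()
    changes.append(after - before)

mean_change = sum(changes) / len(changes)
biggest_rise = max(changes)
print(f"average change across all towns: {mean_change:+.2f}")
print(f"largest rise in a single town:   {biggest_rise:+d}")
```

Across all towns the average change stays close to zero, but the town with the largest rise, quoted on its own, would look like alarming evidence. Judging the full set of towns rather than the most extreme one is what guards against this kind of cherry-picking.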

This tactic is also often used by climate change deniers ("the highly statistically significant association of greenhouse gases and global temperature does not imply causation") as well as every type of woo spinner you can imagine (shockingly including even some economists and scientists).