Texas sharpshooter fallacy

A nuclear bomb always hits ground zero. The Texas sharpshooter fallacy (or clustering fallacy) occurs when the same data is used both to construct and test a hypothesis.

The fallacy is an imprecision fallacy and an informal fallacy.

Explanation
One way to make a list of accurate prophecies is: (1) have a bunch of people make a bunch of prophecies; (2) wait and see which ones come true; (3) throw all the others in the trash; (4) compile your list of prophecies that came true, and parade it around triumphantly.

The fallacy's name comes from a parable in which a Texan fires his gun at the side of a barn, paints a bullseye around the bullet hole, and claims to be a sharpshooter. Though the shot may have been completely random, he makes it appear as though he has performed a highly non-random act. In normal target practice, the bullseye defines a region of significance in advance, and there is a low probability of hitting it by firing in a random direction. When the region of significance is chosen after the event has occurred, however, any outcome at all can be made to appear spectacularly improbable.
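
The parable can be turned into a small simulation. This is a sketch under simple, made-up assumptions: the barn wall is the unit square, every shot lands uniformly at random, and the bullseye has a hypothetical radius of 0.05. The only difference between the two runs is whether the bullseye is fixed before firing or painted around the hole afterwards.

```python
import random


def hit_rate(n_shots: int, radius: float, paint_after: bool, seed: int = 0) -> float:
    """Return the fraction of random shots that land inside the bullseye.

    Assumed model: the barn wall is the unit square and each shot lands at a
    uniformly random point on it. If paint_after is False, the bullseye is fixed
    at the centre of the wall before firing. If paint_after is True, the
    bullseye is painted around wherever the bullet hit.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_shots):
        x, y = rng.random(), rng.random()
        if paint_after:
            cx, cy = x, y        # target drawn around the bullet hole
        else:
            cx, cy = 0.5, 0.5    # target chosen in advance
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
            hits += 1
    return hits / n_shots


print(hit_rate(10_000, 0.05, paint_after=False))  # close to pi * 0.05**2, about 0.008
print(hit_rate(10_000, 0.05, paint_after=True))   # 1.0: every shot "hits"
```

Painting the target afterwards guarantees a perfect score, which is exactly why a score obtained that way says nothing about skill.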

The Texas sharpshooter fallacy uses the same data both to construct and to test a hypothesis. A hypothesis must be formulated before the data used to test it is collected. If one data set is used to construct a hypothesis, a new data set must then be generated (ideally in a different way, based on predictions made by the hypothesis) to test it.
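
The difference between constructing and testing a hypothesis can be sketched with pure noise. In this hypothetical setup, an outcome and 100 candidate "explanatory factors" are all independent random numbers, so no factor genuinely predicts anything. The factor that best matches the outcome in a first sample is then re-checked on a freshly generated second sample. The sample size, factor count, and helper names are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def noise_sample(rng, n_people, n_factors):
    """An outcome plus n_factors candidate causes, all independent Gaussian noise."""
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    factors = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_factors)]
    return outcome, factors


def dredge_then_retest(n_people=50, n_factors=100, seed=0):
    """Pick the best-looking factor in sample A, then re-test it on a fresh sample B."""
    rng = random.Random(seed)
    # Sample A is used to *construct* the hypothesis ("factor `best` matters").
    outcome_a, factors_a = noise_sample(rng, n_people, n_factors)
    corrs_a = [pearson(f, outcome_a) for f in factors_a]
    best = max(range(n_factors), key=lambda i: abs(corrs_a[i]))
    # Sample B is new data, used only to *test* that hypothesis.
    outcome_b, factors_b = noise_sample(rng, n_people, n_factors)
    return best, corrs_a[best], pearson(factors_b[best], outcome_b)


trials = [dredge_then_retest(seed=s) for s in range(30)]
print(sum(abs(r_a) for _, r_a, _ in trials) / 30)  # sizeable "discovered" correlation
print(sum(abs(r_b) for _, _, r_b in trials) / 30)  # much smaller on fresh data
```

The "discovery" in sample A is real in the sense that the correlation is there, but it was selected after looking; only the fresh sample gives it a fair test.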

Examples

 * The physicist Richard Feynman once started a lecture on statistical physics by reciting a license plate number he had seen on the way in and asking his students what the probability was that he had seen that particular number. The probability, of course, was quite low. But this is true no matter what license plate number one sees, and unless it has an independently defined significance, this probability is meaningless.


 * A million-participant raffle is drawn, and Joe is found to be the winner. Afterwards, someone points out that the odds of Joe winning were a million to one, and concludes that he couldn't have won randomly and must have cheated. But the chances of anyone else winning were also a million to one, and this person could have accused the winner of cheating no matter who it was. Meanwhile, the chance of there being a winner at all was 100%. In this case Joe lucked out, but somebody had to.
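
The raffle arithmetic can be checked with a short sketch. Here Joe is assumed, purely for illustration, to hold ticket 0, and `run_raffles` and `odds_of` are hypothetical helpers. The point is the contrast between a prediction fixed in advance ("Joe will win"), which almost never comes true, and the after-the-fact observation that the winner's odds were one in a million, which is true of every single draw. The same logic applies to Feynman's license plate.

```python
import random
from fractions import Fraction

N_TICKETS = 1_000_000  # a million-participant raffle, as in the example


def odds_of(ticket):
    """Every ticket has the same chance of winning; nothing singles out the winner."""
    return Fraction(1, N_TICKETS)


def run_raffles(n_draws=1000, joe=0, seed=0):
    rng = random.Random(seed)
    joe_won = 0        # prediction fixed in advance: "ticket `joe` will win"
    looked_rigged = 0  # after-the-fact test: "the winner had 1-in-a-million odds"
    for _ in range(n_draws):
        winner = rng.randrange(N_TICKETS)
        if winner == joe:
            joe_won += 1
        if odds_of(winner) == Fraction(1, N_TICKETS):
            looked_rigged += 1
    return joe_won, looked_rigged


# Someone always wins: the N mutually exclusive outcomes each have probability 1/N.
print(N_TICKETS * Fraction(1, N_TICKETS))  # 1
print(run_raffles())  # Joe almost never wins, yet every winner "looks rigged"
```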

"Random" evolution
Creationist and intelligent design arguments claim that the chances of a protein molecule forming "randomly", of a cell forming "randomly" via abiogenesis, or of the Universe forming "randomly" into what we see today are incredibly low, and that these things must therefore have been designed. The argument is deeply flawed because it ignores the fact that physical processes are not random but are governed by the laws of physics, chemistry and, eventually, biology: evolution via variation and natural selection. It also treats the outcome that actually occurred as if it had been the target specified in advance.

Precognition
In late 2010, "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect", a psychology paper by Daryl Bem, ostensibly provided evidence of precognition. In Bem's experiments, test subjects' responses appeared to be influenced, to a small but statistically significant extent, by conditions that arose only later in the tests. However, Bem has acknowledged forming some of his conclusions after the tests, rather than testing fixed hypotheses as any rigorous application of the scientific method requires. He has also stated that, before concluding and publishing his research, "I purposely waited until I thought there was a critical mass that wasn't a statistical fluke". While this may seem sensible at first glance, deliberately waiting for such a "critical mass" actually means stopping the research at a point when the results happen to look favourable to the hypotheses, rather than running a pre-set number of experiments and only then checking the overall findings.
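
The cost of stopping when the results look favourable (often called optional stopping) can be simulated. The sketch below is a simplified stand-in, not a model of Bem's actual experiments: every simulated study has no real effect at all (each observation is standard normal noise), and a two-sided z-test with known standard deviation is applied either once at a pre-set sample size or after every batch, stopping at the first p < 0.05. The batch size of 10 and the maximum of 200 observations are arbitrary assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def run_study(rng, max_n=200, batch=10, peek=True, alpha=0.05):
    """Simulate one study in which the null hypothesis is true.

    Each observation is N(0, 1) noise, so any "effect" is a fluke. With peek=True
    the running z-test is checked after every batch and the study stops at the
    first p < alpha; with peek=False it is checked once, after max_n observations.
    Returns True if the study ends up "significant".
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0, 1)
        checkpoint = (n % batch == 0) if peek else (n == max_n)
        # z = sample mean / standard error = (total / n) / (1 / sqrt(n))
        if checkpoint and two_sided_p(total / math.sqrt(n)) < alpha:
            return True  # stop as soon as the results look favourable
    return False


def false_positive_rate(peek, n_studies=2000, seed=0):
    rng = random.Random(seed)
    return sum(run_study(rng, peek=peek) for _ in range(n_studies)) / n_studies


print(false_positive_rate(peek=False))  # close to the nominal 0.05
print(false_positive_rate(peek=True))   # several times higher
```

The data-generating process is identical in both runs; only the decision about when to stop changes, so the entire inflation in "significant" results comes from looking at the data before deciding to stop.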

Young Earth thinking
Much of young Earth creationism relies on this form of post hoc reasoning. This is most clearly seen in fundamentalist Christians' accounts of how the Flood created geologic structures: they start from existing geological data, construct a hypothesis around that data, and never go on to test the resulting ideas against new evidence.

Crazification
Appeals to the "crazification factor" are a common example in popular usage: people jokingly apply the factor to online discourse in general, even though the angriest responders are obviously not representative of people online; they are simply the people most likely to comment.
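
Why the loudest commenters misrepresent everyone else can be shown with a toy model. The assumptions here are entirely hypothetical: each person gets an "anger" score drawn uniformly from [0, 1), and the chance that they post a comment is anger cubed, so angrier people are much likelier to speak up.

```python
import random
import statistics


def commenter_bias(n_people=100_000, seed=0):
    """Return (mean anger of everyone, mean anger of commenters, share who comment).

    Hypothetical model: anger is uniform on [0, 1) and a person comments with
    probability anger**3, so the angriest are far likelier to post.
    """
    rng = random.Random(seed)
    anger = [rng.random() for _ in range(n_people)]
    commenters = [a for a in anger if rng.random() < a ** 3]
    return statistics.fmean(anger), statistics.fmean(commenters), len(commenters) / n_people


everyone, commenters, share = commenter_bias()
print(everyone)    # about 0.5
print(commenters)  # about 0.8
print(share)       # about 0.25: a quarter of people produce all the comments
```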

Alternative medicine
Those arguing in favor of alternative medicine often cite traditional medicines (such as willow bark), or derivatives of them, that were incorporated into mainstream medicine, and urge people to keep an open mind about whatever remedy the practitioner happens to be promoting. For instance, according to one homeopath:

"We learned that certain herbs had beneficial effects by trying them and passing on the information of what resulted: pure anecdotal evidence. But that’s how we know, for example, that milk thistle is good for the liver and hawthorn is good for the heart. No studies needed to be done. We learned through experience and anecdote."

This ignores, however, all the traditional treatments that were also supported by anecdotal evidence but were later scientifically shown to be ineffective (or outright dangerous). Over the course of history, countless remedies have been thought effective against virtually every disease, so some of these indications will inevitably turn out to be true just by chance. If pretty much everything is claimed to be good for dozens of diseases, or even all of them, and some of these remedies turn out to be good for a few of the countless original indications, that is a prediction failure rate of well over 99%. The author's two examples are cherry-picked, and certainly do not establish the validity of anecdotal evidence.
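
The cherry-picking argument comes down to simple counting. In this hypothetical model, 500 traditional remedies are each claimed to treat 40 conditions, and each individual claim happens to be true with a probability of 0.2%. The numbers are invented for illustration only; the point is that a list of genuine successes can be assembled even when almost every claim is wrong.

```python
import random


def folk_pharmacopoeia(n_remedies=500, n_indications=40, p_real=0.002, seed=0):
    """Count how many blanket claims ("remedy r treats disease d") turn out true.

    Hypothetical model: every remedy is claimed for every indication, and each
    claim is independently true with small probability p_real.
    """
    rng = random.Random(seed)
    n_claims = n_remedies * n_indications
    n_true = sum(rng.random() < p_real for _ in range(n_claims))
    return n_claims, n_true, 1 - n_true / n_claims


n_claims, n_true, failure_rate = folk_pharmacopoeia()
print(n_claims, n_true)  # 20000 claims, a few dozen of which happen to be true
print(failure_rate)      # well over 0.99
```

Listing only the hits, as the quoted homeopath does, hides the thousands of claims that failed.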

Cancer clusters
If you plot cases of cancer or other (non-infectious) diseases as dots on a map, chance alone will make the dots seem to congregate in certain places, which then show a higher-than-average number of cases. You might naively expect cases to be spread evenly if the condition strikes purely at random, but in practice the clustering illusion intrudes: a perfectly even distribution is much less probable than you might expect. A cluster is therefore likely to occur somewhere, and when people see one there is a natural tendency to panic and look for a reason why people are dying. If something can be found to blame, the result can be unwarranted condemnation.

Two notable examples in the UK involved studies of childhood leukemia in which what was probably a statistical artifact was mistaken for something else. In the first, a cluster was discovered near the Sellafield nuclear site, and another near the nuclear plant at Dounreay, although there was no evidence of higher radiation levels in the area, of higher levels of leukemia in other areas near Sellafield, or of anything else suggesting a statistically significant causal relationship. The finding nonetheless provided ammunition for those who think nuclear power is bad. In the second case, a study of leukemia clusters by E. G. Knox found that they tended to occur near railways, leading to the hypothesis that fossil fuels were involved, although he had probably failed to properly control for the fact that people tend to live near railways, as well as for the random occurrence of clusters within the population: clusters will occur, and they will be where people live.
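
How uneven a genuinely random scatter looks can be sketched directly. In this hypothetical model, 1,000 cases are dropped uniformly at random onto a 10 × 10 grid of equal-risk cells, so every apparent cluster is pure chance. A second helper computes, from the multinomial distribution, the probability that every cell receives exactly the average of 10 cases. The grid size and case count are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def scatter_cases(n_cases=1000, grid=10, seed=0):
    """Drop cases uniformly at random on a grid x grid map; return counts per cell.

    Every cell carries exactly the same risk, so any apparent cluster is chance.
    """
    rng = random.Random(seed)
    counts = Counter((rng.randrange(grid), rng.randrange(grid)) for _ in range(n_cases))
    return [counts[(i, j)] for i in range(grid) for j in range(grid)]


def log10_prob_perfectly_even(n_cases=1000, grid=10):
    """log10 of the multinomial probability that every cell gets exactly n_cases / cells.

    Assumes n_cases is a multiple of the number of cells.
    """
    cells = grid * grid
    k = n_cases // cells
    ln_p = math.lgamma(n_cases + 1) - cells * math.lgamma(k + 1) - n_cases * math.log(cells)
    return ln_p / math.log(10)


per_cell = scatter_cases()
print(min(per_cell), max(per_cell))  # the average is 10, yet some cells get far more
print(log10_prob_perfectly_even())   # roughly -88: an even spread is essentially impossible
```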

These examples were complicated by the fact that people are not distributed randomly on a map but cluster in towns (and near railways). Any cluster is therefore likely to be near something interesting and potentially blamable (a town, a major employer, transport infrastructure, etc.) rather than in the middle of nowhere. This shows how the Texas sharpshooter fallacy combines with other errors involving the misinterpretation of random data, such as the gambler's fallacy and the clustering illusion, and of course with the assumption that correlation implies causation. It was even suggested that childhood leukemia could have an infectious component, which would produce clusters as one person infected others nearby.
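
The point that clusters "will be where people live" can also be sketched. In this hypothetical model every resident of three invented places has the same risk of 0.001, so there is nothing local to blame; the only difference between the places is how many people live there. The place names and populations are made up.

```python
import random


def cases_by_town(populations, rate=0.001, seed=0):
    """Simulate cases when every resident has the same hypothetical risk `rate`.

    Nothing about any location causes disease in this model.
    """
    rng = random.Random(seed)
    return {town: sum(rng.random() < rate for _ in range(pop))
            for town, pop in populations.items()}


populations = {"city": 200_000, "market town": 50_000, "village": 10_000}  # made-up figures
cases = cases_by_town(populations)
for town, n in cases.items():
    # Raw counts track population size; per-capita rates stay near 0.001
    # (noisiest in the smallest place).
    print(town, n, n / populations[town])
```

A map of raw case counts would point straight at the city, even though no place in this model is any riskier than another.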

Demarcation
You do not commit this fallacy if you:


 * calculate the probability of a particular event after the fact based on a criterion that would have been clearly significant even before the event occurred.
 * re-test an observation to determine if previous clustering may have been due to chance.