Essay:Bayesian Inference and the Power of Skepticism

Bayesian inference is a powerful statistical method based upon a single equation developed by Thomas Bayes in the 18th century. Bayes' equation allows for a mathematically rigorous analysis and answer to the question "How probable is a given hypothesis?" It is this exact question, in various forms, that much of human epistemological and metaphysical thought seeks to answer. An analysis of Bayes' equation and of Bayesian inference can help illuminate the optimal methods for creating a hypothesis; analyzing the relationship between a hypothesis and data; and when to change our mind about what is the most probable explanation for a given phenomenon. This essay is an attempt to use Bayesian inference to show that skepticism and the scientific method are the optimal systems for discovering knowledge.

Brief introduction to probability
This section will introduce some of the ideas involving probability theory that will be used in this essay. If you are already familiar with basic probability theory you can skip ahead to other sections.

Probability of the hypothesis versus the data
The main focus of probability theory is assigning a probability to a given statement. However, probabilities cannot be assigned in isolation; they are always assigned relative to some other statement. The sentence "the probability of winning the lottery is 1 in 100 million" is actually fairly meaningless on its own. For example, if I never buy a lottery ticket, my probability of winning is very different from that of someone who buys 10 every week. A meaningful "sentence" in probability theory must contain three things: the statement we seek to assign a probability to, the probability itself, and the background information used to assign that probability. The essential form is "the probability of x given y is z." The probability calculus shorthand for this sentence is P(x | y) = z.

When we seek answers to a given question or an understanding of a given phenomenon, we usually start by forming a hypothesis. Data is then collected that provides information about whether that hypothesis is true. Ultimately we seek to know: what is the probability that our hypothesis is correct? In this case our background information is the relevant data we have collected. So in probability speak we seek to know what P(h | d) is equal to, where h is our hypothesis and d is our data.

Things start to get a little complicated from here. Let's imagine we are asking a simple question, like "is this coin I have fairly weighted?" We hypothesize that if it is fairly weighted, flipping it should result in roughly equal numbers of heads and tails. So we flip it 10 times and come up with 6 heads and 4 tails. With this information can we answer the question of what P(h | d) equals? The answer is that we cannot. The only question we can answer is what P(d | h) is equal to.
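The one quantity we can compute at this point, P(d | h), follows from the binomial distribution. Here is a minimal Python sketch, assuming h means "the coin lands heads with probability 0.5" and d means "exactly 6 heads in 10 flips"; the function name is only illustrative.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """P(exactly k heads in n flips | the coin lands heads with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(d | h): 6 heads in 10 flips, given a fair coin (210 / 1024)
print(round(prob_heads(6, 10), 4))  # 0.2051
```

Note that this number says nothing yet about how probable it is that the coin is fair.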

This is a subtle but very important point. Given only a hypothesis and some relevant data, we can only ever answer how probable our data is given our hypothesis, not the other way around. And the two probabilities are not equal. To see why they are not equal, let's form some probability sentences with everyday concepts and see what happens when we reverse them:


 * The probability that it is cloudy outside given that it is raining does not equal the probability that it is raining given it is cloudy outside.
 * The probability that someone is drunk given they consumed 10 beers is not equal to the probability that someone consumed 10 beers given that they are drunk.
 * The probability that the anti-christ is coming given that it's in the Bible does not equal the probability that the anti-christ is in the Bible given he is coming.

So given that P(h | d) does not equal P(d | h), what sorts of answers can we get about P(d | h)? This is the classic approach to statistics that has come to be called the frequentist approach.

Frequentist approaches
The frequentist approach is the standard statistical model taught in most high schools, colleges and graduate school programs. It seeks to find the answer to what P(d | h) equals. Based on our example of a coin being flipped, the frequentist essentially asks: if one did a whole lot of sets of 10 flips, how often would one get a distribution of 6 heads and 4 tails? The answer of course depends on whether the coin is fairly weighted or not. So the frequentist will usually ask how often that distribution is likely to appear if the coin is weighted fairly and if it is not. Since being weighted fairly means that the result of heads or tails is essentially random, that question can be generalized to asking what the distribution is if we assume our results are random. This is known as the null hypothesis and is the "omnibus" test for frequentist statisticians. You can take any data distribution and ask: what are the chances of this data distribution appearing given that it is caused at random? If there is a high chance that it can appear, then we say that it appeared at random; if there is a low chance, we say that something had to cause it. Usually some sort of percentage cut-off is used, the standard being 5 percent. This means that there must be less than a 5 percent chance (a probability of 0.05) of forming a given distribution assuming a random cause before we are willing to say there must be a non-random cause. This is called "significance" and is the holy grail of much of modern science.
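To make the cut-off concrete, here is a small Python sketch of a significance test for the coin example. It assumes the usual convention of counting every outcome at least as lopsided as the one observed (a two-sided test against a fair coin); the function name is hypothetical.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Chance, under a fair coin, of a result at least as lopsided as the one observed."""
    observed_gap = abs(heads - flips / 2)
    # Every head count that is at least as far from an even split as the data.
    extreme = [k for k in range(flips + 1) if abs(k - flips / 2) >= observed_gap]
    return sum(comb(flips, k) for k in extreme) / 2**flips

p = two_sided_p_value(6, 10)
print(round(p, 4))  # 0.7539
print(p < 0.05)     # False: 6 heads in 10 flips is not "significant"
```

With 9 heads in 10 flips the same function returns roughly 0.02, which would fall under the 5 percent cut-off.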

There are many complicated statistical procedures that can be used to further differentiate causes beyond just "random" or "non-random," but all of them rest on the same basic idea of providing some sort of arbitrary statistical cut-off where we assume something is unlikely enough that it has to be something else. This, however, is not really the case. As previously stated, we cannot actually assign a probability to our hypothesis. So if our data only has a 1 percent chance of appearing if the cause is random, this does not mean that there is only a 1 percent chance that the hypothesis that our cause is random is true, or that there is a 99 percent chance that our data is caused by something non-random. The frequentist approach, while providing valuable information and hinting at the relationships between hypotheses, cannot tell us the probability of a given hypothesis.

To understand why this is the case, and to understand how we can assign a probability to a hypothesis, we must turn to Bayesian inference.

Bayes' Equation
So we have generated our data, run all the stats on it, and come up with the encouraging result that our data should only appear 1 percent of the time if it's randomly caused. Why then are we not safe in assuming, at the very least, that it is not randomly caused and that our hypothesis is more likely than random chance? The answer to this question has to do with the prior probabilities of each hypothesis, or in Bayesian parlance just "priors." Let's illustrate this with a little story:

You're walking down the road when you hear a whisper in the alleyway calling you over. Curious, you enter, and a stranger is standing against the wall. He starts to tell you an interesting tale: essentially, he can predict any series of numbers that will be chosen by man or machine. This includes tonight's lottery numbers, and he is more than willing to tell you what they will be in exchange for $1000 in cash. That would certainly be a good deal...if his story is true. You, however, are skeptical of his claim for many obvious reasons and ask him to prove it. The man agrees and asks you to pick a number between 1 and 5. You do so, and seconds later he tells you the exact number you picked. Would you hand over the thousand dollars now? Most people would not, for such a feat is not that impressive. Let's instead say he told you to choose a number between 1 and 100 and guessed it exactly. While this is more intriguing, it is probably not worth the $1000. What about 1 in 1000, 1 in 100,000, 1 in 1,000,000, 1 in 100,000,000? Eventually we will reach a point where we are convinced enough to turn over our money.

Now let's look at a different story. You enter a novelty shop at the local mall and spot a package of dice whose label says they are weighted to roll a 6 every time. Intrigued, you open a package and roll a die; sure enough, it comes up 6. Are you willing to say that these dice are probably weighted? Maybe you will roll a second time, but how many people will remain unconvinced after the 2nd or 3rd roll? Not many.

So in the first scenario most people are willing to ascribe the result to random chance when it had only a 1 in 100 or 1 in 1000 likelihood, while in the second a 1 in 6 or 1 in 36 chance was all it took to convince people that it was not random. What is the difference? It is the prior likelihood for each hypothesis that separates these cases. In the first scenario nothing makes sense. First of all, everyone knows psychic abilities have never been demonstrated. Why is this guy in an alley, and why is he selling a $100 million lottery ticket for $1000? Sure it might be true, but the chances are tiny. In the second scenario you are in a respected shop looking at a commercial good with clearly labeled and professional packaging; it's probably telling you the truth about being weighted.

Let's break this down a little more quantitatively. For the sake of argument let's assume that the chance the guy in the alley is telling the truth is 1 in 10,000,000. What is more likely: the 1 in 10,000,000 chance that he is telling the truth, or the 1 in 100 chance that he guessed your number randomly? In the second scenario let's say that the chance the package is lying about the dice is 1 in 100, so the chance that it's lying and you randomly rolled a 6 is 1 in 600, while the chance that the package is telling the truth is 99/100 (with a 100 percent chance you will roll a 6). In this case a single roll of a 6 from a wrongly labeled package is far less likely than the package being labeled correctly.
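The two comparisons can be checked with a few lines of Python. This is only a sketch under the numbers given in the stories, plus one added assumption: a genuine psychic (like a genuinely weighted die) produces the observed result every time. The helper name is illustrative.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """P(h | d) for a hypothesis h versus its negation."""
    true_path = prior * p_data_if_true          # h is true and the data appears
    false_path = (1 - prior) * p_data_if_false  # h is false and the data appears anyway
    return true_path / (true_path + false_path)

# Alley: prior of 1 in 10,000,000 that he is genuine; a lucky guess from 1-100 has chance 1/100.
alley = posterior(1 / 10_000_000, 1.0, 1 / 100)

# Shop: prior of 99/100 that the label is honest; a fair die shows a 6 with chance 1/6.
dice = posterior(99 / 100, 1.0, 1 / 6)

print(f"psychic is genuine, after one correct guess: {alley:.6f}")  # roughly 0.00001
print(f"dice are weighted, after one roll of 6:      {dice:.4f}")   # roughly 0.998
```

The same data-versus-chance reasoning gives opposite verdicts purely because the starting probabilities differ.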

As these examples hopefully show, the only way to move from P(d | h) to P(h | d) is to take into account our prior beliefs about the probability of each hypothesis. Bayes' equation relates P(d | h) and our priors and calculates what P(h | d) equals. The equation is simple and is made up of three parts: the first is the priors, which we just talked about; the second is called the likelihood probability, which is simply P(d | h); and the last is called the posterior probability, which is P(h | d). Since the posterior probability is the holy grail of most questions humanity has asked, understanding Bayes' equation, its parts, and how they relate to each other can tell us much about the optimal way of gaining and testing knowledge about the world.
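For reference, the standard form of Bayes' equation in the h and d notation used in this essay can be written as follows. The normalizing term P(d) and the symbol for "not h" are not named in the text above; they are included here because the equation needs them.

```latex
P(h \mid d) = \frac{P(d \mid h)\, P(h)}{P(d)},
\qquad
P(d) = P(d \mid h)\, P(h) + P(d \mid \neg h)\, P(\neg h)
```

Here P(h) is the prior, P(d | h) is the likelihood, and P(h | d) is the posterior.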

Priors, the supernatural and pseudoscience
The Bayesian prior is a fascinating probability. It is the missing piece in the puzzle that lets us assign a probability to a hypothesis. It is essentially the expectations we bring to bear on a problem. Optimizing priors has been rigorously studied in countless papers, and whole books have been written about how to select priors for statistical modeling. What are the lessons we can learn from these treatments of priors about our own expectations and how they should be set when seeking knowledge and testing ideas?

One of the major lessons is that our priors should be set to reflect the information we actually know about the hypotheses. Many times in Bayesian analysis the posteriors from one set of data can become the priors for the next. This essentially means that the fact that something has repeatedly been demonstrated to exist or has repeatedly been demonstrated not to exist is important knowledge. The phrase "an absence of evidence is not evidence of absence" gets tossed around a lot, but Bayesian priors show that this is not always the case. For example, with psychics, repeated attempts to demonstrate psychic abilities have failed. These failed attempts can be incorporated into our prior probabilities, allowing us to assign a smaller prior to the likelihood that there really are psychic abilities each time a demonstration of those abilities fails. Failed attempts provide meaningful information that can be incorporated into our priors.

Eventually, either due to the prima facie absurdity of the claim or because of repeated failed attempts, our prior for a given hypothesis gets very small. In order to overcome this very small prior and create a posterior that might make us think our hypothesis is possible, we need to show some extraordinary data. The smaller the prior, the more unlikely it has to be that our data could be generated by any other hypothesis. This is a nice quantifiable way of demonstrating the oft-used phrase of skeptics and rationalists that "extraordinary claims require extraordinary evidence."

If one examines the various ideas put forth in pseudoscience, religion, and the supernatural, one will see that most of them violate fundamental laws of the universe. This immediately places our prior probability for their existence at a small amount. In addition to that, many of these ideas have been tested...repeatedly...and have always failed the test. The James Randi Educational Foundation is just one example, as they have tested in hundreds of controlled settings many claims of the supernatural from dowsing to homeopathy, and every single test has come up negative. There have been hundreds of such groups doing hundreds of thousands of tests on these phenomena. This huge preponderance of evidence against their reality should be incorporated into our priors.

Bayes' equation demonstrates quantitatively and rigorously what skepticism has had to say about these topics for a long time. They are prima facie unlikely, and the repeated failure to demonstrate their reality is evidence of their absence.

Specificity, sensitivity, and the failure to account for priors
For many people it is intuitively obvious that the claims coming out of pseudoscience pushers or about the supernatural are a priori not likely to be true, and therefore would require a lot of evidence to be convincing. Unfortunately, this naturally emergent skepticism doesn't go very far. There are many things in the world where a firm understanding of Bayesian inference and priors can drastically change our outlook, but they are often overlooked. Even in professional fields with people trained in logical, empirically based thinking, a failure to account for priors is disturbingly rampant.

Let's take a look at an interesting example. Many people are familiar with taking a "test" to see if they have a particular disease. They go down to a lab in the hospital, have some blood drawn, and a few days later receive a positive or negative result. The question is how the test result translates into someone actually having or not having the disease. The first thing we have to know is how good the test actually is. This is quantified primarily by two measurements: sensitivity and specificity. Sensitivity is the percentage of people who really have the disease for whom the test returns a positive result (true positives), while specificity is the percentage of people who do not have the disease for whom the test returns a negative result (true negatives). The higher these values, the better the test.

One of the best tests out there is for HIV; it has a 99 percent sensitivity and a 99 percent specificity. This is amazingly accurate (many tests for other illnesses can be in the 50-60 percent range). So if an individual takes this test and receives a positive result, what are the chances that they actually have HIV? Many people will say 99 percent.

This little thought example has been given to physicians in various studies, and almost every time the nearly unanimous answer is that the patient almost certainly has the disease. One interesting example of such a study was actually done with HIV and AIDS counselors in Germany. A lab sent an informant, claiming to have a positive test, to see 20 different counselors. The informant asked each one what the chances were that they actually had HIV. They all reported that false positives didn't happen, and many said it was a 100 percent chance.

So what's wrong with this? What are the chances? Well, if you have been paying attention so far you probably know that it depends on the priors. Let's rework this problem to make it more obvious. The question "does this patient have HIV?" can be rephrased as two alternative hypotheses. The first is "this patient does have HIV" and the second is "this patient does not have HIV." It should be clear then that the question we are asking is really P(h | d). The test result is our data, d, and the accuracy of the test tells us P(d | h). Written like this it should now be obvious that to move from the test result to answering our question we have to bring in our priors, and that means Bayes' equation.

If a person comes in with absolutely none of the risk factors normally associated with HIV, they have a very small prior for having the virus; if the person comes in with a lot of risk factors, that prior goes up. If we assign a prior for our low-risk patient of 1/1,000,000 we can see why there is actually a small chance they have the disease even after a positive test result. Essentially the test will return 1 false positive for every 100 uninfected people tested (meaning the person does not have HIV but tests positive for it). What is more likely: the 1/100 chance of a false positive or the 1/1,000,000 chance that the person actually has HIV? The answer, obviously, is a false positive.
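The comparison above can be turned into a posterior with Bayes' equation. The sketch below uses only the illustrative numbers from this section (a 1/1,000,000 prior and a test with 99 percent sensitivity and specificity), treating sensitivity as P(positive | infected) and specificity as P(negative | not infected). The 1-in-10 prior for a high-risk patient is an invented value for contrast. Real prevalence and test accuracy figures differ, so these outputs are not clinical estimates.

```python
def prob_infected_given_positive(prior, sensitivity, specificity):
    """Posterior probability of infection after one positive test result."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * (1 - specificity)
    return true_positive / (true_positive + false_positive)

# Illustrative low-risk prior from the text.
low_risk = prob_infected_given_positive(1 / 1_000_000, 0.99, 0.99)

# Hypothetical high-risk prior, chosen only to show the contrast.
high_risk = prob_infected_given_positive(1 / 10, 0.99, 0.99)

print(f"low risk:  {low_risk:.6f}")   # roughly 0.0001, about 100 times the prior
print(f"high risk: {high_risk:.4f}")  # roughly 0.92
```

The positive result multiplies the low-risk patient's probability about a hundredfold, yet a false positive remains by far the likelier explanation.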

When actual real-world numbers are used, rather than the illustrative ones above, and the Bayesian posterior is calculated, the chance that a low-risk patient who has tested positive for HIV actually has HIV is around 50 percent. One thing you will notice is that our posterior probability has jumped considerably higher than our initial prior. That's because the positive test does provide us information, even though it can be wrong. The more tests we run that return a positive result, or any other additional data we get that points to the patient having HIV, the higher our posterior is adjusted. So the data is not thrown out, but we can clearly see that testing positive even with a very accurate test does not mean someone is actually positive.

This is often initially counter-intuitive, but it clearly demonstrates how important priors are in our daily assessment of data and hypotheses if we want to really understand the world around us.

Progressive posteriors
One of the principles of both skepticism and the scientific method is that you must follow the evidence where it leads. This is in direct opposition to the belief systems of fundamentalists and others like them, who ignore evidence they don't like and overrepresent evidence they think points towards what they want to prove. While intuitively most people see the first method as being more intellectually honest and probably more accurate, Bayesian inference can be used to clearly show its superiority.

As hinted at above there is a symmetrical relationship between priors and posterior probabilities. We start a problem with a set prior, then we calculate a new expectation based on the likelihood, and if we run another test we should use the posteriors from our previous test as our new priors. Essentially the posterior probabilities grow and progress based on the previous work.

Let us once more use an anecdote to illustrate the point. Let us consider for a moment a man who claims to be able to move a compass needle with his mind. After carefully making sure that he has no hidden magnets on his fingers or in his mouth (a la Uri Geller) we sit down to start our test. First we assign a prior to this phenomenon, probably a very low prior, and then we ask him to perform the act. Let us say he succeeds. This is fascinating, but due to our low prior to begin with we are probably not convinced. He offers to demonstrate it again. Once more we assign a prior probability; this time, though, it is going to be a lot higher, simply because he succeeded the first time. Once more let us say he succeeds and the compass moves. Are we convinced yet? That depends on how low we set our initial prior. But each time he performs the act we become more and more convinced. This is because we are letting our posteriors from the last test roll over as our priors for the next test.

The same basic scenario applies if he does not succeed. Each time he fails to move the compass, our belief about the probability he can do it becomes less and less. Again, all of this has to do with progressive posteriors.
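The rolling-over of posteriors into priors can be sketched in Python. Every number here is an assumption made for illustration: a starting prior of 0.001, a 0.9 chance the needle moves if the ability is real, and a 0.1 chance it moves anyway through trickery or accident.

```python
def update(prior, p_result_if_real, p_result_if_fake):
    """One application of Bayes' equation for a single test result."""
    numerator = prior * p_result_if_real
    return numerator / (numerator + (1 - prior) * p_result_if_fake)

def run_trials(prior, outcomes, p_move_if_real=0.9, p_move_if_fake=0.1):
    """Feed each posterior back in as the prior for the next trial."""
    beliefs = [prior]
    for needle_moved in outcomes:
        if needle_moved:
            prior = update(prior, p_move_if_real, p_move_if_fake)
        else:
            prior = update(prior, 1 - p_move_if_real, 1 - p_move_if_fake)
        beliefs.append(prior)
    return beliefs

# Four successes in a row, then four failures in a row, from the same starting prior.
print([round(b, 3) for b in run_trials(0.001, [True] * 4)])
print([round(b, 7) for b in run_trials(0.001, [False] * 4)])
```

Under these assumptions the belief climbs with every success and shrinks with every failure, and the order in which results arrive does not change the final value.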

What about someone who is selecting out data, much like the fundamentalist or intelligent design proponent? Let's take a closer look at how we represent priors in probability calculus. Oftentimes a prior is written as simply P(h), but this is merely shorthand. As mentioned above, every probability must be based on something. Priors are actually written as P(h | i), where i is a variable that stands for all the background information that we have. It is an array variable, meaning that there are multiple elements within it, each element corresponding to some fact that lets us assess the probability. This can be represented as something like i={fact1,fact2...factN}, where N is the number of facts that we have available. To start off with, we are only going to be basing i on what we know coming to the problem. But with each test we do we can add another element to i, with factN+1 being the results of our first experiment and factN+2 being the results of the second, and this can keep extending for as many tests as we do.

The accuracy of P(h | i) is directly and mathematically dependent upon the quality of the facts in i and the number of elements in i. If there are two sets of i for a given hypothesis and all elements of i have equal quality, then by definition the i with the larger number of elements will produce the more accurate probability for the given hypothesis. In this specialized case it is obvious that anyone who is selectively removing elements from i will arrive at a less accurate probability than someone who accepts all elements in i. However, it is rare or impossible in the real world that the quality of the elements in i is going to be equal. In this case we may have to temper or weight additional elements in i. But more on this in the next section.

Pseudoscience pushers, YEC literalists, and smarmy Discovery Institute fellows all pretty much use the same algorithm for making sure their hypothesis probabilities look the way they know they should. The algorithm is essentially "if a fact increases the probability of the hypothesis I want, then include it; else adjust the fact until it does and include it; else if it cannot be adjusted, exclude it." Anyone should be able to see that this is not a valid algorithm for deciding which elements of i to include in an analysis. Therefore, anyone using the scientific method or a skeptical approach will have better probabilities for hypotheses than those using the above algorithm.
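The effect of filtering i can be shown with a toy calculation. In this sketch each fact is represented by a likelihood ratio, P(fact | h) / P(fact | not h), and the facts are treated as independent; both the representation and the six invented ratios are assumptions for illustration. The "adjust" step of the algorithm is left out, and only the "exclude" step is modeled.

```python
from math import prod

def posterior_from_facts(prior, likelihood_ratios):
    """Combine independent facts by multiplying the prior odds by each likelihood ratio."""
    odds = prior / (1 - prior) * prod(likelihood_ratios)
    return odds / (1 + odds)

def cherry_pick(likelihood_ratios):
    """Keep only the facts that favor the preferred hypothesis (ratio above 1)."""
    return [lr for lr in likelihood_ratios if lr > 1]

# Invented evidence: four facts count against h, two count for it.
facts = [0.2, 3.0, 0.1, 0.25, 2.0, 0.5]
prior = 0.5

honest = posterior_from_facts(prior, facts)
selective = posterior_from_facts(prior, cherry_pick(facts))

print(f"all facts included:  {honest:.3f}")     # roughly 0.015
print(f"only favorable ones: {selective:.3f}")  # roughly 0.857
```

Starting from the same prior and the same pool of facts, the filter alone turns a hypothesis that is almost certainly false into one that looks very likely.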

One anomaly does not make a theory
Let's delve a little deeper into just what i is and how we should treat its various elements. This gets into the heart of what is normally called evidence. A definite pattern exists, with the standards of evidence for proponents of pseudoscience and anti-science being significantly different from the standards of evidence for a skeptic. Let's take a look at how Bayesian inference can help shed light on this difference, and how we can demonstrate that the former's "evidence" is ultimately meaningless.

One of the most coveted forms of evidence the pseudoscientist seeks is an anomalous result in an experiment performed by respected researchers using legitimate techniques. The reason this is so coveted is that the "quality" of this evidence is high enough that it can be integrated directly into i without any sort of qualification. The pseudoscientist and those who follow alternative views of reality have their priors set much, much higher towards acceptance of a given alternative hypothesis than a reality-based person. A single data point pointing to their preferred hypothesis being correct is then often enough to propel them into the "certainty" range. As stated above, people in this category also have the nasty habit of excluding any elements from i that disagree with their preferred hypothesis.

Those of us based in reality who set an appropriately low prior for many of the crazier hypotheses out there are not as moved by a single anomalous data point. For all the reasons discussed above, it is far more likely that the data is wrong than that a crazy hypothesis is right. But we do adjust accordingly for the new data and test again, and again, and again. Usually those follow-up tests come back showing that the hypothesis's predictions are wrong and the first test data was anomalous. We then integrate these negative results into i, and our probability assignment for the given hypothesis remains really low.

In this case it's clear that the difference between what constitutes evidence for the pseudoscientist as opposed to the real scientist is one of establishing an appropriate reality-based prior to start with, and of not selecting only the data that support the initial hypothesis. What happens, though, when new incoming data is not clearly of the same quality?

For example, the most frequent form of data presented by a quack for alternative medicine treatments is the anecdote. This is merely some person somewhere, usually with no expertise to speak of, who reports wonderful results with the product. An anecdote is clearly not the best piece of data, since there are way too many problems with it. In order to figure out how an anecdote should change our priors, we need to figure out the likelihood of a positive result being reported under each of the two hypotheses, one being that the alternative medicine treatment works and the other being that it does not. The difference between these values dictates how much we alter our prior. If the positive result is incredibly likely under the first hypothesis and not likely at all under the second, then that really boosts the probability that the first hypothesis is correct.

So for the hypothesis that the treatment works we can assign a really high probability that we will get a positive anecdote back: pretty much 100 percent for all intents and purposes. What about for the alternative hypothesis that the treatment does not work? First of all we know the placebo effect is strong; at least 1/3 of the time you get a positive result from a sugar pill. So we can start at 30 percent. Then we have the self-selection bias of only reporting positive anecdotes, not negative ones. That probably bumps us up to at least 60-70 percent. Then you have the chances of fraud, lies, or bribery that might take us up to 85-90 percent. Then you are left with those people who will just report anything for their 15 minutes of fame. I would say we could argue that you will get a reported positive anecdote about 99 percent of the time. This means that the anecdote is entirely worthless as a point of data because it is predicted by both hypotheses.
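Plugging the two likelihoods from this paragraph (about 100 percent and about 99 percent) into Bayes' equation shows how little an anecdote moves the needle. The starting prior of 0.01 in this Python sketch is an arbitrary assumption; any other starting value shows the same pattern.

```python
def update_on_anecdote(prior, p_anecdote_if_works, p_anecdote_if_not):
    """Posterior that the treatment works after hearing one positive anecdote."""
    numerator = prior * p_anecdote_if_works
    return numerator / (numerator + (1 - prior) * p_anecdote_if_not)

prior = 0.01  # assumed starting belief that the treatment works
after_one = update_on_anecdote(prior, 1.00, 0.99)

# Ten anecdotes in a row, each posterior becoming the next prior.
after_ten = prior
for _ in range(10):
    after_ten = update_on_anecdote(after_ten, 1.00, 0.99)

print(f"before:              {prior:.4f}")
print(f"after one anecdote:  {after_one:.4f}")  # roughly 0.0101
print(f"after ten anecdotes: {after_ten:.4f}")  # still roughly 0.011
```

Because both hypotheses predict the anecdote almost equally well, even a pile of testimonials leaves the posterior close to where it started.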

The skeptic and the scientist can integrate the anecdote directly into their i but because it does not offer much meaningful information it is essentially weighted such that it has no effect. The pseudoscientist or the anti-scientist will never bother to calculate the likelihood of a positive result anecdote given the hypothesis that they are wrong. Therefore, they will integrate the anecdote in with no weighting change at all and allow it to significantly alter the probability for their preferred hypothesis.

In this case we see that the difference between standards of evidence comes from failing to calculate an important element in Bayes' equation. Both of these examples show that the pseudoscientist's perspective only maintains the high probability of its preferred hypothesis by directly violating Bayes' equation and ignoring whole sides of it. Once more skepticism and science are by far the more accurate systems.

Changing one's mind
Perhaps one of the most powerful elements of the scientific method is its ability to change its mind when confronted with enough evidence. Very rarely do you ever see someone out of the pseudoscience movement or the anti-science movement stand up, admit a mistake, and adjust their ideas to fit new data. This happens all the time in science. Major examples such as evolution or relativity capture the public imagination, but every day in small ways the vast field of science updates its ideas to reflect new evidence.

This ability to follow the evidence where it leads is often used as a criticism against both science in general and specific theories. It is argued that you cannot "falsify" a hypothesis because it is always changing in response to the data. What keeps proponents of pseudoscience from being able to change their minds? What is it about science that allows it to update itself? Is the criticism leveled against it valid? Once again Bayesian inference can help shed light on these questions.

Bayesian inference captures the phenomenon of changing one's mind by comparing several alternative hypotheses and their relative probabilities through time. Let's start with the idea of water memory which is a concept from homeopathy and really lies at its core. There are two all-encompassing hypotheses that we will use in our thought experiment. The first is that water does have a memory and the alternative is that water does not have a memory. Now let's take two people with two very different views on the matter. The first is a proponent of homeopathy that has set his priors for water having a memory very high and his priors for it not having a memory very low. The second is a skeptic who has set his priors in the opposite way.

A series of experiments is going to be run now to test the idea of water memory. Since this is merely for the sake of illustration, let's not worry about the specifics but assume that the test results come back "positive" if water demonstrated a memory and "negative" if it did not. There is noise in these tests and they can go wrong, like any experiment. The first test is run and it comes back "positive" for water memory. The skeptic's priors are going to be adjusted now, with him assigning a higher probability to water having a memory and lowering the probability that it does not. Still, he believes that water does not have a memory, as his prior for it was set quite low. The homeopath doesn't adjust his priors very much at all since the result was exactly as expected. Two more tests are run and they both come back positive. Suddenly the skeptic has his priors for the two hypotheses set as equal. He now thinks there is a 50 percent chance water can have memory and a 50 percent chance that it cannot. He has started to change his mind, but hasn't switched completely. A couple more positive test results and the skeptic's probabilities for the two hypotheses have shifted to the point that he now has changed his mind and believes that water does have a memory.
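The thought experiment can be reproduced numerically. The values in this Python sketch are chosen purely so the arithmetic matches the story: the skeptic starts at odds of 1 to 1000 for water memory, the homeopath starts at 99 percent, and each positive result is assumed to be ten times more likely if water memory is real than if it is not.

```python
def belief_after(prior, n_positive, likelihood_ratio):
    """Probability of water memory after n positive tests, using the odds form of Bayes' equation."""
    odds = prior / (1 - prior) * likelihood_ratio ** n_positive
    return odds / (1 + odds)

LIKELIHOOD_RATIO = 10.0   # assumed strength of each positive result
SKEPTIC_PRIOR = 1 / 1001  # odds of 1 to 1000 for water memory
HOMEOPATH_PRIOR = 0.99    # assumed

for n in range(6):
    skeptic = belief_after(SKEPTIC_PRIOR, n, LIKELIHOOD_RATIO)
    homeopath = belief_after(HOMEOPATH_PRIOR, n, LIKELIHOOD_RATIO)
    print(f"{n} positive tests: skeptic {skeptic:.3f}, homeopath {homeopath:.5f}")
```

With these numbers the skeptic sits at about 50 percent after three positive tests and about 99 percent after five, while the homeopath's belief barely moves because it was already near certainty.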

What is likely to happen, though, if we reverse this little thought experiment and instead have the tests coming back negative? If the homeopath is true to the scientific method then he will eventually change his mind just as our skeptic did above. But this does not happen in practice. What's going on? Well, it probably depends on the individual, but it can be accomplished in all the ways we stated above: by throwing out and ignoring negative results, or by somehow weighting them such that they do not have an effect.

Again we see that proper application of Bayesian inference leads to evidence changing the most likely hypothesis. The only way to avoid changing one's mind is to violate Bayesian principles. We can see that it is the dogmatist who is really creating the error, not the skeptic or the scientist. We can also see that the criticism leveled against science as being "unfalsifiable" because of its ability to change is clearly false. Changing one's mind is a product of falsification. If something cannot be falsified, then one could never change one's mind about it. The fact that science does update in response to new evidence is proof of the strength of its hypothesis and theory construction.

Popperian predictions
This essay has so far talked a lot about the relationship between likelihood, prior and posterior probabilities and has only briefly addressed the concept of data collection. Data is essentially the collected observations that answer a particular question. In science the question involves the unique predictions made by the various hypotheses being tested. There are an infinite number of questions that can be asked, and an infinite number of data sets that can be collected. Likelihood probabilities can probably be calculated for most of those data sets. We are left, then, with the problem of which data sets and which questions are meaningful for improving our posterior probabilities.

Karl Popper, one of the more widely known and quoted philosophers of science, analyzed in his book The Logic of Scientific Discovery many of the issues surrounding what makes a question or a theory scientific. One of Popper's points was that the more specific and unique a given prediction from a hypothesis is, the more valuable it is in providing support for that hypothesis. Popper's example was the gravitational lensing effect predicted by Albert Einstein's theory of relativity. Popper's observation is intuitive to most scientists and skeptics alike, while most proponents of pseudoscience seem to want to avoid specific predictions like the plague. Once more, Bayesian inference can help shed light on this difference.

A Popperian prediction is one that makes a highly specific and highly unique claim. We can see why this type of prediction is so valuable in calculating our posterior probabilities by looking at the likelihood function. The first element is how specific the prediction is. Let's once more turn to the example of a man trying to prove he can read your thoughts by naming a number you're thinking of. He asks you to pick a number between one and one hundred and then responds by saying: “I see the number you picked is somewhere between 50-100.” If he happens to be right, are you impressed? Probably not, and the likelihood bears this out: by chance alone he has about a 50 percent chance of being correct and a 50 percent chance of being wrong. This will barely change the posterior probabilities for him being able to read your mind or not. Now if he gives a more specific range such as 67-100, that's a little more evidence, but still not much. Things don't really start to get interesting until his predictions are very specific, maybe even the exact number. Once more this all falls out of Bayes' equation. The smaller the likelihood of getting the prediction right by random chance, the larger the change in the posterior probability.
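The effect of specificity can be sketched with a few lines of arithmetic. Two assumptions are made purely for illustration: the prior that the man can read minds is set at 1 in 1000, and a genuine mind reader is taken to get his claim right every time. The chance of a correct claim by luck is just the fraction of the hundred numbers that the claim covers.

```python
def posterior(prior, p_hit_if_psychic, p_hit_by_chance):
    """Posterior that he can read minds, given that his claim turned out correct."""
    numerator = p_hit_if_psychic * prior
    return numerator / (numerator + p_hit_by_chance * (1 - prior))

PRIOR = 0.001            # hypothetical prior that he can read minds
P_HIT_IF_PSYCHIC = 1.0   # assumption: a real mind reader is always right

# Each claim covers `size` of the 100 possible numbers (ranges are inclusive).
claims = {"50-100": 51, "67-100": 34, "exact number": 1}

posteriors = {
    label: posterior(PRIOR, P_HIT_IF_PSYCHIC, size / 100)
    for label, size in claims.items()
}
for label, p in posteriors.items():
    print(f"correct claim of {label}: posterior {p:.4f}")
```

Under these assumptions the two range claims leave the posterior within a fraction of a percentage point of the prior, while a correct exact guess raises it to roughly 9 percent.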

The second element is the uniqueness of the prediction. If you are testing between two hypotheses and each one makes a very specific prediction, it is meaningless to the problem at hand if both make the same prediction. Bayes' equation calculates the change in the posterior based on the difference between the relative likelihoods. If the predictions are the same then so are the likelihoods, so there will be no change in the posterior probabilities. This uniqueness factor is abused in several ways by pseudoscience pushers. One is to claim that a non-unique prediction really is unique. This is common in alternative medicine circles, where they will claim that the improvement in health from the placebo effect is really a unique prediction of their treatment and not something you will find with every single treatment. Another really deceitful abuse is to claim that a specific prediction of science is really a non-unique universal claim. This is often summed up by the phrase “we share the same facts, we just interpret those facts differently.” Creationists love this one, and will take an observation an evolutionist makes and try to fit it post hoc into their pseudoscience. However, they ignore the fact that the original observation was a specific prediction from evolutionary theory, so it is evolutionary theory that gets the posterior probability bump.
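The point about uniqueness can be made explicit by writing Bayes' equation for two competing hypotheses in ratio form. The labels h_1 and h_2 are introduced here only for illustration; d is the data, as before.

```latex
\frac{P(h_1 \mid d)}{P(h_2 \mid d)}
  = \frac{P(d \mid h_1)}{P(d \mid h_2)} \cdot \frac{P(h_1)}{P(h_2)}
```

If both hypotheses make the same prediction, then P(d | h_1) = P(d | h_2), the first factor on the right equals 1, and the ratio of the posteriors is exactly the ratio of the priors. Only data that one hypothesis predicts more strongly than the other can shift the balance between them.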

Predicting everything
Another of Popper's ideas is falsifiability, which has become one of the cornerstones of modern scientific thought. Essentially the idea is that a scientific theory must make predictions that can be wrong. Hypotheses can fail falsifiability in several ways: they can make predictions that cannot be tested at all, or they can make every single prediction possible. Predictions that cannot be tested usually reflect either a limit of current technology or a failure of the hypothesis to use methodological naturalism. Since no data can be collected to address such a prediction, it cannot be used in any way to update posterior probabilities. It is completely invisible and worthless to Bayesian inference, and to skepticism and science in general. The far more interesting example is the hypothesis that predicts everything.

A classic formulation of this is faith healing, where proponents and charlatans make the prediction that if you believe, you will be healed. This is a nice little linguistic trick, because if you are healed they made that prediction, and if you are not healed, well, you just didn't believe. So based on the faith healer's wording, the hypothesis that faith healing is real makes all possible predictions. If you allowed this to stand, every experiment you ran would bump up the posterior probability that faith healing is real. This is essentially what a proponent or supporter of faith healing does: each time someone is healed or not healed, it boosts their confidence in faith healing.

This is where things get a little more philosophical. Bayes' equation does not in and of itself handle a hypothesis that predicts everything. Instead we must pull back a little further to basic probability calculus and one of its fundamental axioms. Essentially this axiom states that the probability of X and the probability of not X must sum to 1. Judged against this axiom, the hypothesis that predicts everything is prima facie in violation, since it treats both being healed and not being healed as certain outcomes, and so it is not considered a valid hypothesis. This axiom is much like Euclidean geometry stating that the angles of a triangle must add up to 180 degrees. It might be possible to create a new probability theory without such a fundamental axiom, but the rules it would obey would be totally different. Trying to use Bayesian inference with a hypothesis that violates the fundamental axiom would be like using Euclidean geometry for non-Euclidean space. It's a bad idea, and that's essentially what the faith healer in this example is doing. They are violating the laws of probability and then using the theory to back up their hypothesis. It's a nice little logical fallacy that should be rejected by anyone with a grain of rationality.
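A small sketch can show what goes wrong when the axiom is ignored. The numbers are hypothetical: without any faith healing, 30 percent of patients are assumed to recover anyway, and the starting belief in faith healing is set at 50 percent. The helper `is_valid` is an illustrative name for a check that a hypothesis spreads a total probability of exactly 1 over the possible outcomes.

```python
def update(prior, p_data_given_h, p_data_given_not_h):
    """One application of Bayes' equation for a hypothesis h and its negation."""
    numerator = p_data_given_h * prior
    return numerator / (numerator + p_data_given_not_h * (1 - prior))

def is_valid(likelihoods, tol=1e-9):
    """True if the probabilities assigned to the outcomes sum to 1."""
    return abs(sum(likelihoods.values()) - 1.0) < tol

# Hypothetical baseline: with no faith healing, 30% of patients recover anyway.
no_healing = {"healed": 0.3, "not healed": 0.7}
# The faith healer's wording claims both outcomes as certain predictions.
predicts_everything = {"healed": 1.0, "not healed": 1.0}
# A hypothesis that commits itself: healing is expected, failure counts against it.
commits = {"healed": 0.8, "not healed": 0.2}

prior = 0.5
for name, h in [("predicts everything", predicts_everything), ("commits", commits)]:
    after = {o: round(update(prior, h[o], no_healing[o]), 3) for o in h}
    print(f"{name}: valid={is_valid(h)}, belief after each outcome={after}")
```

With the invalid hypothesis the belief rises whether the patient is healed or not. Once the probabilities are forced to sum to 1, a hypothesis can only gain from one outcome by risking a loss on the other.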

Final thoughts
Additional discussion of how Bayes' theorem and Bayesian confirmation theory can be applied to evaluating supernatural and religious claims can be found in Fishman, Y. (2007) Can Science Test Supernatural Worldviews? [2]