Essay:Bayes' theorem and jurisprudence

Assuming that a government has laws and that one of the primary functions of its court system is to determine who is or is not responsible for breaking those laws, a court system requires some way to evaluate evidence and come to conclusions about the likelihood of guilt or innocence. In philosophy, there are three main methods of inference used to derive these types of conclusions: deductive, inductive, and abductive reasoning. Two of these forms of reasoning, namely inductive and abductive, can be used with a statistical formula called Bayes' Theorem. Bayes' Theorem is useful because it lets one calculate how new information changes the probability of a hypothesis. The classic example is breast cancer screening, but it has many other applications, including law. Bayes' Theorem is a way of calculating conditional probabilities. According to Dr. R. E. Gaensslen, two of the fallacies committed by lawyers, namely the prosecutor's fallacy and the defense attorney's fallacy, "are misinterpretations of conditional probabilities". Proper use of Bayes' Theorem in a courtroom has the potential both to counteract these fallacies and to generate more statistically robust conclusions about guilt and innocence. This page will discuss Bayes' Theorem and its relevance to jurisprudence. It will also discuss how the legal system handles Bayes' Theorem and probability in general.

Deductive reasoning
Deductive reasoning is one of the three main methods of inference. It often uses syllogisms, the classic example of which is of course…

If the premises are true, the terms are clear, and the rules of deductive logic are followed, then the conclusion reached is necessarily true. Deductive reasoning underlies mathematics, and it has proven itself to be a useful way of coming to conclusions. Unfortunately, deductive reasoning has severe limitations in a courtroom setting. Consider, for example, a defendant who has the victim's blood on their hands and a financial motive for murder, but no murder weapon was found. How is deductive reasoning supposed to address the issue? One could try the following syllogism:

All people who have a murder victim's blood on their hands and a financial motive are guilty of murder.

The defendant has the victim's blood on their hands and a financial motive.

Therefore, the defendant is guilty of murder.

That is a valid syllogism, but as any good defense attorney would point out, deductive reasoning only guarantees its conclusions if the premises are true, and not everyone who has a murder victim's blood on their hands and a financial motive is guilty. For example, consider a father who came home from jogging, found his son bleeding to death, and tried to save his life before the paramedics arrived. Such a father would have the victim's blood on his hands and a financial incentive, since his son was just about to go off to college, but that is hardly strong evidence that he is guilty of murder.

Deductive reasoning does not use Bayes' Theorem, so we shall not explore it further on this page, though it is good to know about because it is often useful.

Inductive reasoning
Inductive reasoning is more useful in a courtroom setting. This form of reasoning uses premises to support a probabilistic conclusion. An example of an inductive argument would be as follows:

That is a solid inductive argument though conceivably falsifiable if scientists accidentally release into the environment or if an alien from outer space crash-lands on Earth.

One of the things that makes inductive reasoning useful in a courtroom setting is that it can draw on Bayes' Theorem, a formula for calculating conditional probabilities. Conditional probability is a measure of the probability of an event given that another event has occurred. The probability of event A given that condition B has occurred is usually written P(A|B); the probability of event B given that condition A has occurred is written P(B|A). Bayes' Theorem combines the definition of P(A|B) with the definition of P(B|A) to derive the following equation.

$$P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)}$$
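For readers who want the missing algebraic step: the two definitions being combined are the standard definitions of conditional probability (assuming P(A) and P(B) are both greater than zero). Solving the second for P(A∩B) and substituting it into the first gives the theorem:

$$P(A|B)=\frac{P(A\cap B)}{P(B)} \qquad P(B|A)=\frac{P(A\cap B)}{P(A)}$$

$$P(A\cap B)=P(B|A)\cdot P(A) \quad\Longrightarrow\quad P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)}$$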

It may not look like much, but Bayes' theorem is ridiculously powerful. It is used in medical diagnostics, self-driving cars, identifying email spam, decoding DNA, language translation, facial recognition, finding planes lost at the bottom of the sea, machine learning, risk analysis, image enhancement, analyzing who wrote the Federalist Papers, Nate Silver's FiveThirtyEight.com, astrophysics, archaeology and psychometrics (among other things). If you are into science, this equation should give you some serious tumescence. There are some great videos on the web about how to do conditional probability, so check them out if you wish to know more. External links are provided at the bottom of this page.

Let us now use breast cancer screening as an example of how Bayes' theorem is used in real life. Please keep in mind that this is just an illustration. If you have concerns about your health, you should consult an oncologist.

Let us say that a 40-year-old woman has gone to the doctor for a mammogram and it came back positive. She now wants to know: what is the probability that she has cancer? According to statistics found at cancer.gov, the lifetime risk of developing breast cancer in women is somewhere around 12.4%. However, we are not interested in lifetime risk, as a mammogram occurs at a specific point in time. In this example, that point in time is age 40. So what is the risk for a 40-year-old woman? Her risk is 1.47%. In other words, approximately 1 out of 68 women will develop breast cancer around the age of 40. In Bayes' theorem, this value is our P(A), as it represents the probability that any randomly chosen 40-year-old woman develops breast cancer.

Now our patient is not just any random person, because we know something about her: she has had a positive mammogram. So how does this condition affect the probability that the woman has cancer? We are going to need some additional values in order to arrive at this probability. One of them is the likelihood of detecting breast cancer through a mammogram. According to statistics from The Susan G. Komen Breast Cancer Foundation, the probability of a positive test given that a patient has cancer is 87%. In Bayes' equation, this is our P(B|A): the probability of event B (a positive test) given that condition A (cancer) has occurred. To arrive at P(A|B), the probability that one has cancer (event A) given a positive test (condition B), we multiply P(B|A) by the probability that a 40-year-old woman has cancer, P(A), and divide by the total probability of condition B, written P(B) in Bayes' Theorem. There are two possible ways of getting a positive test. Either you have cancer and test positive, P(B|A)P(A), or you don't have cancer and test positive anyway. The population without cancer is simple to figure out: one either has cancer, which is 1.47% for a 40-year-old woman, or one doesn't, which is 98.53%. To know how likely it is that someone without cancer will test positive, one needs the false positive rate of the test. The false positive rate for mammograms ranges from 7% to 12%, so we will round it to 10%. So the second part of the calculation for P(B) is the probability that a 40-year-old woman doesn't have cancer (.9853) multiplied by the false positive rate (.10). We now have all the information we need to calculate P(A|B), so let us place these values into Bayes' Theorem.

$$P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)}$$

$$P(A|B)=\frac{(.87)(.0147)}{(.87)(.0147) + (.10)(.9853)}$$

$$P(A|B) \approx 11.5\%$$
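The same arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `posterior` and its parameter names are our own, and the inputs are the illustrative figures used above (1.47% base rate, 87% sensitivity, 10% false positive rate), not clinical values.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(A|B) for a single positive test via Bayes' theorem.

    prior               -- P(A), the base rate of the condition
    sensitivity         -- P(B|A), chance of a positive test if A is true
    false_positive_rate -- P(B|not A), chance of a positive test if A is false
    """
    true_positive = sensitivity * prior                   # P(B|A) * P(A)
    false_positive = false_positive_rate * (1.0 - prior)  # P(B|not A) * P(not A)
    return true_positive / (true_positive + false_positive)  # divide by P(B)


# Illustrative figures from the mammogram example.
p = posterior(prior=0.0147, sensitivity=0.87, false_positive_rate=0.10)
print(f"P(cancer | positive mammogram) = {p:.1%}")
```

The denominator is built from the two ways of testing positive described above, so changing any one of the three inputs shows how sensitive the answer is to the base rate.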

Thus, if you are a 40-year-old woman and your mammogram comes back positive, there is only an 11.5% chance that you have cancer. Obviously this is not an ideal situation, so what can be done about it? One option is to use another test, such as a needle biopsy. Reliable data on this test were more difficult to find, though some numbers were found. A research paper compiled figures for these tests: false positive rates ranged from 0 to 1.7% and false negative rates ranged from 0.5 to 9%. We need numbers for this illustration, so we will use a 1.0% false positive rate and a 6.0% false negative rate, though the accuracy of these numbers is disputed. In this next scenario, our 40-year-old patient has undergone both a mammogram and a needle biopsy, and both have come up positive for cancer. Now we want to know: "What is the probability of cancer given that both a mammogram and a needle biopsy have come up positive?" For the purposes of this illustration, we shall assume that mammogram results and needle biopsy results are independent of each other given whether or not the patient has cancer. Our P(A) is still the same, as nothing has changed the fact that only 1.47% of 40-year-old women have cancer. What has changed is P(B|A), as our condition is now a positive mammogram and a positive needle biopsy. A 6.0% false negative rate means the biopsy detects 94% of cancers, so P(B|A) is (.87)(.94). P(B) has also changed, as there are only two ways in which one can get two positive tests. Either you have cancer, (.87)(.94)(.0147), or you have had two false positives, (.10)(.01)(.9853). Let us now use these values to calculate the probability that the patient has cancer.

$$P(A|B)=\frac{(.87)(.94)(.0147)}{(.87)(.94)(.0147) + (.10)(.01)(.9853)}$$

$$P(A|B) \approx 92.4\%$$

That is still not 100% certainty, but it is a lot better than our initial probability of 1.47%.
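The two-test calculation can be sketched the same way. This is a minimal illustration, not a clinical tool: the helper `posterior` is our own, the rates are the illustrative figures above, and multiplying the per-test rates relies on the assumption that the two tests are independent given cancer status. The sketch also shows that updating on the mammogram first and then using that result as the prior for the biopsy gives the same answer as a single update on both results.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem for one positive result: returns P(A | positive)."""
    hit = sensitivity * prior
    false_alarm = false_positive_rate * (1.0 - prior)
    return hit / (hit + false_alarm)


prior = 0.0147  # P(A): illustrative base rate for a 40-year-old woman

# Route 1: one update using the joint rates, as in the equation above.
# Multiplying per-test rates assumes conditional independence given cancer status.
joint = posterior(prior, sensitivity=0.87 * 0.94, false_positive_rate=0.10 * 0.01)

# Route 2: update on the mammogram, then use that result as the new prior
# for the needle biopsy.
after_mammogram = posterior(prior, sensitivity=0.87, false_positive_rate=0.10)
sequential = posterior(after_mammogram, sensitivity=0.94, false_positive_rate=0.01)

print(f"joint update:      {joint:.1%}")
print(f"sequential update: {sequential:.1%}")
```

Both routes print 92.4%. The sequential form is convenient when evidence arrives one piece at a time, which is also how it arrives in a trial.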

So that is how Bayesian statistics works with something like breast cancer. Some people have realized that Bayesian statistics could also be used in jurisprudence. There, instead of things like mammograms, the factors that affect the statistical outcome are things like DNA evidence. While there are some important differences between breast cancer detection and jurisprudence, the basic idea is the same. In both cases, new evidence is introduced and Bayes' theorem is used to determine how this new evidence affects the probability of the hypothesis. In the case of breast cancer, the hypothesis is that someone has cancer, and Bayes' Theorem is used to calculate how medical tests affect the probability of that hypothesis. In the case of jurisprudence, the hypothesis is that someone is guilty, and Bayes' theorem would be used to calculate how the evidence and testimony presented in court affect the probability of guilt.

Bayes’ Theorem is sometimes written as $$P(H|E)=\frac{P(E|H)\cdot P(H)}{P(E)}$$

P(H|E) is the probability of a hypothesis (H) given a new piece of evidence (E), P(E|H) is the probability of the evidence given the hypothesis, P(H) is the prior probability of the hypothesis, and P(E) is the prior probability of the evidence. This is the same formula; only the symbols differ. P(D), with the D standing for 'data', is also sometimes used in place of P(E).
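Many forensic discussions, including the likelihood-ratio links at the bottom of this page, use an equivalent "odds form" of the theorem: posterior odds = likelihood ratio × prior odds, where the likelihood ratio is P(E|H) divided by P(E|not H). The sketch below (the function name `update_odds` is our own) re-runs the mammogram numbers in that form; it should agree with the 11.5% obtained earlier.

```python
def update_odds(prior_prob, likelihood_ratio):
    """Odds form of Bayes' theorem.

    posterior odds = likelihood ratio * prior odds,
    where likelihood ratio = P(E|H) / P(E|not H).
    Returns the posterior probability P(H|E).
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Mammogram example: P(E|H) = 0.87, P(E|not H) = 0.10.
lr = 0.87 / 0.10
print(f"likelihood ratio = {lr:.1f}")
print(f"P(H|E) = {update_odds(0.0147, lr):.1%}")
```

The appeal of this form in a courtroom is that the likelihood ratio summarizes the strength of a piece of evidence on its own, separately from the prior.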

Abductive reasoning
There is a third form of reasoning that could be used, called abductive reasoning. In abductive reasoning, one begins with an observation or set of observations and then seeks the simplest and most likely explanation. For example, someone could observe a boy blushing after he is told that a particular girl likes him and conclude that the news caused the blushing, as that is the most likely explanation for it. Abduction can also use Bayes' theorem, and this form of reasoning is commonly used in artificial intelligence research and diagnostic expert systems. One example of a diagnostic expert system is Mammonet, which is designed to increase the diagnostic effectiveness of mammograms. A video of a Bayesian expert system in action can be found here. Theoretically, Bayesian expert systems could be used to determine the statistical likelihood of guilt as well, provided one is fine with artificial intelligence determining people's guilt or innocence. Given that both inductive and abductive reasoning can use Bayes' theorem, whether a particular use of Bayes' theorem counts as inductive or abductive can become cloudy. But what is clear is that it is Bayesian.

Prosecutor’s fallacy
In a court of law, a prosecutor's function is to present evidence supporting the idea that a particular suspect is guilty of a particular crime. This can sometimes lead to fallacies, one of which is called the error of the transposed conditional (also known as confusion of the inverse). This fallacy occurs when one assumes that P(A|B) = P(B|A), which is true only when P(A) = P(B), and that is rarely the case. In the breast cancer screening example above, it would be like an oncologist assuming that your cancer risk is 87% because a mammogram came back positive (P(B|A)), even though the actual P(A|B) is about 11.5%.
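Rearranging Bayes' theorem shows why the two conditional probabilities differ: their ratio is the ratio of the base rates (assuming all the probabilities involved are nonzero). Plugging in the mammogram figures, with P(B) = (.87)(.0147) + (.10)(.9853) ≈ 0.1113 from the denominator of the earlier calculation:

$$\frac{P(A|B)}{P(B|A)}=\frac{P(A)}{P(B)}$$

$$P(A|B)=P(B|A)\cdot\frac{P(A)}{P(B)}=0.87\cdot\frac{0.0147}{0.1113}\approx 0.115$$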

In the case of a prosecutor, one might argue that because a suspect's DNA matched DNA found at the scene of a crime, and the test has a false positive rate of 1 in 1,000,000, there is only a "one in a million" chance that the suspect is innocent. That is not necessarily the case. For example, let us say that the suspect was found by searching a DNA database that contains the DNA profiles of one million people. With that many comparisons and a false positive rate of one in a million, a coincidental match was quite likely, so the person discovered may well be a false positive. Far from being the conclusive "one in a million" probability the prosecutor claimed, the DNA match on its own is weak evidence of guilt unless there is other evidence connecting the suspect to the crime.
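The numbers behind this argument can be sketched under simplifying assumptions of our own: each of the one million comparisons is independent, the test never fails to match the true source, and (for the last figure) the suspect is treated as one of a million equally likely possible sources, with the match as the only evidence. Real forensic reporting of database searches is more involved than this.

```python
N = 1_000_000  # profiles searched (hypothetical database size)
FPR = 1e-6     # chance that an unrelated profile matches by coincidence

# Expected number of coincidental matches when searching N unrelated profiles.
expected_false_matches = N * FPR

# Probability of at least one coincidental match, treating each comparison
# as independent.
p_any_false_match = 1 - (1 - FPR) ** N

# Bayes' theorem for a suspect who is one of N equally likely possible
# sources, when the only evidence is that the suspect's DNA matches and the
# test always matches the true source: P(source | match).
prior = 1 / N
p_source_given_match = prior / (prior + FPR * (1 - prior))

print(f"expected coincidental matches: {expected_false_matches:.1f}")
print(f"P(at least one coincidental match): {p_any_false_match:.1%}")
print(f"P(suspect is the source | match only): {p_source_given_match:.1%}")
```

Under these assumptions there is about a 63% chance of at least one coincidental match, and a match alone leaves roughly even odds that the suspect is the source, nowhere near "one in a million".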

There are other fallacies committed by prosecutors during the course of prosecuting a case, though they don't receive the title "the prosecutor's fallacy". Other examples would be ipse dixit, which occurs when a prosecutor accepts a bare assertion, such as the claim of a suspect who is lying, without supporting evidence, and selective attention, which occurs when a prosecutor sees and presents only the evidence that supports his side of the case and not the evidence that contradicts it.

Defense attorney’s fallacy
The defense attorney's fallacy is another fallacy of conditional probability. This one causes a jury to think that the evidence against a suspect is weaker than it is. For example, let us say that a person has literally been caught red-handed: the victim's blood was found on the suspect's hands soon after he committed the crime. The DNA test still has the same false positive rate in this example, namely one in a million. The defense attorney argues that there are 7 billion people in the world, which means the DNA could have come from 7,000 different people, only one of whom was the victim. That is not the case. 4.4 billion of those people live in Asia and thus, assuming the crime took place outside Asia, couldn't be a source of the blood. Of the remaining 2.6 billion people, only a small percentage would have been near where the crime took place, and an even smaller percentage of them would have been bleeding at the time. Given these factors, the initial "999,999 out of a million" probability that the blood was from the victim is actually a good estimate. Obviously this example is an exaggeration used to explain the basic idea, but it illustrates a legal strategy used by defense attorneys.
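The defense's 7,000-people argument and the rebuttal differ only in how many people are treated as plausible alternative sources of the blood. A toy model makes that explicit. It is an assumption of this sketch, not a forensic method, that the victim and every alternative source are equally likely a priori, that the test always matches the victim's blood, and that each alternative source matches by coincidence with probability one in a million. Under those assumptions, the probability that the blood is the victim's works out to 1 / (1 + K × 10⁻⁶) for K alternative sources. The function name is our own.

```python
def p_victim_given_match(alternative_sources, fpr=1e-6):
    """Probability that matching blood came from the victim (toy model).

    Assumes the victim and each alternative source are equally likely a
    priori, the test always matches the victim's blood, and each alternative
    source matches by coincidence with probability fpr.
    """
    expected_coincidental = alternative_sources * fpr
    return 1.0 / (1.0 + expected_coincidental)


# Whole world, world minus Asia, a thousand people nearby, one bystander.
for pool in (7_000_000_000, 2_600_000_000, 1_000, 1):
    print(f"{pool:>13,} alternative sources -> {p_victim_given_match(pool):.6f}")
```

With all 7 billion people treated as possible sources, this gives about 1 in 7,001, which is the defense attorney's framing; with a realistic handful of people who could have been bleeding at the scene, it is essentially the 999,999 in a million described above.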

There are other fallacies made by defense attorneys that do not usually receive the title “The Defense Attorney’s Fallacy”. Just like prosecutors, defense attorneys suffer from the selective attention fallacy and only see and present that evidence which supports their side and not the evidence that contradicts it. They also commonly "blame the victim" in rape cases which is an ad hominem attack used to discredit the witness's testimony.

Problem of induction
All right, so let us say that we have gone through the trial, the prosecution has committed the prosecutor's fallacy, and the defense attorney has committed the defense attorney's fallacy, as we actually expect the attorneys to do. How is the court supposed to decide whether the suspect is guilty or not? The way the US court system works is by expecting a jury to make sense of all the information given to them and decide whether guilt has been established "beyond a reasonable doubt".

So how is the jury supposed to make sense of complicated information like DNA evidence without committing the prosecutor's fallacy or the defense attorney's fallacy? For the purposes of illustration, let us suppose that a society is acting rationally and has concluded that Bayesian inductive reasoning is the most reliable way to determine the likelihood of a person's guilt or innocence. This brings us to the problem of induction, popularized by the philosopher David Hume. As we mentioned earlier, inductive reasoning is probabilistic, which means that one will never arrive at 100% certainty about anything using this methodology. Consider, for example, the "rising" of the Sun. Every day for the entire history of mankind, the Earth has rotated on its axis to give the appearance of a rising sun. Moreover, it has done so ever since it was formed. Given this evidence, it is highly, highly probable that the Sun will "rise" the next day, but it is not certain. And once the Sun goes big red on us, the inductive argument that says the Sun will rise the next day may fail if the Earth becomes engulfed by it. Another example used to illustrate the problem of induction is swans. Suppose a society had seen 5,000,000 swans and all of them were white. The society then used this information to induce the conclusion "All swans are (probably) white." Then one day a black swan was born. This black swan falsifies the conclusion that all swans are white. The birth of the black swan is what is known as a "black swan event", and such events show us that inductive arguments cannot provide 100% certainty. In a courtroom setting, this means that 100% certainty cannot be the standard of proof, because reaching that level of certainty is impossible.

Reasonable doubt
Instead what the court system in the USA asks jurors to do is judge whether someone is "guilty beyond a reasonable doubt." This leads one to ask "What is the statistical likelihood one must have for someone to be considered guilty by this standard of proof?

Is it 50%?

Is it 75%?

95%?

99%?

99.9999%?"

Well, it is not 50.0000001%, as that is the standard of proof for civil cases, and the standard for criminal cases is supposed to be higher than for civil ones. But beyond that, the court system has avoided the issue, and not for lack of opportunity. The issue of reasonable doubt reached the Supreme Court in Victor v. Nebraska, though exactly what was decided with regard to the standard of proof is difficult to say. An essay that appeared in the Harvard Law Review argued that reasonable doubt should be left undefined. Think about that for a minute. You are on trial for a crime that you may not have committed, you want to prove your innocence, and a prominent journal has published an article arguing that the standard of proof the government has to meet shouldn't be defined?

Judge Jon Newman said the following about reasonable doubt: "I find it rather unsettling that we are using a formulation (of standard of proof) that we believe will become less clear the more we explain it."

Laws of human stupidity
Now I know this requires one to stretch one's imagination to epic proportions, but let us imagine that society has decided to act rationally, use Bayes' formula in the courtroom, and define reasonable doubt as something other than BS. Let us say that 99% is the standard of proof. Given these conditions, it is possible for the court to determine statistically whether the standard of proof has been reached, but to do so rationally one must overcome yet another problem, namely the laws of human stupidity.
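Under those idealized conditions, the court's check might look something like the sketch below. Every number in it is hypothetical and invented for illustration (the prior, the three likelihood ratios, and the evidence labels), and combining the ratios by multiplication assumes the items of evidence are independent given guilt or innocence, which real evidence often is not.

```python
# Every number below is hypothetical and chosen only for illustration.
STANDARD_OF_PROOF = 0.99   # the hypothetical 99% threshold

prior_probability = 0.01   # hypothetical P(guilty) before any evidence
likelihood_ratios = {      # hypothetical P(E|guilty) / P(E|innocent)
    "DNA match": 1000.0,
    "motive": 3.0,
    "eyewitness identification": 5.0,
}

# Odds form of Bayes' theorem, applied once per item of evidence.
# Multiplying the ratios assumes conditional independence of the evidence.
odds = prior_probability / (1 - prior_probability)
for lr in likelihood_ratios.values():
    odds *= lr
posterior = odds / (1 + odds)

print(f"P(guilty | evidence) = {posterior:.3f}")
print("standard met" if posterior >= STANDARD_OF_PROOF else "standard not met")
```

With these made-up inputs the posterior is about 0.993, just over the hypothetical 99% bar; a likelihood ratio of 3 instead of 5 for the eyewitness identification would push it below.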

Consider, for example, this hypothetical scenario. The prosecuting and defense attorneys have magically been turned into statisticians, the judge has also been turned into a statistician, and they have presented incredibly cogent and intelligent arguments in court. Now it is up to the jury to decide whether the 99% standard of proof has been reached, and the 12-member jury is the jury from hell: a creationist, Jenny McCarthy, a Scientologist, a Ufologist, a flat-earther, four Kardashians, George W. Bush, Alex Jones, and Michio Kaku. While the court may have no problem explaining its arguments to Michio, how is it supposed to explain them to the other eleven jurors? You don't have to be a string theorist like Michio to understand Bayesian statistics, but you do have to be more intelligent than someone like Jenny McCarthy or a flat-earther. Even if a jury were stacked full of people who understand mathematics, there would still be disagreements about the strength of the evidence used to derive the probabilistic conclusion. So how are 12 mathematically inclined jurors, who will naturally vary in the assumptions they make about the strength of the evidence, supposed to come to a group decision about whether reasonable doubt has been reached? The legal system provides no guidance on how to deal with even this idealized scenario.
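Even the idealized jury of twelve mathematically inclined jurors can split. In this hypothetical sketch (all numbers invented), every juror applies the same odds form of Bayes' theorem and the same prior, but each juror's judgment of how strong the evidence is gets summarized as a different combined likelihood ratio.

```python
# Every number here is hypothetical. Twelve jurors share the same prior but
# weigh the same evidence differently, expressed as a combined likelihood
# ratio P(E|guilty) / P(E|innocent) for each juror.
STANDARD_OF_PROOF = 0.99
PRIOR = 0.01

juror_likelihood_ratios = [15_000, 12_000, 20_000, 9_000, 50_000, 8_000,
                           30_000, 10_000, 5_000, 25_000, 11_000, 40_000]


def posterior(prior, likelihood_ratio):
    """Odds form of Bayes' theorem: returns P(guilty | evidence)."""
    odds = likelihood_ratio * prior / (1 - prior)
    return odds / (1 + odds)


posteriors = [posterior(PRIOR, lr) for lr in juror_likelihood_ratios]
votes_to_convict = sum(p >= STANDARD_OF_PROOF for p in posteriors)

for lr, p in zip(juror_likelihood_ratios, posteriors):
    print(f"LR {lr:>6,}: P(guilty) = {p:.4f}")
print(f"{votes_to_convict} of {len(posteriors)} jurors reach the 99% standard")
```

With these invented figures, nine of the twelve jurors clear the 99% bar and three do not, even though all twelve used the same method on the same evidence.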

Before dismissing this as mere philosophical conjecture, please keep in mind that this is applied philosophy. People can disagree all they want about whether God exists, and their debate is not going to affect anyone's life, but the philosophy that underlies jurisprudence is different. Bad philosophy will result in false convictions and the release of guilty people, ultimately endangering the welfare of society at large.

Bayes' theorem in the court system
Let's continue with our Michio example to explore how the philosophy of jurisprudence is currently being applied. So you are Michio, you have been called as a juror to evaluate the strength of the evidence presented to you, and you are now being asked to determine whether or not someone is guilty. As you are Michio, doing this is going to be no sweat for you, provided that you are given enough data to work with. But are you given that data? The answer is probably not. Consider, for example, eyewitness testimony. In order to determine the statistical impact eyewitness testimony should have on the outcome of the case, one needs to know the probability of the evidence given the hypothesis (P(E|H)) and the prior probability of the evidence (P(E)). While some research has been done on the statistical significance of eyewitness testimony and could conceivably be presented in court, the more salient issue may be the statistical significance of how the police conducted their investigation. For example, a witness who correctly picks out someone the police already suspect from a 2,000-person photo book provides evidence of a different statistical significance than a witness who picks the suspect from a ten-person lineup while the police give subtle hints about whom to choose. The Innocence Project has detailed several faulty police procedures for collecting witness testimony. Without data about a particular witness's testimony based on how the police conducted their investigation, it is impossible for even someone as intelligent as Michio to come to a statistical determination about guilt.
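One way to see why the identification procedure matters is to express an identification as a likelihood ratio: P(witness picks the suspect | guilty) divided by P(witness picks the suspect | innocent). The sketch below is a toy model with hypothetical numbers throughout: a 60% chance of picking a guilty suspect (held fixed across procedures for simplicity), a witness who guesses half the time when they recognize no one and then picks uniformly among the options, and, for the hinted lineup, hints that steer a guessing witness to the suspect 80% of the time.

```python
def identification_lr(p_pick_if_guilty, p_pick_if_innocent):
    """Likelihood ratio of an identification: P(ID | guilty) / P(ID | innocent)."""
    return p_pick_if_guilty / p_pick_if_innocent


# All values are hypothetical, for illustration only.
HIT_RATE = 0.6    # chance the witness picks the suspect if the suspect is guilty
GUESS_RATE = 0.5  # chance a witness who recognizes no one picks someone anyway

# P(innocent suspect is picked) under each procedure.
procedures = {
    # An innocent suspect is picked only by a lucky uniform guess.
    "2,000-person photo book": GUESS_RATE / 2000,
    "fair ten-person lineup": GUESS_RATE / 10,
    # Hints steer a guessing witness to the suspect 80% of the time.
    "ten-person lineup with hints": GUESS_RATE * 0.8,
}

for name, p_innocent_pick in procedures.items():
    lr = identification_lr(HIT_RATE, p_innocent_pick)
    print(f"{name}: likelihood ratio ~ {lr:,.1f}")
```

Under these made-up assumptions, the same "correct" identification is worth a likelihood ratio of about 2,400 from the photo book, 12 from a fair lineup, and only 1.5 from the hinted lineup, which is why a juror would need to know how the identification was obtained.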

Beyond the problem that admitted evidence is not usually presented in terms of P(E|H) and P(E), there is the problem that probabilistic data can be rejected by the court. The famous example of this is the blue bus problem, which is based on a case called Smith v. Rapid Transit that occurred in 1941. In this case, the plaintiff knew that a bus ran her off the road, causing her to crash, and she also knew that only one company ran routes on that street at that time of day. Thus, while she wasn't able to see the number of the bus that ran her off the road, she had good probabilistic data about who was responsible. The Massachusetts Supreme Judicial Court ruled "that probabilistic proof, by itself, is not sufficient to prove one's case." Here is the fundamental problem with that ruling: what form of evidence isn't probabilistic? Eyewitness testimony is probabilistic, and the blue bus evidence is likely to be a lot more reliable than it. Fingerprint evidence is probabilistic. Even DNA evidence is probabilistic, though it is highly reliable. So what evidence, according to the Massachusetts Supreme Judicial Court, can be admitted?

In England, a judge even ruled that Bayes' Theorem couldn't be used to evaluate evidence. This leads one to ask, "Then exactly how is evidence supposed to be evaluated if not by Bayes' Theorem?"

Seriously. Inquiring minds want to know.

Videos on how to do conditional probability and Bayes' theorem

 * Critical Thinker Academy
 * Veritasium
 * Ian Olasov
 * Trefor Bazett
 * Trefor Bazett explaining false positives and multiple tests
 * patrickJMT
 * Brandon Rohrer explains another way of doing Bayesian inference.

Sites that explain deductive, inductive, and abductive reasoning

 * Stanford Encyclopedia of Philosophy
 * Fact Myth - The Different Types of Reasoning Methods Explained and Compared by Thomas DeMichele

Sites that discuss the prosecutor's and defense attorney's fallacies

 * Interpretation of statistical evidence in criminal trials: The Prosecutor's Fallacy and the Defense Attorney's Fallacy by William Thompson and Edward Schumann
 * FORENSICS: Examining the Evidence
 * The Bayesian flip: Correcting the prosecutor's fallacy by William P. Skorupski and Howard Wainer, Royal Statistical Society, 06 August 2015
 * A PowerPoint presentation on likelihood ratios from the University of Vermont
 * A more in-depth analysis of likelihood ratios that a flat-earther won't be able to understand. (Likelihood Ratio as Weight of Forensic Evidence: A Closer Look by Steven P. Lund and Hari Iyer, Statistical Engineering Division, Information Technology Laboratory)
 * David Colquhoun explains the prosecutor's fallacy.

Sites that discuss how courts treat probabilistic evidence and Bayes' theorem

 * The Blue Bus Problem
 * Bayes and the Law by Norman Fenton, Martin Neil, and Daniel Berger, Annu Rev Stat Appl. 2016 Jun; 3: 51–77.
 * A General Structure for Legal Arguments About Evidence Using Bayesian Networks by Norman Fenton, Martin Neil, and David A. Lagnado, Cognitive Science, 30 October 2012
 * A formula for justice by Angela Saini, The Guardian
 * Courts' use of statistics should be put on trial by Tom Siegfried