Garbage in, garbage out

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Garbage in, garbage out (GIGO) is a long-standing principle of computer programming and algorithm design. It means that no matter how good your logic is, bad input makes your output unreliable at best and nonsense at worst. In philosophical terms, your argument may be valid (the conclusion follows from the premises) but not sound (at least one premise is false).
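
As a minimal sketch of the principle, consider a function whose logic is entirely correct but which is fed one bad value. The function name and the readings are made up for illustration; the assumption is that one temperature was logged in Fahrenheit by mistake.

```python
def mean_celsius(readings):
    """Arithmetic mean of a list of readings; the logic itself is correct."""
    return sum(readings) / len(readings)


# Hypothetical data: three readings of the same room.
good_input = [20.0, 21.0, 22.0]   # all in Celsius
bad_input = [20.0, 21.0, 71.6]    # last value was logged in Fahrenheit (71.6 F is 22 C)

print(mean_celsius(good_input))   # 21.0
print(mean_celsius(bad_input))    # roughly 37.5: flawless arithmetic, useless answer
```

Nothing in the function can detect the problem. The fault lies entirely in the input, which is why checking the logic alone is never enough.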

Let's examine Pascal's wager as an example of how this applies to debate tactics:
 * 1) If you believe in God and God does exist, you will be rewarded with eternal life in Heaven: thus an infinite gain.
 * 2) If you do not believe in God and God does exist, you will be condemned to remain in Hell forever: thus an infinite loss.
 * 3) If you believe in God and God does not exist, you will not be rewarded: thus a finite loss.
 * 4) If you do not believe in God and God does not exist, you will not be rewarded, but you have lived your own life: thus a finite gain.

Taken at face value, these four statements suggest that the best option is to believe in God. However, Pascal's premises are easily challenged. Pascal assumes that only one kind of faith (and therefore one God) matters. This ignores the variety of human religious experience: if other gods, or other criteria for reward and punishment, are possible, the four cases above no longer cover every outcome. The reasoning may be valid, but because the premises are faulty, the conclusion is garbage.
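
The wager can be sketched as an expected-value calculation to show how much the conclusion depends on the premises. Everything numeric here is an assumption made for illustration: the probabilities, the finite payoffs of plus or minus 1, and the second scenario in which two mutually exclusive gods each punish believers in the other. The same function is used in both cases; only the input changes.

```python
import math

INF = math.inf


def expected_value(outcomes):
    """Sum of probability * payoff over a list of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


# Pascal's premises: exactly one possible God, with some small probability p.
p = 0.001  # hypothetical probability
believe = [(p, INF), (1 - p, -1.0)]      # cases 1 and 3
disbelieve = [(p, -INF), (1 - p, 1.0)]   # cases 2 and 4

print(expected_value(believe))     # inf
print(expected_value(disbelieve))  # -inf

# Altered premises: two rival gods, A and B, each equally likely (hypothetical).
q = 0.001
believe_in_a = [(q, INF), (q, -INF), (1 - 2 * q, -1.0)]

print(expected_value(believe_in_a))  # nan: infinite gain minus infinite loss is undefined
```

Under Pascal's own premises belief wins outright. Under the altered premises the calculation no longer produces an answer at all, even though the arithmetic has not changed.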

Common sources of garbage input include cherry-picked or elided data, disproved "common sense", small sample sizes (and anecdotes), and outright lies, as well as equivocation over definitions. Garbage outputs are occasionally correct, but if they got there by the wrong path, they aren't of very much use.

Important caveat
There are various statistical methods that can take a collection of unreliable data and "correct" it (or at least come closer to the truth). These methods come with a few restrictions that are worth mentioning here, since they create problems of their own:

 * 1) The most notable is that the bad data must be at least slightly good, relevant, and based on something real. You're almost certainly not going to get an accurate temperature from a random number generator, or from an instrument that's measuring the wrong thing; in other words, these methods only slightly expand the realm of "non-garbage" data.
 * 2) You have to know the input is unreliable beforehand. If you go in assuming the input is better than it actually is, you'll wind up with garbage output regardless.
 * 3) There usually needs to be something accurate to compare it to. We can measure the temperature inside a clothes dryer directly (if expensively), which allows us to figure out with relative certainty how bad the cheaper sensors we'll actually install in the dryer are. The temperature of distant stars, for which we have only predictions and very indirect methods of measurement, is going to have a much larger error bar.
 * 4) Such methods have error ranges; because our tools are inaccurate and imprecise, we can only speak in terms of percent certainty (e.g., "We are 95% confident that the inside of the dryer is between 70 and 73 degrees Celsius").
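
The restrictions above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" temperature, the sensor's bias and noise, and the reference reading are all invented, and the interval uses a simple normal approximation that ignores the uncertainty in the calibration step.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_TEMP = 71.5     # hypothetical true dryer temperature, degrees Celsius
SENSOR_BIAS = 2.0    # hypothetical: the cheap sensor reads 2 degrees high
SENSOR_NOISE = 1.5   # hypothetical standard deviation of the sensor noise


def cheap_sensor(true_temp):
    """Simulated unreliable sensor: biased and noisy, but tracking something real."""
    return true_temp + SENSOR_BIAS + random.gauss(0, SENSOR_NOISE)


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% interval for the mean."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
    return mean, half_width


# Restriction 3: estimate the bias by comparing against an accurate reference.
reference_temp = 60.0  # hypothetical reading from an expensive, trusted instrument
calibration = [cheap_sensor(reference_temp) for _ in range(200)]
estimated_bias = statistics.mean(calibration) - reference_temp

# Restrictions 2 and 4: knowing the sensor is unreliable, average many
# readings, subtract the estimated bias, and report a range, not a point.
readings = [cheap_sensor(TRUE_TEMP) - estimated_bias for _ in range(200)]
estimate, half_width = mean_with_ci(readings)
print(f"corrected sensor: {estimate:.2f} +/- {half_width:.2f} C")

# Restriction 1: the same method applied to random numbers still yields a
# confident-looking interval, but it has nothing to do with the dryer.
garbage = [random.uniform(0, 100) for _ in range(200)]
garbage_estimate, garbage_half_width = mean_with_ci(garbage)
print(f"random numbers:   {garbage_estimate:.2f} +/- {garbage_half_width:.2f} C")
```

The corrected sensor lands close to the true value because its errors are tied to something real. The random numbers produce an equally tidy interval centered near 50, which is exactly the kind of garbage output that looks respectable.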