Standard deviation

Standard deviation is a statistical measure of how much a set of values varies. If the data are normally distributed, it lets us estimate how likely a specific value is to occur, for example by doing a Z-test.

Standard deviation
The standard deviation of a set of values is a measure of how widely the values differ from each other. Specifically, the standard deviation of a sample follows the equation:


 * $$\sigma(x) = \sqrt {\sum(x - \bar x)^2 \over n - 1}$$

This is the square root of the variance, which is:


 * $$Var(x) = {\sum (x - \bar x)^2 \over n - 1}$$

Where:
 * $$\bar x$$ is the arithmetic mean of all values of x
 * $$\sum$$ denotes summation over all values of $$x$$
 * $$n$$ is the number of $$x$$ values
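
As a rough illustration, the two formulas above can be written out in Python. The function names `sample_variance` and `sample_std` and the example values are made up for this sketch; they are not taken from any library.

```python
import math


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


values = [3, 5, 7, 9]  # illustrative values only
print(sample_variance(values))
print(sample_std(values))
```

The guard for fewer than two values is there because dividing by n - 1 is undefined for a single value.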

If the values are normally distributed, then they follow the empirical rule, which states that approximately:


 * 68% of the values will fall within 1$$\sigma$$ of the mean.
 * 95% of all values will fall within 2$$\sigma$$ of the mean.
 * 99.7% of all values will fall within 3$$\sigma$$ of the mean.
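
The three percentages can be checked against the normal distribution itself. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
```

Because the rule depends only on the number of standard deviations, the same shares hold for any mean and any standard deviation.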

To put this in more rookie terms
Not everyone understands what a Σ means. If you are unfortunate enough to be in this position, this section spells the formulas out in words.

The formula for standard deviation when every value in the population studied is known is as follows:

$$\sqrt{\frac{(\text{first value} - \text{average})^2 + (\text{second value} - \text{average})^2 + \dots + (\text{final value} - \text{average})^2}{\text{number of values in the set}}}$$

If we only know a sample from the population rather than the whole population, the sample standard deviation (an estimate of the true standard deviation) is:

$$\sqrt{\frac{(\text{first value} - \text{average})^2 + (\text{second value} - \text{average})^2 + \dots + (\text{final value} - \text{average})^2}{\text{number of values in the set} - 1}}$$
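
For comparison, Python's standard `statistics` module offers both versions: `pstdev` divides by the number of values and `stdev` divides by that number minus 1. The data below are made-up values chosen only to make the difference visible.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

population_sd = statistics.pstdev(data)  # divides by the number of values
sample_sd = statistics.stdev(data)       # divides by the number of values minus 1

print(population_sd)
print(sample_sd)
```

The sample version always comes out slightly larger, and the gap shrinks as the number of values grows.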

Let's give it a shot
 * 1) Suppose you have a data set including only the 6 values: 1, 4, 7, 12, 17, 19.
 * 2) To derive the standard deviation, you must first determine the average, or arithmetic mean, of all our values.  So we calculate:
 * $$\tfrac{1 + 4 + 7 + 12 + 17 + 19}{6} = 10$$
 * 3) In this case, we know every single value, so we use the first formula:
 * $$\begin{align}
\sqrt{\tfrac{(1 - 10)^2 + (4 - 10)^2 + (7 - 10)^2 + (12 - 10)^2 + (17 - 10)^2 + (19 - 10)^2}{6}} &= \sqrt{\tfrac{(-9)^2 + (-6)^2 + (-3)^2 + 2^2 + 7^2 + 9^2}{6}} \\
&= \sqrt{\tfrac{81 + 36 + 9 + 4 + 49 + 81}{6}} \\
&= \sqrt{\tfrac{260}{6}} \\
&= 6.582805886
\end{align}$$

So, the standard deviation of our data set is 6.582805886.
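
The same calculation can be followed step by step in a few lines of Python. This is only a sketch of the arithmetic for this data set; the variable names are chosen for readability.

```python
import math

values = [1, 4, 7, 12, 17, 19]

# Step 1: the average of all values
mean = sum(values) / len(values)

# Step 2: squared distance of each value from the average
squared_deviations = [(v - mean) ** 2 for v in values]

# Step 3: divide by the number of values (the whole population is known),
# then take the square root
population_sd = math.sqrt(sum(squared_deviations) / len(values))

print(mean)
print(squared_deviations)
print(round(population_sd, 9))
```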

This value, 6.582805886, can be considered to be 1 standard deviation. If we double the number, we get 13.165611772, or 2 standard deviations.

If the distribution of the values is "normal", then they follow the empirical rule, which states that approximately:


 * 68% of the values will fall within 1 standard deviation of the mean.
 * 95% of all values will fall within 2 standard deviations of the mean.
 * 99.7% of all values will fall within 3 standard deviations of the mean.

In this case, only 50% of the values (4, 7 and 12) fall within 1 standard deviation of the mean, although 100% of the values fall within 2 standard deviations of the mean.
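
Counting the values inside each range is easy to do by hand for six numbers, but a short script makes the check repeatable. The helper name `fraction_within` is made up for this sketch.

```python
import statistics

values = [1, 4, 7, 12, 17, 19]
mean = statistics.mean(values)
sd = statistics.pstdev(values)  # population formula, as in the worked example


def fraction_within(k):
    """Fraction of the values lying at most k standard deviations from the mean."""
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)


print(fraction_within(1))
print(fraction_within(2))
```

With so few values, and no reason to think they are normally distributed, it is not surprising that the shares differ from 68% and 95%.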

Using your brand new standard deviation
OK, now that you've calculated this crazy thing, what next? Use it! Build a 95% confidence interval:


 * 1) Start with the average up above, 10. Not only is it the average of the data, but it's also the best guess of the average of the population you sampled to get the data.
 * 2) Take that standard deviation and divide it by the square root of the sample size minus 1. So, that's the square root of 5 (which is 6-1), or 2.236067977. Dividing gives the standard error:
 * $${SE(x) = {6.582805886\over 2.236067977} = 2.943920289}$$
 * 3) Multiply by 2 to get 5.887840578.
 * 4) Now, subtract that number from the average and, separately, add it. These two new numbers form an approximate 95% confidence interval (4.112159422, 15.887840578) for the population average.

Roughly speaking, this confidence interval means you have a relatively high level of confidence that the true population mean falls somewhere between 4.112 and 15.888.
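
The steps for the confidence interval can be sketched in Python as well. The multiplier of 2 follows the steps above and is a rounded rule of thumb, so the result is approximate. The script also shows that dividing the population standard deviation by the square root of n - 1 gives the same standard error as dividing the sample standard deviation by the square root of n.

```python
import math
import statistics

values = [1, 4, 7, 12, 17, 19]
n = len(values)
mean = statistics.mean(values)

# Standard error as described in the steps: population SD over sqrt(n - 1)
standard_error = statistics.pstdev(values) / math.sqrt(n - 1)

# Equivalent form: sample SD over sqrt(n)
standard_error_alt = statistics.stdev(values) / math.sqrt(n)

# Approximate 95% interval using the rule-of-thumb multiplier 2
margin = 2 * standard_error
interval = (mean - margin, mean + margin)

print(standard_error, standard_error_alt)
print(interval)
```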