NUMBAT OER - Open Educational Resources

4. Variance

Whilst range and interquartile range are defined simply by particular points in the spread of data, neither measure captures the contribution of all data points to the overall variability. The quantity called 'variance' is based on the difference between the value for each observation and the mean of all of the observations. It can only be used with scale data, and is normally reported alongside the mean as the description of the average.

The calculation of variance, s², can be represented by the word equation:

s² = sum of squared deviations from the mean / degrees of freedom

The deviation of a data point from the mean is simply written as:

(x – xmean)

where xmean represents the mean value of all observations. If a data point is similar in value to the mean, the deviation will be small, whilst a large deviation arises from a data point that is very different from the mean. The squared deviation is written as:

(x – xmean)²

Note that whilst a deviation can be negative, its square is never negative. So when we add the squared deviations together, the total reflects the differences between all points and the mean, because positives and negatives cannot cancel out. The sum of squared deviations is written as:

SSd = Σ(x – xmean)²

where the symbol Σ represents the sum, in this case of the squared deviations calculated for all data points.
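The steps so far can be sketched in a few lines of Python. This is only an illustration: the five observations are hypothetical values chosen so the arithmetic is easy to follow, and the names data, x_mean, deviations and ss_d are not part of the original text.

```python
# Hypothetical observations, chosen only for illustration.
data = [4, 8, 6, 5, 7]

# Mean of all observations (xmean in the text).
x_mean = sum(data) / len(data)

# Deviation of each data point from the mean: (x - xmean).
deviations = [x - x_mean for x in data]

# Squared deviations: (x - xmean) squared, which can never be negative.
squared_deviations = [d ** 2 for d in deviations]

# Sum of squared deviations, SSd.
ss_d = sum(squared_deviations)

print("mean:", x_mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("SSd:", ss_d)
```

For these values the mean is 6, the plain deviations add up to zero (the positives and negatives cancel), and the squared deviations add up to an SSd of 10.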

The denominator in the variance word equation is called the 'degrees of freedom'. This expression is common in statistics, and is closely related to the number of observations or categories. In the case of variance, the number of degrees of freedom is simply the number of observations minus one:

df = (n – 1)

So we can re-write the word equation in terms of the two expressions above:

s² = SSd / df

or:

s² = Σ(x – xmean)² / (n – 1)
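As a minimal sketch, the formula can be written as a short Python function. The function name sample_variance and the example data are assumptions made for illustration; the result is checked against statistics.variance from the Python standard library, which also divides by (n – 1).

```python
import math
import statistics


def sample_variance(values):
    """Variance as the sum of squared deviations divided by (n - 1)."""
    n = len(values)
    if n < 2:
        # With one observation, df = 0 and the variance is undefined.
        raise ValueError("variance needs at least two observations")
    x_mean = sum(values) / n
    ss_d = sum((x - x_mean) ** 2 for x in values)  # sum of squared deviations
    df = n - 1                                     # degrees of freedom
    return ss_d / df


# Hypothetical observations, chosen only for illustration.
data = [4, 8, 6, 5, 7]
s2 = sample_variance(data)

print("variance:", s2)
print("agrees with statistics.variance:",
      math.isclose(s2, statistics.variance(data)))
```

For this data set the sum of squared deviations is 10 and there are 4 degrees of freedom, so the variance is 2.5.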

Looking at the formula, you will see that a high average deviation, or a few very large deviations, will increase the variance. For large samples, the number of degrees of freedom is very similar to the number of observations (df ≈ n), and the variance is almost the same as the average squared deviation from the mean. However, for very small samples the number of degrees of freedom is proportionally much smaller than the number of observations, so the variance is noticeably larger than the average squared deviation from the mean.
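The effect of sample size can be illustrated with a rough sketch that compares the variance (dividing by n – 1) with the average squared deviation (dividing by n). The helper name variance_ratio and both data sets are hypothetical; statistics.variance and statistics.pvariance are standard-library functions that divide by (n – 1) and n respectively.

```python
import statistics


def variance_ratio(values):
    """Variance (divide by n - 1) relative to the average squared
    deviation from the mean (divide by n)."""
    return statistics.variance(values) / statistics.pvariance(values)


# Hypothetical data: a very small sample and a large one.
small_sample = [2, 9, 4]
large_sample = list(range(1000))

ratio_small = variance_ratio(small_sample)
ratio_large = variance_ratio(large_sample)

print("n = 3:    variance / average squared deviation =", ratio_small)
print("n = 1000: variance / average squared deviation =", ratio_large)
```

The ratio is always n / (n – 1), so it is about 1.5 for three observations but only about 1.001 for a thousand.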

You can use a spreadsheet to calculate variance, or it can be generated using a statistical software package such as SPSS. The purpose of describing the calculation here is simply to help you to understand how it works.