Friday, March 2, 2012

The true meaning of the average (arithmetic mean) of a list of numbers!

For too long we have been averaging numbers without knowing what the average really is. Dividing the total by the count of numbers is so simple that we do it whenever we can, without realizing what doing so actually means.

The arithmetic mean of a list of numbers is the number which is closest to all the numbers in the list. When compared with every number in the list, it is the number that gives the least error. Because it is the least erroneous number with respect to the whole list, it is a good approximation of every number in the list.

Given this definition, should the mean be used as extensively as it is? For exam results, wouldn't it be more useful to know the percentage of students who passed and failed rather than the average mark? Or the percentage of students who scored each grade (A, B, C, etc.)? Knowing only a single number that approximates every student's mark will not really tell you how good your mark is compared to everyone else's. Percentages would be more useful.

But I digress. Let's focus on what the arithmetic mean really means.

What does "error" mean? If I say 7 when I should have said 5, what was my error? The most intuitive answer would be 2, which is the absolute difference between the two numbers. Let's see where this takes us.

Error as absolute difference
We want to find a number which has the least error when compared to every number in the list. This is less obvious than one might think, because such a statement can be interpreted in many ways. What does it mean for a number to have the least error when compared to every number in the list? Two ways I can think of are:

The least maximum error
One way would be to compute the error a candidate number has to each number in the list, take the largest of these errors, and find the candidate for which this maximum error is as small as possible.

So if we have a list [1,3,7], the error the number 5 has to each number is [|1-5|, |3-5|, |7-5|] = [|-4|, |-2|, |2|] = [4, 2, 2]. Now pick the maximum error of these and we have 4. Using this interpretation, 4 is the error 5 has when compared to [1,3,7]. Now we want to find the number which will give the least maximum error.
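The calculation above can be sketched in a few lines of Python. The function names `absolute_errors` and `max_error` are hypothetical helpers introduced only for illustration:

```python
def absolute_errors(candidate, numbers):
    """Absolute difference between the candidate and each number in the list."""
    return [abs(n - candidate) for n in numbers]


def max_error(candidate, numbers):
    """'Least maximum error' interpretation: the worst single error."""
    return max(absolute_errors(candidate, numbers))


numbers = [1, 3, 7]
print(absolute_errors(5, numbers))  # [4, 2, 2]
print(max_error(5, numbers))        # 4
```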

So since we want to find the number which gives the least maximum error, we are looking for a number "x" such that
max([|1-x|, |3-x|, |7-x|])
will be the least possible.

Let's plot the error to each number separately as a function of "x", that is, y = |1-x|, y = |3-x| and y = |7-x|.


Each graph shows how the error with respect to one number changes as different values of "x" are used. At any "x", the maximum error is given by whichever line is highest up. These are the lines belonging to the largest number in the list (green) and the smallest number in the list (red), but only one of them gives the maximum error at any point, depending on which side of their intersection we are on. So when x < 4, the maximum error is described by |1-x|, and when x > 4, it is described by |7-x|. When x = 4, they intersect and hence give the same error.

The smallest of these maximum errors occurs where |1-x| and |7-x| intersect, that is, at 4. So 4 is the number which gives the least maximum error, and this error is 3. If you use any other number instead of 4 to represent these 3 numbers, the maximum error you get will be greater.

In general, we want to find where |max_num-x| and |min_num-x| intersect, as these will always be the highest lines and their intersection will always be the lowest point of the maximum error. Between min_num and max_num, the two lines meeting at the intersection are y = max_num-x and y = x-min_num, so to find the intersection we solve max_num-x = x-min_num for "x" and get
x = (max_num + min_num)/2
so the number which best approximates the numbers under this interpretation is (max_num + min_num)/2, the number exactly in the middle of max_num and min_num. As you can see, only the maximum and minimum numbers are taken into account in this interpretation, so the result will not be very representative of all the numbers.
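As a rough numerical check of this result, the sketch below tries every "x" from the smallest to the largest number in steps of 0.01 and keeps the one with the smallest maximum error. It assumes the list contains integers (so the grid hits the midpoint exactly); the helper names are hypothetical:

```python
def max_abs_error(x, numbers):
    """Worst (largest) absolute error of x against the list."""
    return max(abs(n - x) for n in numbers)


def least_max_error_by_search(numbers):
    """Brute-force check, assuming integer inputs: try every x from min to max
    in steps of 0.01 and keep the one with the smallest worst-case error."""
    lo, hi = min(numbers), max(numbers)
    candidates = [i / 100 for i in range(lo * 100, hi * 100 + 1)]
    return min(candidates, key=lambda x: max_abs_error(x, numbers))


def midrange(numbers):
    """Closed form derived above: halfway between the extremes."""
    return (max(numbers) + min(numbers)) / 2


numbers = [1, 3, 7]
best = least_max_error_by_search(numbers)
print(best, midrange(numbers))       # 4.0 4.0
print(max_abs_error(best, numbers))  # 3.0
```

A grid search is crude, but it makes no assumptions about the shape of the error curve, so it is a fair independent check of the algebra.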

The least sum of errors
Another way would be to add up the errors a candidate number has to each number in the list, and find the candidate for which this sum of errors is as small as possible.

So if we have a list [1,3,7], the error the number 5 has to each number is [|1-5|, |3-5|, |7-5|] = [|-4|, |-2|, |2|] = [4, 2, 2]. Now add the errors together and we have 4+2+2 = 8. Using this interpretation, 8 is the error 5 has when compared to [1,3,7]. Now we want to find the number which will give the least sum of errors.
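The same calculation under the "sum of errors" interpretation, again with a hypothetical helper name:

```python
def sum_abs_error(candidate, numbers):
    """'Least sum of errors' interpretation with absolute differences."""
    return sum(abs(n - candidate) for n in numbers)


print(sum_abs_error(5, [1, 3, 7]))  # 8
```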

So since we want to find the number which gives the least sum of errors, we are looking for a number "x" such that
sum([|1-x|, |3-x|, |7-x|])
will be the least possible.

Let's plot the error to each number separately as a function of "x", that is, y = |1-x|, y = |3-x| and y = |7-x|, together with their sum.


The blue line is the sum of errors. This looks promising. There is a global minimum at x = 3, so 3 gives the least sum of errors when compared to [1,3,7]. But a little analysis of how absolute differences add up shows that the minimum will always occur at the median number, that is 3 in [1,3,7]: moving "x" towards the middle of the list brings it closer to more numbers than it moves it away from, so the sum keeps decreasing until "x" reaches the median. In fact, if we try this on a list with no single middle number, such as [1,3,5,7], the following graph emerges:


As you can see, there is no single global minimum. Instead there is a plateau which spans between the 2 middle numbers, 3 and 5, in [1,3,5,7]. Again, just like in the previous case, the median does not take all the numbers into account and so is not a good representative of all the numbers.
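Both graphs can be checked numerically. The sketch below searches "x" from the smallest to the largest number in steps of 0.01 (assuming integer inputs) and returns every "x" that attains the smallest sum, so a plateau shows up as a range rather than a single point. The helper names are hypothetical; `statistics.median` is from Python's standard library and, for an even-length list, returns the midpoint of the two middle numbers, which is just one point on the plateau:

```python
import statistics


def sum_abs_error(x, numbers):
    """Sum of absolute errors of x against every number in the list."""
    return sum(abs(n - x) for n in numbers)


def least_sum_abs_error_by_search(numbers):
    """Brute-force search over x = min..max in steps of 0.01 (integer inputs
    assumed); returns every x that attains the smallest sum, to expose plateaus."""
    lo, hi = min(numbers), max(numbers)
    candidates = [i / 100 for i in range(lo * 100, hi * 100 + 1)]
    errors = {x: sum_abs_error(x, numbers) for x in candidates}
    best = min(errors.values())
    # A small tolerance absorbs floating-point rounding on the flat part.
    return [x for x, e in errors.items() if abs(e - best) < 1e-9]


odd = least_sum_abs_error_by_search([1, 3, 7])
print(odd, statistics.median([1, 3, 7]))                   # [3.0] 3
even = least_sum_abs_error_by_search([1, 3, 5, 7])
print(even[0], even[-1], statistics.median([1, 3, 5, 7]))  # 3.0 5.0 4.0
```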

Error as a square of the difference
Absolute errors don't seem too promising. But we still need to make sure that the differences are never negative, as otherwise negative errors would cancel out positive ones. Another way to do this is by squaring the differences, since the square of a real number is never negative.

We shall now consider the two interpretations of "error when compared to all the numbers" again using this definition of "error".

The least maximum error
We now plot the 3 graphs again but this time using y = (1-x)^2, y = (3-x)^2 and y = (7-x)^2.


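So what is the maximum of these errors? Again it comes from the highest lines, y = (max_num-x)^2 and y = (min_num-x)^2, so again we will not get a representative of all the numbers because only the maximum and minimum numbers are considered. In fact, since squaring does not change which of two non-negative errors is larger, the least maximum squared error is at the same place as before, (max_num + min_num)/2.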
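A quick numerical check that squaring does not move the least maximum error, using the same kind of 0.01-step grid search over integer inputs as before (hypothetical helper names):

```python
def max_squared_error(x, numbers):
    """Worst squared error of x against the list."""
    return max((n - x) ** 2 for n in numbers)


def least_max_squared_error_by_search(numbers):
    """Try every x from min to max in steps of 0.01 (integer inputs assumed)."""
    lo, hi = min(numbers), max(numbers)
    candidates = [i / 100 for i in range(lo * 100, hi * 100 + 1)]
    return min(candidates, key=lambda x: max_squared_error(x, numbers))


numbers = [1, 3, 7]
best = least_max_squared_error_by_search(numbers)
print(best, max_squared_error(best, numbers))  # 4.0 9.0
```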

The least sum of errors
What happens if we should add the errors together?


Again, the sum is the blue line, and again there is a minimum. But this time it is not a sharp corner; it is the turning point of a quadratic. y = (1-x)^2, y = (3-x)^2 and y = (7-x)^2 are all quadratics with a positive x^2 coefficient, and adding such quadratics gives another quadratic with a positive x^2 coefficient, which always has a single minimum. What's more, this minimum is not at the median as in the previous case, so where is it?

y = (1-x)^2 + (3-x)^2 + (7-x)^2
y = (1+x^2-2x) + (9+x^2-6x) + (49+x^2-14x)
y = 3x^2 - 22x + 59
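The expansion can be checked mechanically. Two quadratics that agree at more than 3 points are the same quadratic, so comparing the original sum with the expanded form at a handful of integer values of "x" confirms the algebra. The helper names are hypothetical:

```python
def sum_squared_error(x, numbers):
    """Sum of squared errors of x against every number in the list."""
    return sum((n - x) ** 2 for n in numbers)


def expanded(x):
    """The expanded form derived above: 3x^2 - 22x + 59."""
    return 3 * x**2 - 22 * x + 59


# Integer arithmetic is exact, so == is safe here.
for x in range(-5, 11):
    assert sum_squared_error(x, [1, 3, 7]) == expanded(x)
print("expansion checks out")
```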

So the equation which describes the sum of the squared errors for a particular "x" is y = 3x^2 - 22x + 59. Where is this equation's minimum? For a quadratic y = ax^2 + bx + c, the turning point is at x@min = -b/(2a) (found by setting the derivative 2ax + b to zero), so

x@min = -(-22)/(2*3) = 11/3

Does 11/3 make you think of anything? It's (1+3+7)/3, which is the arithmetic mean of [1,3,7].
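Using exact fractions from Python's standard `fractions` module, the turning point and the mean can be compared without any rounding. The last line also shows that the mean beats the median, 3, on the sum of squared errors:

```python
from fractions import Fraction

a, b, c = 3, -22, 59              # coefficients of y = 3x^2 - 22x + 59
x_min = Fraction(-b, 2 * a)       # turning point of ax^2 + bx + c
numbers = [1, 3, 7]
mean = Fraction(sum(numbers), len(numbers))
print(x_min, mean, x_min == mean)  # 11/3 11/3 True


def sum_squared_error(x, numbers):
    """Sum of squared errors of x against every number in the list."""
    return sum((n - x) ** 2 for n in numbers)


print(sum_squared_error(x_min, numbers), sum_squared_error(3, numbers))  # 56/3 20
```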

In fact, if we use a general list of "n" numbers [a_1, a_2, ..., a_n], the sum of squared errors when compared with "x" will be:

y = (a_1-x)^2 + (a_2-x)^2 + ... + (a_n-x)^2
y = (a_1^2+x^2-2xa_1) + (a_2^2+x^2-2xa_2) + ... + (a_n^2+x^2-2xa_n)
y = (x^2 + x^2 + ... + x^2) + (-2xa_1 + -2xa_2 + ... + -2xa_n) + (a_1^2 + a_2^2 + ... + a_n^2)
y = nx^2 - 2(a_1 + a_2 + ... + a_n)x + (a_1^2 + a_2^2 + ... + a_n^2)

x@min = -(-2(a_1 + a_2 + ... + a_n))/(2n) = (a_1 + a_2 + ... + a_n)/n

And there you have it, the derivation of the arithmetic mean.
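The general derivation translates directly into code: build the coefficients n, -2(a_1 + ... + a_n) and (a_1^2 + ... + a_n^2), then take -b/(2a). The sketch below uses exact fractions and an arbitrary example list chosen only for illustration; the helper names are hypothetical:

```python
from fractions import Fraction


def sse_coefficients(numbers):
    """Coefficients (a, b, c) of y = a*x^2 + b*x + c, the sum of squared errors."""
    a = len(numbers)
    b = -2 * sum(numbers)
    c = sum(n * n for n in numbers)
    return a, b, c


def sse(x, numbers):
    """Sum of squared errors computed directly, for cross-checking."""
    return sum((n - x) ** 2 for n in numbers)


def least_sse_point(numbers):
    """Turning point -b/(2a) of the quadratic, as an exact fraction."""
    a, b, _ = sse_coefficients(numbers)
    return Fraction(-b, 2 * a)


numbers = [2, 9, 4, 4, 11]  # arbitrary example list
x_min = least_sse_point(numbers)
print(x_min)                                          # 6
print(x_min == Fraction(sum(numbers), len(numbers)))  # True
```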

So what is the arithmetic mean? It is the number which gives the least sum of squared errors. This seems to be the best interpretation of "least error when compared to all numbers in the list", as it takes all the numbers in the list into account. In fact, sums of squared errors appear throughout statistics, for example in the variance and in least-squares regression.

It is interesting to note that if instead of squaring we use 4th powers, 6th powers, or other even powers (which also make the differences non-negative), the turning point will not be at the same value of "x". But we'll leave the analysis of this observation for another post.
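For readers who want to see this for themselves, here is a minimal numerical sketch. It assumes an even power p, in which case the derivative of sum((n - x)^p), p * sum((x - n)^(p-1)), increases with "x", so its zero can be found by bisection between the smallest and largest numbers. The function name is hypothetical, and this is only a check of the observation, not an analysis of it:

```python
def least_sum_power_error(numbers, p, iterations=100):
    """Minimize sum((n - x)**p) for an even p >= 2 by bisecting on the sign of
    its derivative, which is proportional to sum((x - n)**(p - 1))."""
    lo, hi = float(min(numbers)), float(max(numbers))
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum((mid - n) ** (p - 1) for n in numbers) < 0:
            lo = mid  # derivative negative: the minimum lies to the right
        else:
            hi = mid  # derivative non-negative: the minimum lies at or to the left
    return (lo + hi) / 2


numbers = [1, 3, 7]
for p in (2, 4, 6):
    print(p, least_sum_power_error(numbers, p))
```

With p = 2 this recovers 11/3 (up to floating-point rounding), while p = 4 and p = 6 give different values.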