Is it possible to derive the equation for the arithmetic mean?


As I understand it, the arithmetic mean is a measure of central tendency, i.e. it is a value that quantifies the location of the centre of a distribution of data points (the point about which the data tends to cluster).

My question is: is it possible to derive the formula for the arithmetic mean $\bar{x}$ of a discrete set of data, $$\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_{i}$$ where $N$ is the total number of data points and $\lbrace x_{i}\rbrace$ are the data points?

I have attempted a derivation, but I'm unsure whether it is valid:

Suppose one has a finite, discrete set of $N$ data points $\lbrace x_{i}\rbrace_{i=1,\ldots,N}$. Assume this set has a central value $\bar{x}$ (i.e. a mean value), defined by the requirement that the sum of positive deviations from it equals the sum of negative deviations. Here a positive deviation means that a data point $x_{i}\geq\bar{x}$ deviates from the central value by an amount $x_{i}-\bar{x}$, and a negative deviation means that a data point $x_{i}<\bar{x}$ deviates from it by an amount $\bar{x}-x_{i}$.

Now relabel the data points so that they fall into two subsets: one containing the $k$ points whose value is less than the central value, i.e. $x_{1},x_{2},\ldots,x_{k}<\bar{x}$, and the other containing the $N-k$ points whose value is greater than or equal to the central value, i.e. $x_{k+1},x_{k+2},\ldots,x_{N}\geq\bar{x}$. It follows that $$\left(x_{N}-\bar{x}\right)+\left(x_{N-1}-\bar{x}\right)+\cdots +\left(x_{k+1}-\bar{x}\right)-\left(\bar{x}-x_{k}\right)-\left(\bar{x}-x_{k-1}\right)-\cdots -\left(\bar{x}-x_{1}\right)=0$$ which, upon rearranging terms (the first group contributes $N-k$ copies of $-\bar{x}$ and the second group contributes $k$), gives $$x_{1}+x_{2}+\cdots +x_{k-1}+x_{k}+x_{k+1}+\cdots +x_{N-1}+x_{N}-N\bar{x}=\sum_{i=1}^{N}x_{i}-N\bar{x}=0$$ and therefore $$\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_{i}\;\;.$$
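As a quick numerical sanity check of the balance condition used above, here is a minimal Python sketch. The function name `deviation_balance` and the sample data are hypothetical, chosen only for illustration; exact fractions are used so that rounding does not obscure the equality.

```python
from fractions import Fraction

def deviation_balance(data):
    """Return (mean, sum of positive deviations, sum of negative deviations)."""
    values = [Fraction(x) for x in data]  # exact arithmetic, no rounding error
    n = len(values)
    mean = sum(values) / n  # the formula being derived: (1/N) * sum of x_i
    # Points at or above the mean each contribute x_i - mean.
    positive = sum(x - mean for x in values if x >= mean)
    # Points below the mean each contribute mean - x_i.
    negative = sum(mean - x for x in values if x < mean)
    return mean, positive, negative

# Hypothetical data set, not taken from the question.
mean, positive, negative = deviation_balance([2, 3, 7, 8, 10])
print(mean, positive, negative)  # prints: 6 7 7
```

For this data the points above the mean deviate by 1, 2 and 4, and the points below by 4 and 3, so both sides total 7, as the derivation requires.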


There is 1 best solution below


Here is the required derivation of the mean (arithmetic average). Suppose we want to find a single number that minimizes the error between our guess and any pick from the distribution of a random variable. Call our number "mu." The sum of all the errors between our picks from the random variable and mu is given by: errors = (z_1 - mu) + (z_2 - mu) + ... + (z_N - mu); errors = z_1 + z_2 + ... + z_N - mu - mu - ... - mu; errors = z_1 + z_2 + ... + z_N - N mu.

Of course we would want our value of mu to make the errors as small as possible. The signed sum above is linear in mu, so it has no minimum of its own; what we can require is that the positive and negative errors cancel, i.e. that the errors equal zero (this is unbiasedness). Now our formula is: 0 = z_1 + z_2 + ... + z_N - N mu; N mu = z_1 + z_2 + ... + z_N; or: mu = (1/N)(z_1 + z_2 + ... + z_N). This last is the formula for the arithmetic average, or mean, which is what the original poster asked for.

Note that I have assumed we can (in theory) exhaustively sample the population (i.e., mu is the population mean). If you are content with the sample mean, you can replace mu with xbar. To make this rigorous for a continuous variable takes more work, but proceeds in a similar fashion; the important constraint is the condition of unbiasedness.
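To see the zero-error condition at work, here is a small Python sketch. The helper names `signed_error_sum` and `zero_error_guess` and the sample picks are assumptions made for illustration only. It evaluates z_1 + ... + z_N - N mu for a few guesses and shows that the sum vanishes only at the arithmetic mean.

```python
from fractions import Fraction

def signed_error_sum(picks, mu):
    """Sum of (z_i - mu) over all picks, i.e. z_1 + ... + z_N - N*mu."""
    return sum(Fraction(z) - Fraction(mu) for z in picks)

def zero_error_guess(picks):
    """Solve 0 = z_1 + ... + z_N - N*mu for mu."""
    return sum(Fraction(z) for z in picks) / len(picks)

picks = [4, 9, 11, 16]  # hypothetical picks, not from the answer
mu = zero_error_guess(picks)

# The signed sum changes by -N for each unit increase in the guess,
# so it crosses zero exactly once, at the mean.
for guess in (mu - 1, mu, mu + 1):
    print(guess, signed_error_sum(picks, guess))
# prints:
# 9 4
# 10 0
# 11 -4
```

Moving the guess one unit away from the mean shifts the signed sum by N = 4 in the opposite direction, which is why the zero-sum requirement pins down a unique value of mu.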