Correct Intuition? Standard Deviation and distance in $n$ dimensional space.


Basic Question

Is there an intuitive explanation of standard deviation in terms of Euclidean distance in $n$-dimensional space?

Longer Version of Question

To begin a more detailed sketch of my question, for simplicity let's just focus on the simple case of a discrete random variable that is uniformly distributed. In this case, the variance is given by the following formula, which I've lifted straight from Wikipedia:

$$ \frac1{n}\sum_{i =1}^n (x_i - \mu)^2$$

where $\mu$ is the mean. The standard deviation is then the square root of this. Now, I can't help noticing that the square root of the sum alone (leaving out the factor $\frac1n$) is the Euclidean distance from the vector $X = (x_1, x_2, \dots, x_n)$ to the vector $\vec \mu = (\mu, \mu, \dots, \mu)$. That is, the standard deviation can be expressed as:

$$ \frac1{\sqrt{n}}|X - \vec \mu |$$

So I wonder, is there any significant conceptual relationship between this distance $|X - \vec \mu |$ and standard deviation, or is this just a coincidence?
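As a quick numerical sanity check of the identity above, here is a minimal sketch in Python. The function name `std_via_distance` and the sample values are made up for illustration; the comparison uses `statistics.pstdev`, the population standard deviation from the standard library, which divides by $n$ as in the formula above.

```python
import math
import statistics


def std_via_distance(xs):
    """Standard deviation computed as |X - mu_vec| / sqrt(n).

    Hypothetical helper: it mirrors the formula in the question, with the
    population (divide-by-n) convention.
    """
    n = len(xs)
    mu = sum(xs) / n
    # Euclidean distance from X = (x_1, ..., x_n) to (mu, ..., mu)
    distance = math.sqrt(sum((x - mu) ** 2 for x in xs))
    return distance / math.sqrt(n)


# Illustrative sample: mean 5, squared deviations sum to 32, variance 4
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Both routes should agree up to floating-point rounding
assert math.isclose(std_via_distance(sample), statistics.pstdev(sample))
print(std_via_distance(sample))
```

The check only confirms that the two expressions are the same number; it says nothing yet about why the factor $\frac1{\sqrt{n}}$ is the natural one.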

Even More Details...

I have looked up many explanations of standard deviation and its cousin variance. Here are some that I've seen already, each sort of following from the previous one:

  • We square the values before summing to get rid of the sign, which is obviously not important. This explanation is often criticised by hardcore statisticians and I can sort of see why: it doesn't explain why squaring beats taking the absolute value.
  • We square the values so that we pay a greater price for greater deviations. This explains why squaring beats taking absolute values. But why not raise to the power of $4$, or $6$, or any other even power before summing? What is so special about $2$?
  • The thing that is so special about $2$ is that it's the second moment of inertia, whereas the mean is the first moment, so mechanically it makes sense. I don't follow this. My intuition is totally OK with the mean: the point where, if I put my finger, the weights on either side will balance. But the second moment is harder for me to imagine physically like this.

Note, this is a question about intuition. I "understand" the mathematical formula at a shallow level: what all its terms mean, how to calculate it given a dataset. But I am not comfortable with my grasp on why this formula is "the best" one to use in so many applications e.g. the least squares method to fit data. I'm particularly confused as to why squaring is chosen as opposed to raising to some other even power e.g. $9234324$.

And this is where my intuition steps in and tries to provide an explanation that goes right back to the fundamental theorem of Pythagoras: Euclidean distance. Here is my thought process: "The number $2$ is special. It's the unique power that makes Euclidean distance work. So maybe it's also the unique number that makes variance work." But then why the multiplying factor of $\frac1{\sqrt{n}}$? Is it simply a case of swallowing it and accepting the definition, or can this intuition be resolved somehow?

Best Answer

There is certainly a very clear "conceptual relationship" between the standard deviation and Euclidean distance: If we treat the whole available sample (the $x_i$'s) as a vector, then the Euclidean distance is a measure of how much this vector deviates from the vector containing the mean value, which is "the center" of the population.

But the standard deviation attempts to measure how much a single observation, not the whole sample, deviates "on average" from the mean value. Ah, then, why are we dividing by $\sqrt {n}$ and not by $n$?

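Well, this becomes clear if we consider a vector whose entries are all equal, $\mathbf x = (x, x, \dots, x)$, so that every observation deviates from the mean by the same amount $|x - \mu|$. Then the Euclidean distance becomes

$$\sqrt {\sum_{i =1}^n (x - \mu)^2} = \sqrt{n\,(x - \mu)^2} = \sqrt n \,|x - \mu|$$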

So, due to the square root, Euclidean distance is not linearly additive as we move from one dimension to $n$ dimensions: it does not increase by a factor of $n$, but only by a factor of $\sqrt {n}$. So to recover the "individual distance on average" we have to divide by $\sqrt n$.
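The scaling argument can be seen numerically with a small sketch. The helper name `distance_to_mean_vector` and the values $x = 7$, $\mu = 5$ are assumptions chosen for illustration, with $\mu$ treated as a given reference value as in the argument above. Repeating the same per-coordinate deviation $n$ times multiplies the distance by $\sqrt n$, and dividing by $\sqrt n$ gives back the single-observation deviation.

```python
import math


def distance_to_mean_vector(xs, mu):
    """Euclidean distance from (x_1, ..., x_n) to (mu, ..., mu)."""
    return math.sqrt(sum((x - mu) ** 2 for x in xs))


# Illustrative values: every coordinate deviates from mu by exactly 2
x, mu = 7.0, 5.0

for n in (1, 4, 9, 100):
    d = distance_to_mean_vector([x] * n, mu)
    # d grows like sqrt(n) * |x - mu|, not like n * |x - mu|
    assert math.isclose(d, math.sqrt(n) * abs(x - mu))
    # Dividing by sqrt(n) recovers the per-observation deviation |x - mu|
    assert math.isclose(d / math.sqrt(n), abs(x - mu))
    print(n, d, d / math.sqrt(n))
```

Dividing by $n$ instead would make the result shrink toward zero as $n$ grows, even though each observation is just as far from the mean as before.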