Why is variance so famous that it appears in almost half of every probability textbook?

1.4k Views


What is the history behind it, why would such a quantity get so much space in these textbooks, and what does it help us evaluate in probability?

In other words, what useful properties make variance get so much coverage in probability textbooks?


3 Answers

BEST ANSWER

Sure, there are many reasons, related in one way or another to the stability of systems. I'll give you an example that I'm working on. Imagine you use a cloud computing system. You log on to a server and you need to wait for some time to access your information. This is a queueing network where 'jobs' (customers) get in the queue upon arrival and wait for the 'server' (the actual server bottleneck).

Imagine you have two options: one server has a mean waiting time of 5 minutes and a variance of 100, and the second has a mean of 10 and a variance of 0.1. What does this mean? The mean waiting time/queue length of server 1 is lower, but if you join it you really don't know how long you will wait: it could be 1 minute, 15 minutes, 30 seconds, 40 minutes, and so on. Essentially, the mean is a bad estimate of how long you need to wait.

Contrary to that, the mean waiting time of the second server is a much better estimate, because its variance is very small: the actual wait is probably going to be something like 9.5 minutes, 10.1 minutes, etc.
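The comparison above can be sketched in a few lines. Everything here is a toy illustration, not the actual system from the answer: I use a gamma distribution simply because waiting times are non-negative and a gamma can be tuned to any desired mean/variance pair (mean = shape·scale, variance = shape·scale²).

```python
# Toy simulation of the two hypothetical servers described above.
import random
import statistics

random.seed(42)

def gamma_waits(mean, var, n=20_000):
    """Sample n non-negative waiting times with the given mean and variance."""
    scale = var / mean   # gamma scale parameter (theta)
    shape = mean / scale # gamma shape parameter (k), so mean = k*theta
    return [random.gammavariate(shape, scale) for _ in range(n)]

server1 = gamma_waits(mean=5, var=100)   # low mean, huge variance
server2 = gamma_waits(mean=10, var=0.1)  # higher mean, tiny variance

# Server 1's waits are wildly spread out; server 2's waits almost
# all land within a few tenths of a minute of 10.
print(statistics.mean(server1), statistics.pstdev(server1))
print(statistics.mean(server2), statistics.pstdev(server2))
```

Even though server 1 wins on the mean, the standard deviations make clear which wait you can actually plan around.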

So variance says something about the stability of the system.

ANSWER

I'm not sure I understand what you mean by it being a "statistical model." The way I've always looked at it is, variance is a probabilistic property of a distribution. It just happens to be extremely useful in statistics, since it is a standard parameter for some important distributions.

ANSWER

Here's a kind of off-the-wall idea.

A Gaussian random variable, whether scalar or vector, is naturally described, and completely determined, by its mean and variance. This is not the case for every distribution we might consider, but it's true for the Gaussian. And since the mean in many models is so often zero, the variance is arguably the more important of the two measures to study.
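To make the "completely determined" claim concrete (my notation, not part of the original answer): the scalar Gaussian density contains no parameters besides the mean $\mu$ and the variance $\sigma^2$,

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

so once you know those two numbers, you know every probability the distribution can assign.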

The Gaussian distribution happens to be the most important, or at least the most common, distribution that statisticians (or really, anyone dealing with random variables) consider. I can think of a couple of reasons why this is the case. For one thing, the Central Limit Theorem means we really are likely to encounter Gaussian distributions a lot. Many processes have fundamentally non-Gaussian underlying variables, but they feed them into averaging processes whose outputs are therefore Gaussian (or nearly so).
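A quick numerical sketch of that averaging effect (my own toy example, not from the answer): individual uniform draws are flat, but their averages cluster in the familiar bell shape.

```python
# Each sample is the mean of 50 i.i.d. uniform(0, 1) draws. By the CLT,
# these sample means should be approximately Gaussian with mean 1/2 and
# variance (1/12)/50, even though a single uniform draw is not Gaussian.
import random
import statistics

random.seed(0)

n_draws = 50
samples = [statistics.mean(random.random() for _ in range(n_draws))
           for _ in range(20_000)]

mu = statistics.mean(samples)
sd = statistics.stdev(samples)

# A crude normality check: roughly 68% of samples should fall within
# one standard deviation of the mean if the output is near-Gaussian.
frac_within_1sd = sum(abs(x - mu) <= sd for x in samples) / len(samples)
print(mu, sd, frac_within_1sd)
```

The fraction within one standard deviation comes out close to the Gaussian value of about 0.68, which a flat uniform distribution would not give you.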

Even if the CLT isn't saving our proverbial backsides, and the distribution we're trying to study isn't Gaussian, we often like to go ahead and pretend it is. And this is my second, more cynical reason: we use Gaussian methods because we know how.

For instance, least squares is maximum likelihood estimation when the residuals are assumed to be i.i.d. Gaussian. But we use least squares all the time in situations where we can't make that assumption reliably. There are a handful of other ML problems that are tractable (the two-sided exponential, i.e. Laplace, and the uniform come to mind). But even then, we might just go with least squares because, well, that's what the software on our computers knows how to compute.
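For a minimal illustration of the least-squares case mentioned above (synthetic data and variable names of my own choosing): fitting a line to data whose noise actually is Gaussian, using the closed-form normal equations.

```python
# Ordinary least squares for a line y = a*x + b. When the residuals are
# i.i.d. Gaussian, this is exactly the maximum-likelihood estimate.
import random

random.seed(1)

true_a, true_b = 2.0, -1.0
xs = [i / 10 for i in range(200)]
# Synthetic observations with i.i.d. Gaussian noise (sd = 0.5).
ys = [true_a * x + true_b + random.gauss(0, 0.5) for x in xs]

# Closed-form least-squares solution (normal equations for a line).
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
a_hat = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b_hat = (sy - a_hat * sx) / n

print(a_hat, b_hat)  # close to the true slope 2.0 and intercept -1.0
```

The same formula is routinely applied when the Gaussian-residual assumption is shaky, which is precisely the answer's point.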

That's not to say we should do this, and it certainly gets people in trouble sometimes to use Gaussian techniques on non-Gaussian statistics. But we do.

So to sum up, perhaps the reason that variance is so important is that it really is the most important measurement in many cases; and sometimes we treat it like it is even when it's not.