Why are variance and expected value all we care about?


My statistics background is almost $\varnothing$, so I apologise if my question has already been asked, and I just didn't know the right terminology to find it here.

Suppose $p_X$ is the probability density of a random variable $X$ (say, defined over $\mathbb{R}$). Then $$\int_{\mathbb R} p_X(x) \mathrm{d}x = 1,\quad \mathbb{E}(X) = \int_{\mathbb R} xp_X(x)\mathrm{d}x, \quad \mathrm{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2.$$ The above three conditions/definitions depend precisely on $\mathbb{E}(X^0), \mathbb{E}(X^1)$, and $\mathbb{E}(X^2)$. This makes me wonder why I've never encountered higher $n$, i.e. why is $\mathbb{E}(X^n)$ not important for bigger $n$? (Of course the obvious answer is that they are important, but I've just never seen them because of my limited stats background. However, this feeds into a second related question below.)
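For what it's worth, higher moments $\mathbb{E}(X^n)$ are perfectly well defined and easy to estimate from a sample. A minimal sketch (the variable names and sample size are my own choices, and the sample is drawn from a standard normal just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000)  # X ~ N(0, 1)

# Empirical estimates of E(X^n) for n = 0, 1, 2, 3, 4.
moments = [np.mean(sample**n) for n in range(5)]

# For a standard normal, E(X^n) = 0 for odd n and (n - 1)!! for even n,
# so these estimates should be close to 1, 0, 1, 0, 3.
```

The $n = 3$ and $n = 4$ estimates are exactly the raw-moment inputs to the skewness and kurtosis statistics mentioned in the answers below.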


The above question organically formed when I was reading a proof of the law of large numbers using Fourier transforms. Assume $p_{X_i}$ has mean $0$ and finite variance $D$ for every $i$. Let $Z_m = \frac{1}{m}(X_1 + \cdots + X_m)$. We wish to prove that $\lim_{m\to\infty}\mathbb{E}(Z_m) = 0$.

Recalling the formula $$\hat{f}^{(n)}(\zeta) = (-2\pi i)^n (\widehat{x^nf})(\zeta),$$ we have $\hat{p}_X(0) = 1, \hat{p}_X'(0) = 0, \hat{p}_X''(0) = -4\pi^2D$, and so on. The proof goes on to say that for any choice of $\zeta$, $$\lim_{m \to \infty} \hat{p}_{Z_m}(\zeta) = \lim_{m\to\infty} \hat{p}_X(\zeta/m)^m = \lim_{m\to\infty}\Big(1 - \frac{2\pi^2D \zeta^2}{m^2}\Big)^m = 1.$$
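This limit is easy to check numerically in a case where $\hat{p}_X$ is known in closed form. For $X \sim N(0, D)$, the Fourier transform in this $2\pi$ convention is $\hat{p}_X(\zeta) = e^{-2\pi^2 D \zeta^2}$; a sketch (the values of $D$ and $\zeta$ are arbitrary choices of mine):

```python
import math

D, zeta = 1.0, 0.7  # variance and an arbitrary test frequency

def p_hat(z):
    # Fourier transform (2*pi convention) of the N(0, D) density.
    return math.exp(-2 * math.pi**2 * D * z**2)

# p_hat(zeta/m)^m = exp(-2*pi^2*D*zeta^2 / m), which climbs toward 1.
values = [p_hat(zeta / m) ** m for m in (1, 10, 100, 1000)]
```

Of course, the Gaussian case is the best-behaved one; the question below is precisely whether this convergence can be justified without such strong assumptions on $p_X$.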

The last step expresses $\hat{p}_X$ as a Taylor series about $0$ and discards the higher-order terms. However, without any additional information about the higher-order derivatives of $\hat{p}_X$ (i.e. the values of $\mathbb{E}(X^n)$ for larger $n$), the assumption that the Taylor series converges everywhere is surely unjustified. My dissatisfaction with this proof can again be summarised as "why are the mean and variance enough?". Is there an unspoken assumption that all probability densities are analytic almost everywhere? Or do the mean and variance truly pin down the probability density function in such a way that we can ignore all higher-order terms?


2 Answers

Accepted answer

The expected value and the variance describe the central tendency and the spread. These are the minimal quantities one needs for a first grasp of the data. They also suffice to define a linear transformation that normalizes the data (nonlinear transformations are rarely as convenient).
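The linear normalization referred to here is just the familiar z-score: subtract the mean, divide by the standard deviation. A sketch (the location and scale of the synthetic data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
data = 5.0 + 2.0 * rng.standard_normal(10_000)  # arbitrary location/scale

# Linear transformation determined entirely by the mean and standard
# deviation: the result has mean 0 and standard deviation 1 by construction.
z = (data - data.mean()) / data.std()
```

Note that no moment beyond the second enters this transformation, which is one practical sense in which the first two moments are "enough" for a first pass.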

Higher moments (classically up to order $4$) can be used for a finer description (asymmetry and resemblance to a normal law), but they are much less useful in practice, and the tools for estimating them are far less developed.

Note also that the mean and standard deviation are all it takes to describe a normal distribution, and by the central limit theorem this distribution is the "destiny" of all sample means.

Answer

They are not all we care about. The (standardized) third and fourth moments are known as skewness and kurtosis. Taken together, all positive-integer moments uniquely determine a distribution (provided they exist and do not grow too quickly; this is the classical moment problem), whereas the first two moments alone never uniquely determine one.
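To see concretely that the first two moments never pin down a distribution, compare a standard normal with a centered exponential: both have mean $0$ and variance $1$, but their third moments differ. A sketch (sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

normal = rng.standard_normal(n)              # mean 0, variance 1
shifted_exp = rng.exponential(1.0, n) - 1.0  # also mean 0, variance 1

# The first two moments agree...
m2 = normal.var(), shifted_exp.var()
# ...but the third central moments do not: roughly 0 versus 2.
m3 = np.mean(normal**3), np.mean(shifted_exp**3)
```

Two visibly different shapes (symmetric bell versus a one-sided spike with a long right tail), indistinguishable through order two.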

Although your proof of the law of large numbers has its problems (the statement as written is wrong, and the proof is too big a hammer in my opinion), the existence of the first two moments is indeed enough here. (A more difficult proof shows that even the first moment alone is enough. For your proof, note that you do not need convergence of the Taylor series; the standard error bound for the second-order Taylor expansion is all that is required.)
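The remainder-bound argument can be sketched explicitly, using the same $2\pi$ Fourier convention as the question. Two continuous derivatives of $\hat{p}_X$ at $0$ (equivalently, two finite moments of $X$) give a Peano-form expansion; no analyticity is needed:

```latex
% Second-order Taylor expansion with Peano remainder:
\hat{p}_X(\zeta) = 1 - 2\pi^2 D \zeta^2 + o(\zeta^2)
  \quad \text{as } \zeta \to 0,
% hence, for fixed \zeta,
\hat{p}_{Z_m}(\zeta) = \hat{p}_X(\zeta/m)^m
  = \Bigl(1 - \frac{2\pi^2 D \zeta^2}{m^2} + o\bigl(m^{-2}\bigr)\Bigr)^m
  \xrightarrow[m\to\infty]{} 1.
```

Taking logarithms makes the last step transparent: $m \log\bigl(1 - 2\pi^2 D \zeta^2 / m^2 + o(m^{-2})\bigr) = -2\pi^2 D \zeta^2 / m + o(m^{-1}) \to 0$. The higher moments never enter; they are swallowed by the $o(\zeta^2)$ term.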