My statistics background is almost $\varnothing$, so I apologise if my question has already been asked, and I just didn't know the right terminology to find it here.
Suppose $p_X$ is the probability density of a random variable $X$ (say, defined over $\mathbb{R}$). Then $$\int_{\mathbb R} p_X(x) \mathrm{d}x = 1,\quad \mathbb{E}(X) = \int_{\mathbb R} xp_X(x)\mathrm{d}x, \quad \mathrm{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2.$$ The above three conditions/definitions depend precisely on $\mathbb{E}(X^0), \mathbb{E}(X^1)$, and $\mathbb{E}(X^2)$. This makes me wonder why I've never encountered higher $n$, i.e. why is $\mathbb{E}(X^n)$ not important for bigger $n$? (Of course the obvious answer is that they are important, but I've just never seen them because of my limited stats background. However, this feeds into a second related question below.)
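For concreteness, here is a small numerical sketch (assuming `numpy`; the distribution and its parameters are arbitrary choices of mine) estimating the moments $\mathbb{E}(X^n)$ from samples, including the higher ones the question asks about:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical example: X ~ Normal(mean=1, sd=2), so E(X) = 1, Var(X) = 4.
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

# Empirical moments E(X^n) for n = 0, 1, 2 -- the three appearing above.
moments = [np.mean(x**n) for n in range(3)]
mean = moments[1]
var = moments[2] - moments[1] ** 2  # Var(X) = E(X^2) - E(X)^2

# Higher moments exist just as well, e.g. E(X^3) and E(X^4),
# which feed into skewness and kurtosis.
m3, m4 = np.mean(x**3), np.mean(x**4)
```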
The above question arose organically while I was reading a proof of the (weak) law of large numbers using Fourier transforms. Assume the $X_i$ are i.i.d. with mean $0$ and finite variance $D$. Let $Z_m = \frac{1}{m}(X_1 + \cdots + X_m)$. We wish to prove that $Z_m \to 0$ in distribution, i.e. that the law of $Z_m$ tends to the point mass at $0$. (Note that $\mathbb{E}(Z_m) = 0$ already holds for every $m$, so the statement has to be about the distribution, not just the mean.)
Recalling the formula $$\hat{f}^{(n)}(\zeta) = (-2\pi i)^n (\widehat{x^nf})(\zeta),$$ we have $\hat{p}_X(0) = 1, \hat{p}_X'(0) = 0, \hat{p}_X''(0) = -4\pi^2D$ and so on. The proof goes on to say that for any choice of $\zeta$, $$\lim_{m \to \infty} \hat{p}_{Z_m}(\zeta) = \lim_{m\to\infty} \hat{p}_X(\zeta/m)^m = \lim_{m\to\infty}\Big(1 - \frac{2\pi^2D \zeta^2}{m^2}\Big)^m = 1.$$
The last step expresses $\hat{p}_X$ as a Taylor series about $0$ and discards the higher-order terms. However, without any additional information about the higher-order derivatives of $\hat{p}_X$ (i.e. the values of $\mathbb{E}(X^n)$ for larger $n$), the assumption that the Taylor series converges everywhere is surely unjustified. My dissatisfaction with this proof can again be summarised as "why are the mean and variance enough?". Is there an unspoken assumption that all probability densities are analytic almost everywhere? Or do the mean and variance truly pin down the density in such a way that the higher-order terms can safely be ignored?
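As a sanity check on the quoted limit, here is a hedged numerical sketch (assuming `numpy`, and taking $p_X$ to be a centered normal with variance $D$, for which the transform is known in closed form under the convention $\hat{p}_X(\zeta) = \mathbb{E}\,e^{-2\pi i \zeta X} = e^{-2\pi^2 D \zeta^2}$; the values of $D$ and $\zeta$ are arbitrary):

```python
import numpy as np

D = 1.5      # variance (hypothetical choice)
zeta = 0.7   # fixed frequency (hypothetical choice)

# Fourier transform of a centered normal density with variance D,
# using the convention phat(zeta) = E[exp(-2*pi*i*zeta*X)].
def p_hat(z):
    return np.exp(-2 * np.pi**2 * D * z**2)

# phat(zeta/m)^m should tend to 1 as m grows, matching the quoted limit:
# here it equals exp(-2*pi^2*D*zeta^2 / m).
vals = [p_hat(zeta / m) ** m for m in (1, 10, 100, 1000)]
```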
The expected value and the variance describe the central tendency and the spread; they are the minimal quantities one needs for a first grasp of the data. They also suffice to define a linear transformation that standardizes the data (nonlinear transformations are far less commonly used for this).
Higher moments (classically up to order $4$) can be used for a finer description — symmetry via the skewness, resemblance to a normal law via the kurtosis — but they are much less useful in practice, and the tools for estimating them are much less developed.
Note also that the mean and standard deviation completely determine a normal distribution, and by the CLT the normal distribution is the "destiny" of suitably normalized sums of i.i.d. samples.
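The CLT claim above can be illustrated numerically (a sketch assuming `numpy`; the uniform distribution, sample sizes, and tolerance are my own choices): standardized means of i.i.d. uniforms, which are themselves far from normal, land close to a standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)
m, reps = 200, 20_000

# i.i.d. uniforms on [0, 1]: mean 1/2, variance 1/12 (not normal themselves).
samples = rng.uniform(size=(reps, m))

# Standardize the sample means: subtract the mean, divide by sd of the mean.
z = (samples.mean(axis=1) - 0.5) / np.sqrt((1 / 12) / m)

# By the CLT, z should be approximately N(0, 1); for a standard normal,
# about 68.3% of the mass lies within one standard deviation of 0.
frac_within_1sd = np.mean(np.abs(z) < 1.0)
```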