I was studying Fisher (1925) and, while reading, I had some trouble with this part.
- Fitting the Normal Distribution. From a sample of $n$ individuals of a normal population the mean and the standard deviation of the population may be estimated by using two easily calculated statistics. The best estimate of $m$ is $\overline{x}$, where
$$\overline{x} = \frac{1}{n} \, S(x),$$
while for the best estimate of $\sigma$, we calculate $s$ from $$s^2 = \frac{1}{n-1}\, S(x-\overline{x})^2.$$ These two statistics are calculated from the sums of the first two powers of the observations (see Appendix, p. 73), and are specially related to the normal distribution, in that they summarise the whole of the information which the sample provides as to the distribution from which it was drawn, provided the latter was normal. Fitting by sums of powers, and especially by the particular system of statistics known as moments, has also been widely applied to skew (asymmetrical) distributions, and others which are not normal.
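A quick numerical sketch (my own, not from Fisher's text, with arbitrary sample values) of these two statistics, writing Fisher's $S(\cdot)$ as a plain sum:

```python
# Illustrative only: the two statistics Fisher uses to fit a normal
# distribution, x-bar = S(x)/n and s^2 = S((x - x-bar)^2)/(n - 1).
def fit_normal(sample):
    n = len(sample)
    xbar = sum(sample) / n                               # mean
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance
    return xbar, s2 ** 0.5

xbar, s = fit_normal([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```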
On the other hand, on page 73 he writes:
A. Statistics derived from sums of powers. If we have $n$ observations of a variate $x$, it is easy to calculate for the sample the sums of the simpler powers of the values observed; these we may write \begin{align} s_1 &= S(x) \\ s_2 &= S(x^2) \\ s_3 &= S(x^3) \\ s_4 &= S(x^4) \end{align} and so on.
It is convenient arithmetically to calculate from these the sums of powers of deviations from the mean defined by the equations \begin{align} S_2 &= s_2- \frac{1}{n} \cdot s_1^2 \\ S_3 &= s_3 - \frac{3}{n} \cdot s_2\cdot s_1 + \frac{2}{n^2} \cdot s_1^3 \\ S_4 &= s_4 - \frac{4}{n} \cdot s_3 \cdot s_1 + \frac{6}{n^2} \cdot s_2 \cdot s_1^2 - \frac{3}{n^3} \cdot s_1^4 \end{align}
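These identities are easy to check numerically; the following sketch (sample values are my own) compares Fisher's formulas with the directly computed sums of powers of deviations from the mean:

```python
# Verify that S_2, S_3, S_4 computed from the raw power sums s_1..s_4
# match the direct sums of powers of deviations from the mean.
xs = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0]
n = len(xs)
s1, s2, s3, s4 = (sum(x ** p for x in xs) for p in (1, 2, 3, 4))

S2 = s2 - s1 ** 2 / n
S3 = s3 - 3 / n * s2 * s1 + 2 / n ** 2 * s1 ** 3
S4 = s4 - 4 / n * s3 * s1 + 6 / n ** 2 * s2 * s1 ** 2 - 3 / n ** 3 * s1 ** 4

xbar = s1 / n
D2, D3, D4 = (sum((x - xbar) ** p for x in xs) for p in (2, 3, 4))
```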
Many statistics in frequent use are derived from these values.
(i) Moments about the arbitrary origin, $x = 0$; these are derived simply by dividing the corresponding sum by the number in the sample; in general, for $p = 1, 2, 3, 4, \ldots$, they are defined by the formula
$$m'_p = \frac{1}{n} \cdot s_p.$$
Clearly $m'_1$ is the arithmetic mean, usually written $\overline{x}$. (ii) In order to obtain values independent of the arbitrary origin, and more closely related to the intrinsic characteristics of the population sampled, values called "moments about the mean" are widely used, which are found by dividing the sums of powers about the mean by the sample number; thus if $p = 2, 3, 4, \ldots$
$$m_p = \frac{1}{n} \cdot S_p.$$
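Both kinds of moments can be computed straight from their definitions; a short sketch with made-up sample values:

```python
# Moments about the origin, m'_p = s_p / n, and moments about the mean,
# m_p = S_p / n, computed directly from the definitions above.
xs = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0]
n = len(xs)
m_prime = {p: sum(x ** p for x in xs) / n for p in (1, 2, 3, 4)}
xbar = m_prime[1]  # m'_1 is the arithmetic mean
m = {p: sum((x - xbar) ** p for x in xs) / n for p in (2, 3, 4)}
```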
My questions are: 1) What is the intuition behind moments? (I searched on Google, but it was a bit complicated to get the point.) Please try to explain the logic rather than giving many equations. 2) Where do these equations come from? \begin{align} S_2 &= s_2 - \frac{1}{n}\cdot s_1^2 \\ S_3 &= s_3 - \frac{3}{n}\cdot s_2\cdot s_1 + \frac{2}{n^2}\cdot s_1^3 \\ S_4 &= s_4 - \frac{4}{n}\cdot s_3\cdot s_1 + \frac{6}{n^2}\cdot s_2 \cdot s_1^2 - \frac{3}{n^3}\cdot s_1^4 \end{align} 3) Why, in the case of the normal distribution, do the sums of the first and second powers summarise the whole of the information which the sample provides as to the distribution from which it was drawn (provided the latter was normal), while for other distributions you have to calculate higher power sums?
Thanks in advance!
The normal distribution is one of the few fairly complex distributions which is completely specified by its mean and standard deviation. Other distributions may have more than two parameters, e.g. involving higher moments. Technically speaking, you can define just about any continuous probability density that has as many pre-specified moments as you want, and still have lots of degrees of freedom in how you define the density: the only constraints you have to satisfy are the specified moments. Densities that are piecewise constant (step functions) are a good way to illustrate this, and a step function with finitely many steps can then be approximated arbitrarily closely by a continuous probability density.
The equations for the moments from averages of powers are simply ways to express the mean of $(x_i - \mu)^k$, where $\mu$ is the mean of the sample, in terms of the means of the powers of the $x_i$.
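For question 2, concretely, the identities follow from expanding the binomial $(x - \overline{x})^p$ inside the sum and using $\overline{x} = s_1/n$; the $p = 2$ case, for instance, goes
$$S_2 = S\left((x - \overline{x})^2\right) = S(x^2) - 2\overline{x}\,S(x) + n\overline{x}^2 = s_2 - \frac{2}{n}\,s_1^2 + \frac{1}{n}\,s_1^2 = s_2 - \frac{1}{n}\,s_1^2.$$
The $p = 3$ and $p = 4$ formulas come out of the same expansion, just with more terms.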