I have some questions related to Fisher's 1925 book.


I was studying Fisher (1925) and while reading I had some trouble with this part.

  1. Fitting the Normal Distribution. From a sample of $n$ individuals of a normal population the mean and the standard deviation of the population may be estimated by using two easily calculated statistics. The best estimate of $m$ is $\overline{x}$, where

$$\overline{x} = \frac{1}{n} \, S(x),$$

while for the best estimate of $\sigma$, we calculate $s$ from $$s^2 = \frac{1}{n-1}\, S(x-\overline{x})^2;$$ these two statistics are calculated from the sums of the first two powers of the observations (see Appendix, p. 73), and are specially related to the normal distribution, in that they summarise the whole of the information which the sample provides as to the distribution from which it was drawn, provided the latter was normal. Fitting by sums of powers, and especially by the particular system of statistics known as moments, has also been widely applied to skew (asymmetrical) distributions, and others which are not normal.

On the other hand, on page 73 he writes:

A. Statistics derived from sums of powers. If we have $n$ observations of a variate $x$, it is easy to calculate for the sample the sums of the simpler powers of the values observed; these we may write \begin{align} s_1 &= S(x) \\ s_2 &= S(x^2) \\ s_3 &= S(x^3) \\ s_4 &= S(x^4) \end{align} and so on.

It is convenient arithmetically to calculate from these the sums of powers of deviations from the mean defined by the equations \begin{align} S_2 &= s_2- \frac{1}{n} \cdot s_1^2 \\ S_3 &= s_3 - \frac{3}{n} \cdot s_2\cdot s_1 + \frac{2}{n^2} \cdot s_1^3 \\ S_4 &= s_4 - \frac{4}{n} \cdot s_3 \cdot s_1 + \frac{6}{n^2} \cdot s_2 \cdot s_1^2 - \frac{3}{n^3} \cdot s_1^4 \end{align}

Many statistics in frequent use are derived from these values. (i) Moments about the arbitrary origin, $x = 0$; these are derived simply by dividing the corresponding sum by the number in the sample; in general, for $p = 1, 2, 3, 4, \ldots$, they are defined by the formula $$m'_p = \frac{1}{n} \cdot s_p,$$ where $p$ is the index.

Clearly $m'_1$ is the arithmetic mean, usually written $\overline{x}$. (ii) In order to obtain values independent of the arbitrary origin, and more closely related to the intrinsic characteristics of the population sampled, values called "moments about the mean" are widely used, which are found by dividing the sums of powers about the mean by the sample number; thus for $p = 2, 3, 4, \ldots$

$$m_p = \frac{1}{n}\cdot S_p$$ (again $p$ is the index)

My questions are:

1) What is the intuition behind moments? (I searched on Google but it was a bit complicated to get the point.) Please try to explain the logic rather than giving many equations.

2) Where do these equations come from? \begin{align} S_2 &= s_2 - \frac{1}{n}\cdot s_1^2 \\ S_3 &= s_3 - \frac{3}{n}\cdot s_2\cdot s_1 + \frac{2}{n^2}\cdot s_1^3 \\ S_4 &= s_4 - \frac{4}{n}\cdot s_3\cdot s_1 + \frac{6}{n^2}\cdot s_2 \cdot s_1^2 - \frac{3}{n^3}\cdot s_1^4 \end{align}

3) Why, in the case of the normal distribution, do the sums of the first and second powers summarise the whole of the information which the sample provides as to the distribution from which it was drawn (provided the latter was normal), while for other distributions you have to calculate sums of higher powers?
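As a sanity check on the identities in question 2), here is a small Python sketch (the sample values are made up) comparing the formulas with a direct computation of the deviation sums; exact rational arithmetic avoids floating-point noise:

```python
from fractions import Fraction

# A small made-up sample.
x = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
n = len(x)

# Sums of powers about the origin, Fisher's s_p = S(x^p).
s1 = sum(v for v in x)
s2 = sum(v**2 for v in x)
s3 = sum(v**3 for v in x)
s4 = sum(v**4 for v in x)

xbar = s1 / n

# Sums of powers of deviations from the mean, computed directly ...
S2_direct = sum((v - xbar)**2 for v in x)
S3_direct = sum((v - xbar)**3 for v in x)
S4_direct = sum((v - xbar)**4 for v in x)

# ... and via the identities quoted from the Appendix.
S2 = s2 - s1**2 / n
S3 = s3 - 3*s2*s1 / n + 2*s1**3 / n**2
S4 = s4 - 4*s3*s1 / n + 6*s2*s1**2 / n**2 - 3*s1**4 / n**3

print(S2 == S2_direct, S3 == S3_direct, S4 == S4_direct)
```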

Thanks in advance!


BEST ANSWER

The normal distribution is one of the few fairly complex distributions that is completely specified by the mean and standard deviation. Other distributions may have more than two parameters, e.g. involving higher moments. Technically speaking, you can define just about any continuous probability density that has as many pre-specified moments as you want, and still have lots of degrees of freedom in how you define the density; the only constraints you have to satisfy are the specified moments. Densities that are piecewise constant (step) functions are a good way to illustrate this, and a step function with finitely many steps can then be extended to an arbitrarily close continuous probability density.

The equations for moments from averages of powers are simply ways to express the mean of $(x_i - \mu)^k$, where $\mu$ is the mean of the samples, in terms of the means of the powers of the $x_i$.
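For instance, for the second power, expanding the square and writing $\bar x = s_1/n$ gives \begin{align} S_2 = S\left((x - \bar x)^2\right) &= S(x^2) - 2\bar x\, S(x) + n\bar x^2 \\ &= s_2 - \frac{2}{n}s_1^2 + \frac{1}{n}s_1^2 = s_2 - \frac{1}{n}s_1^2, \end{align} and the formulas for $S_3$ and $S_4$ follow in the same way from the binomial expansions of $(x - \bar x)^3$ and $(x - \bar x)^4$.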

ANSWER

In answer to $\#3$, the usual rationale is that if $X_1,\ldots,X_n\sim\text{ i.i.d. } N(\mu,\sigma^2)$, then the conditional probability distribution of $X_1,\ldots,X_n$ given $X_1+\cdots+X_n$ and $X_1^2+\cdots+X_n^2$ does not depend on $\mu$ and $\sigma^2$. That is what it means to say that that pair of statistics is a "sufficient statistic" for the family of normal distributions. That is taken to mean that all information in $X_1,\ldots,X_n$ that is relevant to estimation of $\mu$ and $\sigma^2$ is contained in those two statistics.

If one subtracts the sample mean from each of $X_1,\ldots,X_n$ and then divides by the sample standard deviation, one gets a vector that gives no information relevant to such estimation, provided one is certain that the population is normally distributed. But the information in that vector is exactly where one looks for any information in the data that would indicate that the population is not normally distributed. In fact, if you then sort those numbers into increasing order, you're discarding information about the order in which they appeared originally, and that information is treated as relevant neither to estimation of $\mu$ and $\sigma^2$ nor to diagnosing non-normality.

Plot those sorted standardized values against the quantiles $i/(n+1)$ of the standard normal distribution for $i=1,\ldots,n$, and you've got a "normal probability plot". In R the command for this is `qqnorm`. Probably your typical statistician looks at that plot and applies the interocular impact test, and only if that doesn't look good does one resort to something like the Shapiro–Wilk test.
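To make that recipe concrete, here is a stdlib-Python sketch of the plot coordinates (the data are invented, and R's `qqnorm` uses slightly different plotting positions for small $n$, but the idea is the same):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample; any data would do.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]
n = len(data)

# Standardize: subtract the sample mean, divide by the sample
# standard deviation, then sort into increasing order.
m, s = mean(data), stdev(data)
z = sorted((v - m) / s for v in data)

# Standard-normal quantiles at probabilities i/(n+1), i = 1, ..., n.
q = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Plotting z against q is the normal probability plot; a roughly
# straight line suggests the sample is consistent with normality.
for qi, zi in zip(q, z):
    print(f"{qi:+.3f}  {zi:+.3f}")
```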

ANSWER

You have many questions, and I don't have my copy of Fisher's book at hand, but I'll try to answer some questions.

NOTATION. First, the notation in Fisher's book is largely from another era. He tended to use $S$ instead of $\Sigma$ or $\sum_{i=1}^n$ to indicate summation. There are intricate differences between small $s$ and capital $S,$ and between $x$ and $X$, sometimes including the convention $x = (X - \bar x).$ (The idea is that $x$ has been 'reduced' by subtracting off the mean from $X$.) I have seen your post before and after editing; the notation was confused in the original, and not all edits have been beneficial.

Near the beginning, I believe it should be something like $s^2 = \frac{1}{n-1}S(X - \bar X)^2$ for what we would now write as $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ or $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2.$ Also, notation can change from one chapter to the next. A high priority was put on using as few characters as possible in formulas. The message here is that you have to watch font cases very carefully and check to be sure you are tuned into the current notation. You are not the first person to find Fisher's notation difficult.

PARAMETERS. There are many distributional families that use one or two (or sometimes more) 'parameters'. Once the parameters are assigned specific numerical values, one particular member of the family has been identified. The density function of the normal family of distributions has parameters $\mu$ (corresponding to the center of symmetry) and $\sigma$ (corresponding to the scale or spread). Wikipedia on 'normal distribution' has the general formula and colored curves for various members of the family. The gamma family of distributions also has two parameters, but they do not directly correspond to mean and standard deviation. The Poisson family has one parameter (often written $\lambda$) which happens to correspond to both the mean and the variance. And so on.

POPULATION MOMENTS. Moments are expected values of random variables associated with a distribution. (The name traces back to physics. You may have heard of 'moment of inertia'. But the connection is not intuitive enough to be worth discussing here.) There are central moments and noncentral moments, depending on whether the mean has been subtracted or not. For a random variable $X$ with the normal distribution, the first moment is $E(X) = \mu,$ the second moment is $E(X^2) = \mu^2 + \sigma^2,$ and the second central moment is $E[(X - \mu)^2] = \sigma^2.$ For the normal distribution the odd-numbered central moments are all $0$ because of the symmetry of the distribution; for example, $E[(X - \mu)^{15}]= 0.$ At the end of the 'moments' section of Wikipedia on 'normal distribution', there is a table of moments.
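As a numerical illustration (the values $\mu = 2$, $\sigma = 1.5$ are arbitrary choices, not from Fisher), these population moments can be recovered by integrating $x^k$ against the normal density:

```python
from math import exp, pi, sqrt

MU, SIGMA = 2.0, 1.5   # illustrative parameter values

def pdf(x):
    """Density of the N(MU, SIGMA^2) distribution."""
    return exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * sqrt(2.0 * pi))

def moment(k, central=False, steps=100_000):
    """E[X^k] (or E[(X - MU)^k] if central) by trapezoidal integration."""
    lo, hi = MU - 12.0 * SIGMA, MU + 12.0 * SIGMA  # tails beyond are negligible
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid endpoint weights
        c = x - MU if central else x
        total += w * c ** k * pdf(x)
    return total * h

first = moment(1)                          # E[X]          = MU
second = moment(2)                         # E[X^2]        = MU^2 + SIGMA^2
second_central = moment(2, central=True)   # E[(X - MU)^2] = SIGMA^2
third_central = moment(3, central=True)    # E[(X - MU)^3] = 0 by symmetry
print(first, second, second_central, third_central)
```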

SAMPLE MOMENTS AND ESTIMATION. There are also 'sample moments': for example, $\frac{1}{n}\sum_{i = 1}^n X_i$ is the first noncentral sample moment (often called just the mean). The second noncentral sample moment is $\frac{1}{n}\sum_{i = 1}^n X_i^2.$ The second central sample moment is $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$, which is 'almost' the usual sample variance, except that for technical and historical reasons it is customary to use $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ instead.

This is not the place to catalog the uses of moments in statistical theory. One use may have some intuitive appeal. In 'method of moments' estimation one equates corresponding population and sample moments to get estimators. In a very simple example, for the normal distribution, we may set the first noncentral sample moment equal to the first noncentral population moment to get "$\mu = \bar X$". Then we write $\hat \mu = \bar X$ where the 'hat' indicates estimation. If we don't know the mean $\mu$ of a normal distribution, a good estimate of it is the mean $\bar X$ of a sample (the bigger the sample the more reliable the estimate).
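A minimal Python sketch of method-of-moments estimation for the normal case, using simulated data (the true values $10$ and $3$ are arbitrary choices for the illustration):

```python
from math import sqrt
from random import gauss, seed

# Simulated data from N(10, 3^2); in practice these would be observations.
seed(12345)
sample = [gauss(10.0, 3.0) for _ in range(10_000)]
n = len(sample)

m1 = sum(sample) / n                 # first noncentral sample moment
m2 = sum(v * v for v in sample) / n  # second noncentral sample moment

# Equate sample moments to population moments:
#   E[X]   = mu              =>  mu_hat    = m1
#   E[X^2] = mu^2 + sigma^2  =>  sigma_hat = sqrt(m2 - m1^2)
mu_hat = m1
sigma_hat = sqrt(m2 - m1 ** 2)

print(mu_hat, sigma_hat)   # near 10 and 3
```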

Also, one can show that if one knows the numerical values of all the moments of a distribution, then the distribution is uniquely identified (with a few technical exceptions I won't pursue here). This has to do with something called 'moment generating functions' which have many uses in mathematical statistics.

If you know you have some sort of NORMAL distribution, then we have seen that knowing the first noncentral moment $\mu$ and the second central moment $\sigma^2$ is enough to say exactly which normal distribution we have. (Knowing the first noncentral moment $\mu$ and second noncentral moment $\mu^2 + \sigma^2$ is also enough, because then we can solve for $\mu$ and $\sigma.$)