Approximate distribution for sample mean of a small sample


Problem: Let $X_1,\dots,X_n$ be a random sample from a distribution with the density function $$f(x)=\begin{cases} 6x(1-x), & 0 < x < 1, \\ 0, & \text{elsewhere}, \end{cases}$$

and let $$\bar{X}_n=\frac{X_1+\cdots+X_n}{n}.$$

(1) What is an approximate distribution of $\bar{X}_n$?

(2) For a sample of size 77, what are the approximate mean and variance of $\bar{X}_n$?

From the given density function, $X$ appears to follow a Beta distribution with $\alpha = 2$ and $\beta = 2$.

(1)

The MGF of the sample mean satisfies $M_{\bar{X}_n}(t)=\left[M_X\left(\frac{t}{n}\right)\right]^n$, so we could plug in the MGF of the Beta distribution with $\alpha = 2$ and $\beta = 2$. However, that looks very difficult to compute. Is there another way to solve the problem?

(2)

Given the distribution and PDF of $\bar{X}_n$, can we find the mean by computing $E[\bar{X}_n]$ for $n = 77$, and the variance from $E[\bar{X}_n^2]-(E[\bar{X}_n])^2$?


4
On BEST ANSWER

The PDF of $X$ is already bell-shaped, so even for small $n$, the sample mean is well approximated by a normal distribution. Indeed, we can explicitly calculate $$f_{\bar X_2}(z) = \begin{cases} \frac{96}{5} z^3 (5-10z+4z^2), & 0 \le z \le 1/2, \\ \frac{96}{5} (1-z)^3(-1+2z+4z^2), & 1/2 < z \le 1, \\ 0, & \text{otherwise}, \end{cases}$$ and see that its plot on $[0,1]$ is quite well approximated by a normal distribution with mean $\mu = 1/2$ and variance $\sigma^2 = 1/40$. In general, you would expect the sample mean to be approximately normal with mean and variance $$\mu = \frac{1}{2}, \quad \sigma^2 = \frac{1}{20n}.$$ In fact, for part (2), these are the exact mean and variance, because the calculation does not rely on a normal approximation, only on the linearity of expectation and the additivity of variance for independent random variables.
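A rough numerical check of the $n = 2$ case, written in R to match the simulation code in this thread. It is only a sketch: the function name `f2` is made up for illustration, and the upper branch is taken to be the mirror image $f_{\bar X_2}(1-z)$ of the lower branch, i.e. with the factor $(1-z)^3$.

```r
# Density of the mean of two independent Beta(2, 2) draws.
# Assumption: the upper branch mirrors the lower one, f(z) = f(1 - z).
f2 <- function(z) {
  lower <- (96 / 5) * z^3 * (5 - 10 * z + 4 * z^2)
  upper <- (96 / 5) * (1 - z)^3 * (-1 + 2 * z + 4 * z^2)
  ifelse(z < 0 | z > 1, 0, ifelse(z <= 0.5, lower, upper))
}

# Total mass, mean and variance by numerical integration
total <- integrate(f2, 0, 1)$value
mu    <- integrate(function(z) z * f2(z), 0, 1)$value
v     <- integrate(function(z) (z - mu)^2 * f2(z), 0, 1)$value
c(total = total, mean = mu, variance = v)   # should be close to 1, 1/2, 1/40

# Largest pointwise gap between this density and the N(1/2, 1/40) density
z <- seq(0, 1, by = 0.001)
max_gap <- max(abs(f2(z) - dnorm(z, mean = 0.5, sd = sqrt(1 / 40))))
max_gap
```

The integration confirms the mean and variance used for the normal approximation; `max_gap` gives one crude measure of how far the two curves are apart.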


But perhaps we can do better for part (1)? Indeed, we might argue that we could approximate the sampling distribution by a suitable beta distribution itself. How would we do this? We recall that the variance of a $\operatorname{Beta}(a,a)$ distribution is $\frac{1}{4(2a+1)}$; equating this with the variance $\frac{1}{20n}$ of the sample mean gives $$a = \frac{5n-1}{2}.$$ So we could conjecture that a better approximation might be $$\bar X_n \overset{\text{approx}}{\sim} \operatorname{Beta}\left(\tfrac{1}{2}(5n-1), \tfrac{1}{2}(5n-1)\right).$$ As an exercise, I invite you to ascertain whether this is indeed "better." In fact, what might one mean by saying that one approximation is "better" than another? In what sense could you quantitatively judge this?
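One possible way to make "better" quantitative, offered only as a sketch: simulate the sample mean and measure the largest gap between its empirical CDF and each candidate CDF (a Kolmogorov-type distance). The sample size `n = 5`, the seed, and the grid are arbitrary choices for illustration, not part of the problem.

```r
set.seed(1)                # arbitrary seed, only for reproducibility
n <- 5                     # hypothetical small sample size
m <- 1e5                   # number of simulated sample means

# Each row is one sample of size n from Beta(2, 2); row means are the X-bars
xbar <- rowMeans(matrix(rbeta(m * n, 2, 2), nrow = m, ncol = n))

a    <- (5 * n - 1) / 2    # beta parameter matching the variance 1 / (20 n)
grid <- seq(0, 1, by = 0.001)
emp  <- ecdf(xbar)(grid)   # empirical CDF of the simulated means

# Largest CDF gap for each candidate approximation
d_norm <- max(abs(emp - pnorm(grid, mean = 0.5, sd = sqrt(1 / (20 * n)))))
d_beta <- max(abs(emp - pbeta(grid, a, a)))
c(normal = d_norm, beta = d_beta)
```

Both distances include Monte Carlo noise of order $1/\sqrt{m}$, so a small difference between them should not be over-interpreted. Other criteria (total variation, tail probabilities) could rank the two approximations differently.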

1
On

According to Wikipedia on 'Beta distribution', if $X \sim \mathsf{Beta}(\alpha, \beta),$ then $E(X) = \frac{\alpha}{\alpha+\beta}$ and $\operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$ In particular, for $X \sim \mathsf{Beta}(2,2),$ we have $\mu = E(X) = 1/2$ and $\sigma^2 = \operatorname{Var}(X) = 1/20,$ which are easy to find from the PDF. Even for relatively small $n$ and $X_i \sim \mathsf{Beta}(2,2)$, the sample mean $\bar X \stackrel{aprx}{\sim} \mathsf{Norm}(\mu_n = \mu,\, \sigma_n = \sqrt{\sigma^2/n}).$
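For completeness, here is the direct computation from the density $f(x) = 6x(1-x)$ on $(0,1)$, using only the definitions of mean and variance:

$$\begin{aligned}
E(X) &= \int_0^1 x \cdot 6x(1-x)\,dx = 6\left(\tfrac{1}{3}-\tfrac{1}{4}\right) = \tfrac{1}{2}, \\
E(X^2) &= \int_0^1 x^2 \cdot 6x(1-x)\,dx = 6\left(\tfrac{1}{4}-\tfrac{1}{5}\right) = \tfrac{3}{10}, \\
\operatorname{Var}(X) &= E(X^2) - \left[E(X)\right]^2 = \tfrac{3}{10} - \tfrac{1}{4} = \tfrac{1}{20}.
\end{aligned}$$

For independent $X_i$ it follows that $E(\bar X_n) = \tfrac{1}{2}$ and $\operatorname{Var}(\bar X_n) = \tfrac{1}{20n}$.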

I see that while I have been typing this @Heropup (+1) has given you a similar answer. So I will only add simulated results for a million iterations for $\bar X_{17},$ which agree with the analytic results to two or three decimal places. The sketch shows good agreement with the normal approximation (blue density curve).

[Addendum: After seeing @Heropup's suggested Beta approximation, I added a dashed red curve to the sketch to show the PDF of $\mathsf{Beta}(43,43);$ at the scale of the plot, the beta and normal PDF curves are distinguishable only near the mean. Of course, the support of beta is $(0,1)$ and the support of normal is $(-\infty,\infty),$ but the approximating normal PDF is very nearly $0$ outside $(0,1).$]

    m = 10^6;  n = 17                        # iterations; sample size
    a = replicate(m, mean(rbeta(n, 2, 2)))   # m simulated sample means
    mean(a);  sd(a)
    ## 0.4999954    # aprx expected value of sample mean
    ## 0.05421685   # aprx SD of sample mean
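For comparison, a short sketch of the theoretical values that the simulation is approximating, computed from $\mu = 1/2$ and $\sigma^2 = 1/20$. The variable names are only illustrative.

```r
n      <- 17
mu     <- 1 / 2              # mean of Beta(2, 2)
sigma2 <- 1 / 20             # variance of Beta(2, 2)
se     <- sqrt(sigma2 / n)   # theoretical SD of the sample mean
c(mean = mu, sd = se)        # sd is about 0.0542
```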

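[Figure: simulated distribution of $\bar X_{17}$ with the normal density (blue curve) and the $\mathsf{Beta}(43,43)$ density (dashed red curve).]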