Estimate the sample distribution given mean and root mean square

Question: Given a dataset $\{x_k\}_{k=1}^n$ with $n=500000$ points whose mean is $\frac1n\sum_{k=1}^nx_k=13.06$ and whose root mean square is $\sqrt{\frac1n\sum_{k=1}^nx_k^2}=13.67$, at most how many data points can have a value greater than $17.1$? Please give a nontrivial upper bound.

At first glance this question reminded me of something like Chebyshev's inequality. But the question is about the distribution of the sample points themselves, so I suspect such results from probability theory may not apply here. So I tried some elementary algebraic approaches, but I don't know where to start.

Many thanks.

Whether you call it Cantelli's inequality or a one-sided Chebyshev inequality, it does apply here: treat the sample itself as a probability distribution that puts mass $1/n$ on each point. That distribution has mean $13.06$ and variance $13.67^2 - 13.06^2 = 16.3053$, so the number of points above $17.1$ is at most $$500000 \times \left(\dfrac{16.3053}{16.3053+(17.1-13.06)^2}\right) \approx 249875.1.$$ Counts have to be integers, so the maximum possible number of observations over $17.1$ is $249875$, almost half of the sample size. This is not much of a surprise, as the difference between $17.1$ and the mean ($4.04$) is just over one standard deviation ($\sqrt{16.3053}\approx 4.038$).
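For the elementary-algebra route the question asks about, here is a sketch of a standard Cauchy–Schwarz argument (not taken from the answer above) showing that the bound holds for any finite sample, with no probability needed. Write $\mu=13.06$, $\sigma^2=16.3053$, $c=17.1$, $t=c-\mu>0$, $d_k=x_k-\mu$, let $A=\{k : x_k>c\}$ with $m=|A|$, and let $S=\sum_{k\in A}d_k$. Since $\sum_k d_k=0$, the deviations outside $A$ sum to $-S$; since every $d_k$ with $k\in A$ exceeds $t$, we have $S>mt$. Assume $1\le m<n$ ($m=n$ is impossible because the deviations sum to zero). Applying Cauchy–Schwarz to each group:

$$\begin{aligned}
n\sigma^2=\sum_{k=1}^n d_k^2 &= \sum_{k\in A}d_k^2+\sum_{k\notin A}d_k^2 \;\ge\; \frac{S^2}{m}+\frac{S^2}{n-m} \;=\; \frac{nS^2}{m(n-m)},\\
\text{so}\quad \sigma^2 &\ge \frac{S^2}{m(n-m)} \;>\; \frac{m^2t^2}{m(n-m)} \;=\; \frac{mt^2}{n-m},\\
\text{hence}\quad m &< \frac{n\sigma^2}{\sigma^2+t^2}.
\end{aligned}$$

Equality in Cauchy–Schwarz requires the points above $c$ to be all equal and the remaining points to be all equal, which is why an extremal sample takes only two values; the strict inequality $S>mt$ means the bound is approached but never hit exactly.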

So, for example, if $249875$ observations were all $17.100001665 > 17.1$ and the remaining $250125$ observations were all $9.024036317$, you would recover the mean $13.06$ and root mean square $13.67$ of the question to $9$ decimal places.
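A quick numerical check of the arithmetic above, as a Python sketch. The helper names `cantelli_count_bound` and `moments` are hypothetical, introduced only for illustration: the first reproduces the bound $n\sigma^2/(\sigma^2+t^2)$ using the variance $13.67^2-13.06^2$ (dividing by $n$), and the second recomputes the mean and root mean square of the two-point example.

```python
import math


def cantelli_count_bound(n, mean, rms, threshold):
    """Largest count of values that can exceed `threshold`, by Cantelli's inequality.

    Uses the population variance rms**2 - mean**2 (dividing by n, not n - 1).
    Returns (integer_bound, real_bound).
    """
    var = rms**2 - mean**2
    t = threshold - mean
    if t <= 0:
        raise ValueError("threshold must be above the mean")
    real_bound = n * var / (var + t**2)
    # For a finite sample the inequality is strict, so take the largest
    # integer strictly below real_bound.
    return math.ceil(real_bound) - 1, real_bound


def moments(groups):
    """Size, mean and root mean square of a sample given as (count, value) pairs."""
    n = sum(count for count, _ in groups)
    mean = sum(count * value for count, value in groups) / n
    rms = math.sqrt(sum(count * value**2 for count, value in groups) / n)
    return n, mean, rms


count_bound, real_bound = cantelli_count_bound(500000, 13.06, 13.67, 17.1)
print(count_bound, round(real_bound, 1))  # 249875 249875.1

# The two-point sample from the answer: all points above 17.1 share one value,
# all other points share another.
sample = [(249875, 17.100001665), (250125, 9.024036317)]
n, mean, rms = moments(sample)
print(n, f"{mean:.9f}", f"{rms:.9f}")  # 500000 13.060000000 13.670000000
```

Using the $n-1$ sample variance instead of dividing by $n$ would shift the numbers slightly; the question's $13.67$ is defined with $\frac1n$, so dividing by $n$ is the consistent choice.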