Imagine that somebody has chosen $N$ numbers from a normal distribution with mean $\mu$ and variance $1$ ($\mu$ is unknown to you) and showed you only the $n \le N$ numbers that are greater than $\mu$. Is there a way to find an unbiased estimator of $\mu$ based on the given sample?
This does not come from any textbook; I came up with this problem recently (it may even be known, but I haven't found anything), so feel free to play with the conditions (for example, you may assume that $N$ is known or unknown, or even assume an unknown variance). I find it interesting because in some situations you are presented with only one side of the coin and somehow have to make a judgement from the evidence you have.
There are some obvious easy estimators (for example, the minimum, which seems to be the MLE), but they are biased. I also understand that, since the mean of the half-normal distribution has such a terrible-looking formula, the estimator might not be pretty. And what about other distributions? For example, for the uniform distribution on $[0,\theta]$, a variable conditioned to be greater than the mean $\theta/2$ has expectation $\frac{3}{4}\theta$, so an unbiased estimator is $\hat{\theta}=\frac{4}{3}\overline{X}$ (or $\frac{2}{3}\overline{X}$ as an estimator of the mean).
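The uniform example above is easy to check by simulation. Here is a minimal sketch (my own illustration, not from the original post; all names are made up) confirming that $\frac{4}{3}$ times the mean of the values above $\theta/2$ is close to unbiased for $\theta$:

```r
# Monte Carlo check of the Uniform(0, theta) example (illustrative sketch).
# Values conditioned to exceed theta/2 are Uniform(theta/2, theta),
# with mean 3*theta/4, so (4/3) * their average should estimate theta.
set.seed(1)
theta <- 10
est <- replicate(10000, {
  x   <- runif(200, 0, theta)   # full sample
  x.r <- x[x > theta/2]         # only the values above the mean
  (4/3) * mean(x.r)             # proposed estimator of theta
})
mean(est)                       # should be close to theta = 10
```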
Estimating mean from a biased sample
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Interesting, apparently original question. Only a skeletal answer is possible without more specifics. Towards the end of your question, it seems you may be relaxing the assumption that the original data are normal, so my initial answer will be nonparametric.
Let the vector `x.N` denote the original, presumably random sample from the population, and `x.r` the subsample of $n \approx N/2$ values exceeding $\mu.$ Especially if $N$ is large and $\sigma$ is small, you are right that `min(x.r)` is a logical estimator of $\mu,$ and its (upward) bias will be small. You can get a rough idea of the density of the population distribution near $\mu,$ and hence of the amount of bias, by looking at the spacing of a few of the smallest values in `x.r`. Here is one way to implement this approach in R, for a sample of size $N = 500$ from $\mathsf{Norm}(\mu=100, \sigma = 15),$ rounded to two places.
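[The answer's R code did not survive in this copy. The following is my own sketch of the approach described above: correct `min(x.r)` downward by a typical gap among the smallest order statistics. The seed and the choice of six smallest values are illustrative assumptions, not the original code.]

```r
# Sketch (reconstruction, not the answer's original code).
# Idea from the text: min(x.r) overshoots mu by roughly one typical
# spacing between the smallest values of x.r, so subtract an estimate
# of that spacing. Variable names follow the answer's notation.
set.seed(2025)                        # illustrative seed
x.N <- round(rnorm(500, 100, 15), 2)  # full sample, N = 500
x.r <- x.N[x.N > 100]                 # "half-sample" above mu = 100
gaps <- diff(sort(x.r)[1:6])          # spacings of the 6 smallest values
mu.hat <- min(x.r) - mean(gaps)       # bias-corrected estimate of mu
mu.hat
```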
In subsequent runs with initial samples from unknown seeds, some results were as follows: 99.96, 99.97, 99.98, 100.00, 100.05, 100.10, and 100.19. Bear in mind that the 95% margin of error based on the entire original sample is about $\pm 0.35.$
Addendum: (1) Unless $\sigma$ is large or $N$ is small, bias is hardly an issue. Because the person making the "half-sample" knows $\mu,$ you have a huge advantage over someone estimating $\mu$ from the mean of the whole sample. My estimator is less biased than just using `min(x.r)`, but still a little biased. In the scenario simulated above, 100,000 iterations give the expectation of the raw minimum as 100.08 and the expectation of my estimator as 100.04.

(2) Another approach might be to estimate $\sigma$ from the half-sample, then match $\hat \sigma$ with quantiles of the half-sample.
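[One possible reading of (2), sketched by me rather than taken from the answer: conditional on exceeding $\mu,$ the data are distributed as $\mu + \sigma|Z|$ with $Z$ standard normal, so $E[X \mid X > \mu] = \mu + \sigma\sqrt{2/\pi}$ and $\mathrm{Var}(X \mid X > \mu) = \sigma^2(1 - 2/\pi).$ Matching these two moments gives estimates of both parameters.]

```r
# Sketch (my construction, not the original answer's code):
# method of moments using the half-normal shape of the half-sample.
#   X | X > mu  ~  mu + sigma*|Z|,  Z standard normal, so
#   E = mu + sigma*sqrt(2/pi),  Var = sigma^2 * (1 - 2/pi).
set.seed(7)                          # illustrative seed
x.r <- 100 + 15 * abs(rnorm(250))    # simulated half-sample: mu=100, sigma=15
sigma.hat <- sd(x.r) / sqrt(1 - 2/pi)
mu.hat <- mean(x.r) - sigma.hat * sqrt(2/pi)
c(mu.hat, sigma.hat)                 # should be near 100 and 15
```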
(3) If you want a theoretically optimal mathematical solution for normal data, this is the right site. For a full range of practical solutions (and in case this question is not as novel as I suppose), maybe also ask on our sister 'stats' site, Cross Validated, linking to this Question.
(4) Mention of an actual application would be helpful. In particular, what are typical values of $N?$ And are you interested mainly in normal data?
[Recovering from hand surgery, so typing this 'hunt and peck'. Please excuse (any remaining) typos.]