Number of samples needed before the sample maximum exceeds some value


Let's say you have a standard normal distribution, and you sample from it repeatedly. How many samples will it take before the maximum observed value is at least 3 (or, in general, some value $K$)?

To solve this problem I used the CDF of the standard normal distribution to get the tail probability for $x \geq 3$. This gives that the probability of observing $x \geq 3$ in one sample is $0.0013499$. Since all our samples are independent of each other, the answer would appear to be the mean of the geometric distribution with $p=0.0013499$, which is $1/p \approx 740.80$.
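As a quick check of these two numbers, here is a minimal Python sketch (standard library only; the name `upper_tail` is just an illustrative helper) that evaluates $P(X \geq 3)$ through the complementary error function and takes its reciprocal:

```python
import math

def upper_tail(k):
    """P(X >= k) for a standard normal X, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

p = upper_tail(3.0)      # single-sample probability of seeing a value >= 3
mean_wait = 1.0 / p      # mean of a geometric distribution with success probability p
print(p, mean_wait)
```

Changing the argument of `upper_tail` gives the same calculation for a general threshold $K$.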

However, by simulating a large number of trials I found that the true answer is around 444 trials. Here's the Mathematica code to show this:

Table[Table[RandomVariate[NormalDistribution[]], {x, 1, 444}] // Max, {k, 1, 1000}] // Mean

This can also be verified mathematically by solving the reverse problem: the expected sample maximum from $N$ trials. Note that $[\Pr(x \leq K)]^{444}$, the probability that the results from all 444 trials are at most $K$, is the CDF of the maximum of the 444 trials. Differentiating gives the corresponding PDF (albeit in terms of the erf function), and computing its expected value (or letting Mathematica approximate the integral numerically) indeed shows that 444 trials are sufficient for the expected sample maximum to be 3.
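The expected-maximum calculation can also be reproduced outside Mathematica. Below is a rough Python sketch, assuming a plain trapezoid rule on $[-10, 10]$ is accurate enough and that the expected maximum increases with the number of trials (so a binary search applies). The function names are illustrative, not from any library.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_max(n, lo=-10.0, hi=10.0, steps=20000):
    """E[max of n samples]: integrate x * n * cdf(x)^(n-1) * pdf(x) by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * cdf(x) ** (n - 1) * pdf(x)
    return total * h

def smallest_n(k, n_max=4096):
    """Smallest n with expected_max(n) >= k, found by binary search on n."""
    lo, hi = 1, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_max(mid) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(expected_max(444))   # expected maximum of 444 samples
print(smallest_n(3.0))     # first n whose expected maximum reaches 3
```

The integrand `n * cdf(x)^(n-1) * pdf(x)` is the derivative of `cdf(x)^n`, that is, the PDF obtained by differentiating the CDF of the maximum.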

So why did my attempt to solve the problem overshoot the answer?


There are 2 best solutions below

0
On BEST ANSWER

If I read your post correctly (but beware that I checked none of the numerical values involved), you are successively solving two different problems.

In both cases, one is given a sequence $(X_n)_{n\geqslant1}$ i.i.d. standard normal and one considers its running maximum defined for every $n\geqslant1$ as $M_n=\max\{X_k\mid1\leqslant k\leqslant n\}$.

Approach "741": Let $\theta_3=E(T_3)$ where $T_3=\inf\{n\geqslant1\mid X_n\geqslant3\}$, then $\theta_3=P(X_1\geqslant3)^{-1}$ and you say that $\theta_3\approx741$.

Approach "444": Let $\mu_3=\inf\{n\geqslant1\mid E(M_n)\geqslant3\}$, then you say that $\mu_3\approx444$.

Since $T_3$ is also $T_3=\inf\{n\geqslant1\mid M_n\geqslant3\}$, one is considering either $$E(\inf\{n\geqslant1\mid M_n\geqslant3\})$$ or $$\inf\{n\geqslant1\mid E(M_n)\geqslant3\}$$ which need not coincide.
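To see concretely that these are different questions, note that $P(T_3\leqslant n)=P(M_n\geqslant3)=1-P(X_1<3)^n$ by independence. A small Python sketch (standard library only; the helper name is my own) evaluates this at both candidate values of $n$:

```python
import math

def prob_max_at_least(k, n):
    """P(M_n >= k) = 1 - P(X_1 < k)^n for i.i.d. standard normal samples."""
    below = 0.5 * math.erfc(-k / math.sqrt(2.0))   # P(X_1 < k)
    return 1.0 - below ** n

p_444 = prob_max_at_least(3.0, 444)   # chance that a value >= 3 has appeared by n = 444
p_741 = prob_max_at_least(3.0, 741)   # same chance by n = 741
print(p_444, p_741)
```

Under these formulas, the chance that a value of at least 3 has appeared within 444 samples is about 0.45, even though the expected maximum there is already about 3. The maximum can overshoot 3 by a sizeable amount when it does cross, which pulls $E(M_n)$ up to 3 earlier than the expected waiting time would suggest.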

1
On

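For the given $p=Q(3)$ (the standard normal tail probability $P(X\geqslant3)$), the recursion $E[T]=1\times p+ (1+E[T])\times (1-p)$ solves to $E[T]=1/p\approx740.7967$, as you have indicated. So your simulation must simply be incorrect for this quantity; here is my very layman piece of Matlab code: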

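N = 100000;                    % number of independent repetitions
res = zeros(1, N);             % res(ii) stores the waiting time of repetition ii
parfor ii = 1:N                % parallel loop; a plain for gives the same estimate
    k = 1;
    while randn(1, 1) <= 3     % keep drawing until a sample exceeds 3
        k = k + 1;
    end
    res(ii) = k;               % index of the first sample above 3
end
mean(res)                      % Monte Carlo estimate of E[T]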

For my sample run it gave 742.5393